Posts by Marcel Bollmann

Thanks to AI, Google has become perhaps the largest source of misinformation in the world.

This one is just completely made up!

2 weeks ago 1608 205 31 14

Should I start listing Python 3.12 as a co-author? Why not?

1 month ago 2 1 0 0
This is a reminder that the meta-reviews are due TODAY, 4 March AoE.

Please remember that if you don't submit your meta-reviews on time, you might be considered highly irresponsible, which means your co-authored papers may be desk rejected and you may become ineligible to commit to *CL conferences or (re-)submit any work in the next ARR cycle.

I don't understand why the ACL/ARR organizers think this is an appropriate way to communicate with area chairs... The peer review system is collapsing, and the whole system of science as we know it requires a rethink. But let's be kind to each other in the process and not forget what we are here for.

1 month ago 16 5 1 0

Claude: For each example I can do a web search and then make an LLM call with the results...

Me: Why an LLM call? Can't you just figure it out yourself?

Claude: You're right, I am the LLM!

2 months ago 43 7 2 2

What a ridiculous timeline we are living in

2 months ago 1 0 0 0

🙏 I invited you through ARR!

2 months ago 1 0 0 0

(Technically due on Saturday, but who wants to work then? 😉)

2 months ago 1 0 1 0

🚨 Emergency reviewer needed for ARR Resources and Evaluation track! Please ping me if you could review one paper by Friday. Topic is AI hallucinations, broadly speaking.

2 months ago 1 3 1 0
God, something is wrong with the way ARR is handling things. I keep receiving messages from people almost begging to be removed from the author list because they can't keep up with the reviewing load, especially on papers outside their area of expertise, and they don't want their students to be desk rejected.

2 months ago 7 1 1 0

Good old times. @raghavian.bsky.social

2 months ago 0 0 1 0
When AIs Talk To Themselves Drop me a 💡 on the linkedin post if this is interesting. OpenClaw/Clawdbot/Moltbot have sprung into view this week.  Now they are having large-scale conversations and collaborations with each …

I strongly recommend this blog post by Ben Vigoda: www.benvigoda.com/2026/02/01/w...

2 months ago 19 7 1 3

Today, the ACL Anthology switched to a new system for how author pages work. From now on, ORCID iDs will be the main mechanism for matching papers to the correct author. 🧵⤵️

2 months ago 9 5 1 0

How have we as a society still not internalized that critical data should never live in just a single place?!

2 months ago 0 0 0 0

Trying to learn to be better at that.

4 months ago 2 1 0 0

I’ve spent the last two days looking at this message at least 30 times; I’m getting ready to sue for psychological distress at this point.

4 months ago 3 0 0 0

This is terrifying.

"[AI agents] can... infer a researcher's latent hypotheses and produce data that artificially confirms them."

...

"We can no longer trust that survey responses are coming from real people" -@seanjwestwood.bsky.social

4 months ago 312 121 7 17

Wordle 1 609 2/6*

⬜🟨⬜⬜⬜
🟩🟩🟩🟩🟩

My mind is clearly dominated by weird words.

5 months ago 0 0 0 0

📢 Open Positions at the Uppsala NLP Group! 📢

Postdoc opportunity — also open to recent or soon-to-be PhD graduates (within 1–2 months).
uu.varbi.com/en/what:job/...

5 months ago 5 6 0 1

Every time.

6 months ago 448 30 6 0

I often long for a place to just post whimsical personal updates for friends, but that kind of place doesn’t exist anymore. In my personal bubble, social media has long become too fragmented and/or abandoned for that purpose.

6 months ago 2 0 2 0
Your dataset looks very cool, but I don't understand why you say “no Arabizi-specific metric or resource exists for our dialect selection”? When you contacted me, it seemed to me that you were aware of my work on Arabizi (e.g., [1,2], not to mention the cross-lingual work with Maltese [2] or character-based language models for Arabizi [4]). One of the crucial points of this work was also to propose translations from Algerian Arabizi into French, which could have given you a ground truth for your translation models. I'll be honest with you: I find it extremely discouraging to see that pioneering work on the processing of a language with as few resources as the Algerian Arabic dialect is not cited, even though it was published at the major conference in the field and the data is freely available (unlike the vast majority of dialectal resources for Arabic). If even colleagues working on the same language don't find it necessary to cite us, what's the point of investing so much time and money in this type of work?

In short, I hope your work doesn't encounter the same pitfalls.


[1] https://www.aclweb.org/anthology/2020.acl-main.107.pdf
[2] https://arxiv.org/abs/2306.14866
[3] https://arxiv.org/abs/2005.00318
[4] https://arxiv.org/abs/2110.13658

(DeepL translation from French)

Just found out that yet another paper on North African Arabizi didn't find our work worth citing. They even wrote "No Arabizi-specific metric or resource exists for our dialect selection." We were the first to release an annotated dataset for this dialect, published at ACL and everything. Discouraging.

6 months ago 18 3 2 0

📄 New article published:

“Controlling Language and Style of Multi-lingual Generative Language Models with Control Vectors” by Julius Leino & Jussi Karlgren

nejlt.ep.liu.se/article/view...

6 months ago 0 1 0 0
I'm conducting research on how ACL's peer reviewing policies impact NLP research quality, career trajectories, and inclusivity within our community. Your insights—whether you're a seasoned reviewer, early-career researcher, or anywhere in between—are invaluable.
The survey takes 7-10 minutes and covers topics like review quality, reviewer assignment, and accessibility barriers. All responses are confidential and will help inform evidence-based improvements to our peer review processes.

I'm conducting research on how ACL's peer review policies impact NLP research quality, career trajectories, and inclusivity within our community. I am running a survey, which would take around 7-10 mins to complete: forms.cloud.microsoft/e/j2jr9nH3X0

I would really appreciate insights from y'all!

6 months ago 6 6 1 0

remove the label

6 months ago 5 1 1 0

📢Life update📢

🥳I'm excited to share that I've started as a postdoc at Uppsala University NLP @uppsalanlp.bsky.social, working with Joakim Nivre on topics related to constructions and multilinguality!

🙏Many thanks to the Walter Benjamin Programme of the DFG for making this possible.

7 months ago 29 2 3 1

I need a gym without any people at all; that would motivate me.

7 months ago 2 0 1 0
We present our new preprint, "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation".
We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks.
For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations.
We then collect 13 million LLM annotations across plausible LLM configurations.
These annotations feed into 1.4 million regressions testing the hypotheses.
For a hypothesis with no true effect (ground truth p > 0.05), different LLM configurations yield conflicting conclusions.
Checkmarks indicate correct statistical conclusions matching the ground truth; crosses indicate LLM hacking: incorrect conclusions due to annotation errors.
Across all experiments, LLM hacking occurs in 31–50% of cases, even with highly capable models.
Since minor configuration changes can flip scientific conclusions from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.
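The mechanism behind this can be illustrated with a toy simulation (a minimal sketch, not the paper's actual code; the group sizes, error rates, and function names below are all made up for illustration): if an annotator's error rate correlates with the grouping variable under study, a null effect in the ground-truth labels can look highly significant in the annotated labels.

```python
import math
import random

def two_prop_pvalue(x1, n1, x2, n2):
    """Two-sided pooled z-test p-value for a difference of proportions."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def annotate(labels, flip_up, flip_down):
    """Simulate a noisy annotator that flips 0->1 and 1->0 at given rates."""
    out = []
    for y in labels:
        if y == 0 and random.random() < flip_up:
            y = 1
        elif y == 1 and random.random() < flip_down:
            y = 0
        out.append(y)
    return out

random.seed(0)
n = 5000
# Ground truth: the positive-label rate is 0.5 in both groups (no true effect).
truth_a = [int(random.random() < 0.5) for _ in range(n)]
truth_b = [int(random.random() < 0.5) for _ in range(n)]

# Hypothetical annotator whose error rates depend on the group (e.g. topic
# or dialect): group A's labels are biased upward, group B's downward.
ann_a = annotate(truth_a, flip_up=0.10, flip_down=0.02)
ann_b = annotate(truth_b, flip_up=0.02, flip_down=0.10)

p_truth = two_prop_pvalue(sum(truth_a), n, sum(truth_b), n)
p_ann = two_prop_pvalue(sum(ann_a), n, sum(ann_b), n)
print(f"p-value on ground-truth labels: {p_truth:.3f}")
print(f"p-value on annotated labels:    {p_ann:.3g}")
```

With error rates this asymmetric, the annotated-label test reports a large spurious group difference even though none exists in the ground truth, which is exactly the failure mode the preprint calls LLM hacking.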

🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825

7 months ago 303 106 6 23
Never ask a man his age, a woman her salary, or GPT-5 whether a seahorse emoji exists

7 months ago 2095 423 95 79
OpenAI is discovering what every social media company has also discovered: content moderation is hard and AI content moderation is also hard.

7 months ago 73 9 3 5

Why does every social media feed eventually end up looking like:

[outrageous thing happening in the US]
[extremely polarizing AI take]
[random semi-funny meme]
[shocking thing happened to person I don't know]
[yet another reason climate change is worse than we thought]

It's so emotionally tiring.

7 months ago 0 0 0 0