Anna Wegmann (@annawegmann) Bsky

A dream come true. A conference in a train museum. What will the breaks be like?

Choo Choo! 🚂🚂

1 hour ago 2 0 0 0

Would you realize if the book you were reading was AI? What if it was humanized to remove AI-speak?

We find that even without using stylistic cues (e.g., word choice or sentence structure), narrative choices alone give AI fiction away!

2 weeks ago 199 63 8 6

Patients ask LLMs medical questions — but how they phrase it matters more than it should.

Our new preprint explores how different phrasings of patient health questions can lead to inconsistent conclusions, even with the same evidence. [1/6]

Full Paper: arxiv.org/abs/2604.05051

1 week ago 25 6 2 4

2026 Call for Papers Workshop on Insights from Negative Results in NLP

📢 The workshop on Insights from negative results will be back at EMNLP'26!

Your most-insightful failures can be submitted in 4 pages by June 25. It's also possible to commit short papers reviewed through ARR.

insights-workshop.github.io/2026/cfp

1 week ago 33 11 0 0

Yay. Finally a conference next to my house!

Hope to see many of the #NLProc #NLP community there 🤗

2 weeks ago 6 1 0 0

Survey-style tests developed for humans may not predict how LLMs actually behave.

Our #EACL2026 paper shows they can even be misleading when measuring racism and sexism!

Check out the paper 👇🏼

1 month ago 2 1 0 0

🥁🥁🥁 Newly out from us today in Science Advances: “Biased AI Writing Assistants Shift Users’ Attitudes on Societal Issues”.

Large Language Models are providing users with autocomplete writing suggestions on many platforms. Could these suggestions shift users’ own attitudes? (spoiler: YES) (1/7)

1 month ago 188 104 4 19

Can LLMs figure out who you are from your anonymous posts?

From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web.

New 📄 w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵

2 months ago 125 44 8 13

our paper on data mixing for LMs is out!

while building Olmo 3, we saw gaps between data mixing literature and real practice

🐠choosing proxy size, # runs, sampling, regression, constraints..
🐟data shifts during LM dev: can we reuse past experiments?

Olmix tackles them all!

2 months ago 29 4 1 0

aclanthology.org/2025.finding...

5 months ago 0 0 0 0

If you're attending #EMNLP2025, we'll be presenting virtually in Gather Session 1 on Nov 5 at 4pm PT. Come say hello!

w/ the wonderful:
@mellymeldubs.bsky.social
Anna Preus,
@mariaa.bsky.social

Paper: arxiv.org/abs/2510.16713
Code/Data: github.com/darthbhyrava/wisp
Dash: poetry.darthbhyrava.com

5 months ago 8 1 1 0

What if a single model could recognize an author's writing style no matter what language they wrote in? 🌍✍️ Our new #EMNLP2025 paper explores multilingual authorship representation, showing how training across 36 languages can sharpen stylistic signals and reduce topic bias.
👇🧵

5 months ago 18 2 1 0

We Need to Measure Data Diversity in NLP — Better and Broader. Dong Nguyen, Esther Ploeger. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.

New opinion paper out with Esther Ploeger (Aalborg University): We Need to Measure Data Diversity in NLP — Better and Broader at #EMNLP2025 (main) aclanthology.org/2025.emnlp-m...

5 months ago 14 4 2 0

aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.finding...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.finding...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.emnlp-m...

5 months ago 0 0 1 0

Lots of exciting work on linguistic style this year at #EMNLP2025 #EMNLP! Including work on machine-text detection, authorship representation and more

🧵 with anthology links below
📣 with an open call to everyone to add style work that's missing

5 months ago 9 1 1 0

I have a new blog post about the so-called “tokenizer-free” approach to language modeling and why it’s not tokenizer-free at all. I also talk about why people hate tokenizers so much!

6 months ago 59 15 5 2

I successfully defended my PhD in Dutch fashion and acquired a PhD certificate in Latin. Thank you to the amazing people that got me here, among others @dongng.bsky.social and the ones I blur here.

6 months ago 34 2 1 1

Come join next Wednesday if you want to rant about society's love-hate relationship with LLMs!

6 months ago 13 7 0 0

one of the other entrances was closed off yesterday, increasing my commute from front door to office by another 10 minutes

8 months ago 0 0 0 0

Is this the Dutch budget cuts or does Utrecht uni really not want me to come to the office? My highlight is the door that has been broken for weeks, with the only change being a laminated piece of paper saying I should enter the uni maze through two other buildings.

8 months ago 0 0 1 0

No trains are running between Mönchengladbach and Venlo. The timetable is being maintained by a bus. The bus code-switches: a monitor displays „de bus hält“.

8 months ago 18 3 0 0

work with and by @yupeidu.bsky.social

8 months ago 1 0 0 0

Tokenization is Sensitive to Language Variation. Variation in language is ubiquitous and often systematically linked to regional, social, and contextual factors. Tokenizers split texts into smaller units and might behave differently for less common ...

Tokenization is Sensitive to Language Variation arxiv.org/abs/2502.15343

8 months ago 0 0 0 0

Disentangling the Roles of Representation and Selection in Data Pruning. arxiv.org/abs/2507.03648

On Support Samples of Next Word Prediction. arxiv.org/abs/2506.04047

8 months ago 1 0 2 0

VAQUUM: Are Vague Quantifiers Grounded in Visual Data? arxiv.org/pdf/2502.11874

8 months ago 0 0 1 0

Utrecht is back from #ACL2025! We had a blast.

I should have posted this before but here are some papers from people in our group that were presented at ACL.

8 months ago 3 0 1 0

I'm sadly not at #ACL2025, but the work on tokenization seems to continue to explode. Here are the tokenization-related papers I could find, in no particular order. Let me know if I missed any.

8 months ago 11 4 2 0

Since people at #ACL2025 are very interested in tokenization, a reminder to join the discussion on Discord set up by @mcognetta.bsky.social

8 months ago 9 2 0 0

Anyone tried the kiss the cook lunch place at #ACL2025?

8 months ago 0 0 0 0

I think accepted

8 months ago 1 0 0 0
