How can we use models of cognition to help LLMs interpret figurative language (irony, hyperbole) in a more human-like manner? Come to our #ACL2025NLP poster on Wednesday at 11AM (exhibit hall - exact location TBA) to find out! @mcgill-nlp.bsky.social @mila-quebec.bsky.social @aclmeeting.bsky.social
What do systematic hallucinations in LLMs tell us about their generalization abilities?
Come to our poster at #ACL2025 on July 29th at 4 PM in Level 0, Halls X4/X5. Would love to chat about interpretability, hallucinations, and reasoning :)
@mcgill-nlp.bsky.social @mila-quebec.bsky.social
A blizzard is raging through Montreal when your friend says "Looks like Florida out there!" Humans easily interpret irony, while LLMs struggle with it. We propose a rhetorical-strategy-aware probabilistic framework as a solution.
Paper: arxiv.org/abs/2506.09301 to appear @ #ACL2025 (Main)
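For a rough sense of what "rhetorical-strategy-aware" means, here is a toy listener sketch (our illustration, not the paper's actual model): it scores candidate meanings by marginalizing over a small set of rhetorical strategies, with all names and probabilities invented for the blizzard example.

```python
# Toy sketch (illustrative only): a listener that marginalizes over
# rhetorical strategies when interpreting an utterance in context.
# P(meaning | utterance, context) ∝
#     sum_s P(meaning | utterance, strategy=s) * P(strategy=s | context)
from collections import defaultdict

# Hypothetical discrete spaces for the blizzard example.
MEANINGS = ["weather is awful", "weather is lovely"]
STRATEGIES = ["literal", "irony"]

def p_strategy_given_context(strategy: str, context: str) -> float:
    # Assumption: a blizzard context makes irony the more plausible strategy.
    prior = {"literal": 0.3, "irony": 0.7} if "blizzard" in context else \
            {"literal": 0.8, "irony": 0.2}
    return prior[strategy]

def p_meaning_given_utterance_and_strategy(meaning: str, utterance: str,
                                           strategy: str) -> float:
    # A literal reading of "Looks like Florida out there!" conveys nice weather;
    # an ironic reading flips the literal content.
    literal_meaning = "weather is lovely"
    if strategy == "literal":
        return 0.9 if meaning == literal_meaning else 0.1
    return 0.9 if meaning != literal_meaning else 0.1

def interpret(utterance: str, context: str) -> dict:
    scores = defaultdict(float)
    for s in STRATEGIES:
        for m in MEANINGS:
            scores[m] += (p_meaning_given_utterance_and_strategy(m, utterance, s)
                          * p_strategy_given_context(s, context))
    total = sum(scores.values())
    return {m: v / total for m, v in scores.items()}

print(interpret("Looks like Florida out there!", "blizzard in Montreal"))
# Most of the probability mass lands on "weather is awful" (the ironic reading).
```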
Huge thanks to my collaborators @mengcao.bsky.social, Marc-Antoine Rondeau, and my advisor Jackie Cheung for their invaluable guidance and support throughout this work, and to friends at @mila-quebec.bsky.social and @mcgill-nlp.bsky.social 7/n
TL;DR: These irrelevant-context hallucinations show that LLMs go beyond mere parroting 🦜: they do generalize, based on contextual cues and abstract classes. But not reliably. They're more like chameleons 🦎, blending with the context even when they shouldn't. 6/n
What's going on inside?
With mechanistic interpretability, we found:
- LLMs first compute abstract classes (like "language") before narrowing to specific answers
- Competing circuits inside the model: one based on context, one based on query. Whichever is stronger wins (toy probe sketch below). 5/n
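The snippet below is a minimal logit-lens-style probe, an illustration under our own assumptions rather than the paper's exact method (GPT-2 stands in for the larger models studied): it decodes each layer's hidden state and tracks whether a query-based answer (" Portuguese") or a context-based one (" Japanese") dominates.

```python
# Illustrative logit-lens-style probe (not the paper's implementation).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = ("Honda is a popular car brand. "
          "The official language of Brazil is")
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

candidates = [" Portuguese", " Japanese"]  # query-based vs. context-based answer
cand_ids = [tok.encode(c)[0] for c in candidates]  # first BPE token of each

for layer, hidden in enumerate(out.hidden_states):
    # Project the last position's hidden state through the final norm + unembedding.
    logits = model.lm_head(model.transformer.ln_f(hidden[:, -1, :]))
    probs = torch.softmax(logits, dim=-1)[0]
    print(layer, {c: round(probs[i].item(), 4) for c, i in zip(candidates, cand_ids)})
# If the context-based circuit is stronger at some layers, " Japanese" can
# outrank " Portuguese" there, even when the query alone points to Portuguese.
```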
Sometimes this yields the right answer for the wrong reason ("Portuguese" from "Brazil"); other times, it produces confident errors ("Japanese" from "Honda"). 4/n
Turns out, we can. They follow a systematic failure mode we call class-based (mis)generalization: the model abstracts the class from the query (e.g., languages) and generalizes based on features from the irrelevant context (e.g., Honda → Japan). 3/n
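To make the failure mode concrete, here is a tiny behavioral sketch (purely illustrative: the model, prompts, and context cues are stand-ins we chose, not the paper's setup) that asks the same query under different irrelevant contexts and watches the answer shift.

```python
# Minimal behavioral sketch (illustrative only): same query, different
# irrelevant contexts, greedy decoding.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

query = "The official language of Brazil is"
contexts = [
    "",                             # query alone
    "Honda makes reliable cars. ",  # irrelevant, Japan-associated cue
    "Nokia makes sturdy phones. ",  # irrelevant, Finland-associated cue
]

for ctx in contexts:
    prompt = ctx + query
    out = generate(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"]
    print(repr(ctx), "->", out[len(prompt):].strip())

# Class-based (mis)generalization predicts the continuation stays within the
# abstracted class (a language), but which language can drift toward the one
# cued by the irrelevant context.
```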
These examples show that answers, even to the same query, can shift under different irrelevant contexts. Can we predict these shifts? 2/n
Do LLMs hallucinate randomly? Not quite.
Our #ACL2025 (Main) paper shows that hallucinations under irrelevant contexts follow a systematic failure mode, revealing how LLMs generalize using abstract classes + context cues, albeit unreliably.
Paper: arxiv.org/abs/2505.22630 1/n