hell yeah
Posts by Vagrant Gautam
this wording is going to be helpful for my reviews too, thanks!
congrats!!
🥳🥳🥳🥳🥳🥳🥳
and john colazione and john pranzo
oh god
we hung out a bunch and it was lovely!
I'm playing hooky today 🙈 but yes, tomorrow or day after!
A cat naps contentedly while sitting neatly on a mosaic tiled floor with a lot of large and brightly coloured patterns. The walls behind are bright blue.
A cat yawns widely, looking almost as if it's screaming, in front of a display with rows of touristy magnets of blue windows in Chefchaouen.
Moar cats if you made it to the end of this thread.
The first case is kinda like if you don't swear because you're constantly thinking to yourself, "I don't want to swear, I can't say fuck, please god I need to shut up," while the second is like if you don't swear just because you don't know how.
I refer you to Andreas's thread on aligned probing. The coolest finding for me was: when model representations are more informative about an input prompt's toxicity, its generations are less toxic. But for DPO-detoxified models, *less* informative representations somehow also result in lower toxicity.
There's still a very long way to go here. Referential / pronominal reasoning is something we as humans are great at and we don't even break it down into steps. In contrast, even DeepSeek-distilled Llama-70B with a huge token budget is just above chance in easy settings where humans are perfect.
Prior work shows that code pre-training improves entity tracking, but chain-of-thought prompting worsens pronominal reference. When you combine the two (i.e., *train* on chains of thought about code, math, and logic, via DeepSeek distillation), it helps pronominal reasoning!
I already presented some work on reference (names, pronouns, coreference resolution, pronoun fidelity, etc.) as a rich site to evaluate biases and commonsense reasoning, and our work on disentangling model behaviour and internals through aligned probing (led by @tresiwald.bsky.social).
On Sunday, I'm presenting the course I designed on defining and measuring abstract concepts in NLP like "bias" and "interpretability", something we need as researchers to critically parse existing work, make sense of hype, and do meaningful science. 15% of my poster is a meme, come check it out!
A smug-looking cat loafs with its paws folded in at the top of stairs that lead down into an alley painted completely blue in Chefchaouen, Morocco.
A huge and imposing marble mosque in Casablanca, with blue skies and sparse white clouds in the sky above. This is the Hassan II Mosque, one of the biggest in the world. The handful of people standing around in front of it look like ants.
Late post but I'm at #EACL2026 in Morocco where I'm petting cats, seeing sights, and presenting some work - here are the highlights.
+1 that was definitely one of the coolest talks i've been to at a conference!
And separately, another @nsaphra.bsky.social and @sarah-nlp.bsky.social banger aclanthology.org/2024.blackbo...
Another example is Attention is not Explanation, followed by Attention is not not Explanation, and then all the authors collaborated on a third paper, Learning to Faithfully Rationalize by Construction
aclanthology.org/N19-1357/
aclanthology.org/D19-1002/
aclanthology.org/2020.acl-mai...
I like historical debates in the field, e.g., aclanthology.org/2020.acl-mai... and one response julianmichael.org/blog/2020/07...
damn this is so juicy
simplified overview of our aligned probing setup, where we join the behavioral and internal evaluation of LMs' toxicity
LMs that "know more" about toxicity are less toxic!
Our #TACL 📄 connects behavior and internals:
💠 LMs amplify toxicity beyond humans
💠 Information about toxicity peaks in lower layers
💠 Bypassing these layers increases toxicity
More details👇 #NLProc #interpretability (1/🧵)
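If it helps to picture the setup, here's a minimal sketch of layer-wise probing (not the code from the paper; the model, prompts, labels, and pooling are toy placeholders): extract each layer's representations for a prompt, then fit a small linear probe per layer to predict toxicity, so you can see where in the network that information is easiest to read out.

```python
# Minimal sketch of layer-wise probing for toxicity information.
# NOT the paper's code: model, prompts, labels, and pooling are toy placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "gpt2"  # placeholder; any causal LM that returns hidden states works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

prompts = ["you are wonderful", "you are an idiot", "have a nice day", "go jump off a cliff"]
labels = [0, 1, 0, 1]  # toy stand-ins for real toxicity annotations

# Mean-pool each layer's hidden states over tokens -> one vector per (prompt, layer).
per_layer_feats = None
with torch.no_grad():
    for p in prompts:
        out = model(**tok(p, return_tensors="pt"))
        vecs = [h.mean(dim=1).squeeze(0).numpy() for h in out.hidden_states]
        if per_layer_feats is None:
            per_layer_feats = [[] for _ in vecs]
        for i, v in enumerate(vecs):
            per_layer_feats[i].append(v)

# One linear probe per layer: higher accuracy = toxicity is more linearly decodable there.
for i, feats in enumerate(per_layer_feats):
    probe = LogisticRegression(max_iter=1000).fit(feats, labels)
    print(f"layer {i:2d}: train accuracy = {probe.score(feats, labels):.2f}")
```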
“Whose Facts Win? LLM Source Preference under Knowledge Conflicts.” Authors: Jakob Schuster, Vagrant Gautam, Katja Markert. Evaluating 13 LLMs on source and knowledge conflicts induces a source credibility hierarchy of Government > Newspaper > Person, Social Media. However, repeating information can flip preferences.
Excited to share the first preprint of my PhD!
While many papers focus on what kind of information LLMs trust, @dippedrusk.com, Katja Markert, and I instead investigate whose evidence models prefer by looking at source credibility.
#NLP #Research #CL #LLMs
1/7 🧵
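To make the setup concrete, here is a minimal sketch of what a single source-conflict item could look like (illustrative only, not the paper's protocol; the sources, the fact, and the `generate` function are made up for this example):

```python
# Illustrative source-conflict item: two sources assert incompatible facts,
# and we check which one the model repeats. Everything here is a made-up example,
# including the hypothetical `generate(prompt) -> str` LLM call.
item = {
    "prompt": (
        "A government report states that the bridge was built in 1952.\n"
        "A social media post states that the bridge was built in 1961.\n"
        "Question: When was the bridge built? Answer with only the year."
    ),
    "answer_if_government": "1952",
    "answer_if_social_media": "1961",
}

def government_preference_rate(generate, items):
    """Fraction of items where the model sides with the government source."""
    wins = 0
    for it in items:
        reply = generate(it["prompt"])
        if it["answer_if_government"] in reply and it["answer_if_social_media"] not in reply:
            wins += 1
    return wins / len(items)
```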
I passed! #PhDone
love u <3
<3 <3
Naming in academia: Fill out our survey! We're surveying scholars about naming and name change experiences in academia. This includes spelling variations, reordering, changing any part of your name, for any reason: gender transition, marriage, divorce, immigration, cultural reasons, or recognition. This survey takes around 5-10 minutes!
@pranav-nlp.bsky.social and I are surveying researchers about naming and name changes in academia (especially computer science).
If your academic name is / has been / might someday be different from other names you've used, please tell us about it here: forms.cloud.microsoft/e/E0XXBmZdEP
The scene where she appears is the best scene in the film imo
www.youtube.com/watch?v=VfkQ...
Vagrant (me) staring into the distance wearing smokey makeup, a long-haired black wig, and a black scar on xyr face that is fake-stapled together with shiny silvery stickers. I'm also wearing a black dress that looks very goth.
Monica Bellucci is a divine vision for goths everywhere with her stapled face, tear-stained smokey makeup, dark flowing hair and black dress from Beetlejuice Beetlejuice. She looks unhappy, betrayed, and stunning.
I was Delores from Beetlejuice Beetlejuice for Halloween!