2026 is a whirlwind year for AI.
Underlying it all is the greatest scientific mystery of our age. How does a neural network think?
I talked with Oliver Whang in NYTimes Magazine about how AI interpretability is a tangle of structure waiting to be unraveled:
www.nytimes.com/2026/04/15/...
🚨 Big news! #ICWSM '27 is heading to Edinburgh, Scotland 🏰
More details on dates and venue coming soon ✨
How submissions work:
• May 15, 2026: accept (→ ICWSM '27), R&R → Sept '26
• Sept 15, 2026: accept (→ ICWSM '27), R&R → Jan '27
• Jan 15, 2027: accept (→ ICWSM '27), R&R → May '27 (→ ICWSM '28)
I have been thinking for the last few years about ideologies and how they emerge in text.
This paper, with @davidlazer.bsky.social and Kim Williams, reflects some of those thoughts, and how I think we can improve and expand on how we operationalize ideology in discourse.
arxiv.org/abs/2603.18945
Do you need a weekend read? The proceedings of EACL 2026 and co-located workshops are now online! @eaclmeeting.bsky.social
aclanthology.org/events/eacl-...
I'm lecturing about the "History of NLP" this week. What should I include? Any favorite anecdotes, images, people, methods? Slides, books, papers, or talks for inspiration or grounding?
I've been maintaining a small collection here: www.are.na/maria-antoni...
The new conformal prediction book now seems to be final after a bunch of updates: arxiv.org/abs/2411.118...
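For anyone new to the topic, the core split conformal recipe fits in a few lines. A minimal sketch (mine, not code from the book), assuming a fitted sklearn-style regressor with a `.predict` method:

```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, x_test, alpha=0.1):
    """Prediction interval for x_test with ~(1 - alpha) marginal coverage,
    valid under exchangeability of calibration and test points."""
    # Nonconformity scores: absolute residuals on a held-out calibration set.
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile of the calibration scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    pred = model.predict(x_test.reshape(1, -1))[0]
    return pred - q, pred + q
```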
why do science? it won,t make the model Bigger
This has a very cool result on in-context learned classification tasks, where they disentangle representational quality (how well-separated the concept labels are in the hidden states) and readout alignment (how well the model reads out its own internal labels). Adding demo examples helps through readout, not representations!
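A rough way to picture the two quantities (my own sketch, not the paper's code): fit a fresh linear probe on the hidden states to measure representational quality, and compare the model's own predictions against that probe to measure readout alignment:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def representation_quality(hidden_states, true_labels):
    # How linearly separable are the concept labels in the hidden states?
    probe = LogisticRegression(max_iter=1000)
    return cross_val_score(probe, hidden_states, true_labels, cv=5).mean()

def readout_alignment(hidden_states, true_labels, model_predictions):
    # Does the model's own output agree with an optimal linear readout
    # fit directly on its hidden states?
    probe = LogisticRegression(max_iter=1000).fit(hidden_states, true_labels)
    return (probe.predict(hidden_states) == np.asarray(model_predictions)).mean()
```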
Screenshot from the paper: a figure with 15 scatterplots in a grid, evaluating LLM-as-judge on HELM. In each plot, one model is used as the judge. Each dot is another model; the y-axis is the accuracy inflation (compared to ground truth) from using the given model as the judge, and the x-axis is the model's true accuracy. A vertical red line marks the true accuracy of the judge. Each judge tends to inflate the accuracy of models that are less accurate than itself, especially models from the same provider or family.
yeah! In this ICML '25 paper, we found both directions of this: in LLM-as-judge, using a bigger/more accurate model inflates accuracies because of correlated errors, and using a worse model deflates them, for the reason in the paper above
arxiv.org/abs/2506.07962
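A toy simulation of the mechanism (numbers and setup mine, not the paper's): if the judge shares some of the model's misconceptions, it accepts a chunk of the wrong answers, and the judged accuracy lands above the truth:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
p_model = 0.6                  # model's true accuracy
p_accept_right = 0.95          # judge accepts a correct answer
p_accept_wrong_shared = 0.8    # judge accepts a wrong answer it would also give
p_accept_wrong_other = 0.05    # judge accepts an unrelated wrong answer
p_shared_misconception = 0.4   # how correlated the two models' errors are

model_correct = rng.random(n) < p_model
shared = ~model_correct & (rng.random(n) < p_shared_misconception)

p_accept = np.where(model_correct, p_accept_right,
                    np.where(shared, p_accept_wrong_shared, p_accept_wrong_other))
accepted = rng.random(n) < p_accept

print(f"true accuracy:   {model_correct.mean():.3f}")   # ~0.60
print(f"judged accuracy: {accepted.mean():.3f}")        # ~0.71, inflated by shared errors
```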
we need to coin a new name for such people 🤔🤔🤔
Why donโt neural networks learn all at once, but instead progress from simple to complex solutions? And what does โsimpleโ even mean across different neural network architectures?
Sharing our new paper at @iclr_conf, led by Yedi Zhang with Peter Latham
arxiv.org/abs/2512.20607
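One classic concrete instance of simple-before-complex learning is spectral bias. A small demo (my sketch, not the paper's framework): an MLP fits the low-frequency part of a target well before the high-frequency part:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-1, 1, 256).unsqueeze(1)
low = torch.sin(2 * torch.pi * x)    # simple, low-frequency component
high = torch.sin(16 * torch.pi * x)  # complex, high-frequency component
y = low + high

net = nn.Sequential(nn.Linear(1, 128), nn.Tanh(),
                    nn.Linear(128, 128), nn.Tanh(),
                    nn.Linear(128, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(5001):
    loss = ((net(x) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            pred = net(x)
            # Project the fit onto each component (nearly orthogonal on this grid);
            # the low-frequency coefficient approaches 1 much earlier.
            c_low = (pred * low).sum() / (low * low).sum()
            c_high = (pred * high).sum() / (high * high).sum()
        print(f"step {step}: low-freq learned {c_low:.2f}, high-freq learned {c_high:.2f}")
```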
God of War Ragnarok, Black Myth Wukong (very hard), Witcher 3
1/ How does mixing data from hundreds of languages affect LLM training?
In our new paper "Revisiting Multilingual Data Mixtures in Language Model Pretraining" we revisit core assumptions about multilinguality using 1.1B-3B models trained on up to 400 languages.
🧵👇
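For context, a common mixing heuristic in multilingual pretraining (background, not necessarily what the paper uses) is alpha/temperature sampling, which upweights low-resource languages relative to their natural share:

```python
import numpy as np

def mixture_weights(token_counts, alpha=0.3):
    """alpha=1 reproduces natural proportions; alpha -> 0 approaches uniform."""
    counts = np.asarray(token_counts, dtype=float)
    p = counts ** alpha
    return p / p.sum()

# A high-resource vs. low-resource language (hypothetical token counts):
# the natural split is ~[0.999, 0.001]; alpha=0.3 moves it to ~[0.89, 0.11].
print(mixture_weights([1_000_000_000, 1_000_000]))
```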
So @lchoshen.bsky.social posted a thread on X about how different training runs tend to converge, and I just had to argue with him. Training variation is fascinating, and I think we've kinda cracked it!
we released Olmo 3! Lots of exciting stuff, but I wanna focus on:
• Olmo 3 32B Base, the best fully open base model to date, near Qwen 2.5 & Gemma 3 on diverse evals
• Olmo 3 32B Think, the first fully open reasoning model approaching Qwen 3 levels
• 12 training datasets corresponding to the different training stages
Induction through Compression
I personally loved the relationship between ICL and Kolmogorov complexity that this paper proposed: arxiv.org/pdf/2410.14086
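The prediction-compression link can be made tangible with an off-the-shelf compressor. This is the well-known gzip-distance classifier, shown as an illustration of the general idea rather than as this paper's method:

```python
import gzip

def ncd(a: str, b: str) -> float:
    """Normalized compression distance: strings sharing structure compress
    better together than apart."""
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(test: str, labeled: list[tuple[str, str]]) -> str:
    # Nearest neighbor under NCD, as in "gzip kNN" text classification.
    return min(labeled, key=lambda ex: ncd(test, ex[0]))[1]

examples = [("the team scored a goal in the second half of the match", "sports"),
            ("the appeals court upheld the lower court ruling on the statute", "law")]
print(classify("the striker scored in the final minutes of the match", examples))
```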
Iโm excited to share our Findings of EMNLP paper w/ @cocoscilab.bsky.social , @rtommccoy.bsky.social, and @rdhawkins.bsky.social !
Language models, unlike humans, require large amounts of data, which suggests the need for an inductive bias.
But what kind of inductive biases do we need?
In the year since LRMs ("reasoning models") hit the scene, we have been trying to understand, analyze, and demystify them. Here are our efforts to date, conveniently all in one place 👇
www.linkedin.com/posts/subbar...
A Survey of Reinforcement Learning for Large Reasoning Models
Five sections:
- Foundational Components
- Foundational Problems
- Training Resources
- Applications
- Future Directions
When reading AI reasoning text (aka CoT), we (humans) form a narrative about the underlying computation process, which we take as a transparent explanation of model behavior. But what if our narratives are wrong? We measure this and find that they usually are.
Now on arXiv: arxiv.org/abs/2508.16599
Great interview with @stevenstrogatz.com featuring a lot of discussion of research advising. Parts reminded me of @eegilbert.org's and @informor.bsky.social's (excellent) guides to PhD mentorship, with a big focus on ideation.
Eric's: docs.google.com/document/d/1...
Mor's: s.tech.cornell.edu/phd-syllabus/
We try to avoid self-promoting too much, but we (with @sjgreenwood.bsky.social) built a personalized feed with posts about papers from your network. Many people say it's the closest they can get to old academic twitter, and I hope you enjoy it and share it with others too!
bsky.app/profile/pape...
But wait! There's more! You can check out @shiraamitchell.bsky.social's most recent update on the details of calibration, posted yesterday! statmodeling.stat.columbia.edu/2025/08/12/s...
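For readers who want the mechanics, here is the standard expected calibration error (ECE) computation, as a generic sketch of what "calibration" measures (not code from the linked post):

```python
import numpy as np

def ece(confidences, correct, n_bins=10):
    """Average |accuracy - confidence| over equal-width confidence bins,
    weighted by the fraction of predictions in each bin."""
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return total

# A well-calibrated predictor: 70%-confident predictions are right ~70% of the time.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 10_000)
correct = rng.random(10_000) < conf
print(f"ECE ~ {ece(conf, correct):.3f}")  # close to 0
```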
#acl2025 did anyone get a good quote of Phil Resnik's last comment?
context: (some? all?) panelists and he agree the field needs more deep, careful research on smaller models to do better science. Everyone is frustrated with the impossibility of large-scale pretraining experiments.
@kennyjoseph.bsky.social, Kenny, check this thread out
For active learning, I really liked this paper, which uses LLMs as annotators for knowledge distillation into small LMs: aclanthology.org/2023.emnlp-m...
What are your favorite recent papers on using LMs for annotation (especially in a loop with human annotators), synthetic data for task-specific prediction, active learning, and similar?
Looking for practical methods for settings where human annotations are costly.
A few examples in thread ↴
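For concreteness, this is the kind of loop I mean, with hypothetical llm_annotate / human_annotate stand-ins (names mine): uncertainty sampling where the LLM supplies cheap labels and humans audit only the low-confidence cases:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def llm_annotate(x):
    """Stand-in for an LLM labeling call (hypothetical; swap in a real API).
    Returns (label, self-reported confidence)."""
    return int(x.sum() > 0), 0.8

def human_annotate(x):
    """Stand-in for a (costly) human annotation queue (hypothetical)."""
    return int(x.sum() > 0)

def active_learning_round(X_lab, y_lab, X_pool, budget=20, trust_threshold=0.9):
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    # Least-confidence uncertainty sampling over the unlabeled pool.
    uncertainty = 1.0 - clf.predict_proba(X_pool).max(axis=1)
    picked = np.argsort(-uncertainty)[:budget]
    for i in picked:
        label, conf = llm_annotate(X_pool[i])
        if conf < trust_threshold:          # route low-confidence cases to humans
            label = human_annotate(X_pool[i])
        X_lab = np.vstack([X_lab, X_pool[i]])
        y_lab = np.append(y_lab, label)
    return X_lab, y_lab, np.delete(X_pool, picked, axis=0)
```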
This is so mean!!!
UB's new Department of AI and Society is hiring faculty across ranks (Assistant, Associate, Full Professor). We're looking for transdisciplinary scholars interested in building AI by society, for society. Start dates begin Fall 2025.
More info: www.ubjobs.buffalo.edu/postings/57734