2026 is a whirlwind year for AI.
Underlying it all is the greatest scientific mystery of our age. How does a neural network think?
I talked with Oliver Whang in NYTimes Magazine about how AI interpretability is a tangle of structure waiting to be unraveled:
www.nytimes.com/2026/04/15/...
🚨 Big news! #ICWSM '27 is heading to Edinburgh, Scotland 🏰
More details on dates and venue coming soon ✨
How submissions work:
• May 15, 2026: accept (→ ICWSM '27), R&R → Sept '26
• Sept 15, 2026: accept (→ ICWSM '27), R&R → Jan '27
• Jan 15, 2027: accept (→ ICWSM '27), R&R → May '27 (→ ICWSM '28)
I have been thinking for the last few years about ideologies and how they emerge in text.
This paper, with @davidlazer.bsky.social and Kim Williams, reflects some of those thoughts, and how I think we can improve and expand on how we operationalize ideology in discourse.
arxiv.org/abs/2603.18945
Do you need a weekend read? The proceedings of EACL 2026 and co-located workshops are now online! @eaclmeeting.bsky.social
aclanthology.org/events/eacl-...
I'm lecturing about the "History of NLP" this week. What should I include? Any favorite anecdotes, images, people, methods? Slides, books, papers, or talks for inspiration or grounding?
I've been maintaining a small collection here: www.are.na/maria-antoni...
The new conformal prediction book now seems to be final after a bunch of updates: arxiv.org/abs/2411.118...
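For anyone new to the topic, the core split conformal recipe fits in a few lines. A minimal sketch (mine, not code from the book), assuming a fitted sklearn-style regressor with a `.predict` method:

```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, x_test, alpha=0.1):
    """Prediction interval for x_test with ~(1 - alpha) marginal coverage,
    valid under exchangeability of calibration and test points."""
    # Nonconformity scores: absolute residuals on a held-out calibration set.
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile of the calibration scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    pred = model.predict(x_test.reshape(1, -1))[0]
    return pred - q, pred + q
```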
why do science? it won,t make the model Bigger
This has a very cool result on in-context learned classification tasks, where they disentangle representational quality (how well-separated the concept labels are in the hidden states) and readout alignment (how well the model reads out its own internal labels). Adding demo examples helps through readout, not representations!
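A rough way to picture the two quantities (my own sketch, not the paper's code): fit a fresh linear probe on the hidden states to measure representational quality, and compare the model's own predictions against that probe to measure readout alignment:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def representation_quality(hidden_states, true_labels):
    # How linearly separable are the concept labels in the hidden states?
    probe = LogisticRegression(max_iter=1000)
    return cross_val_score(probe, hidden_states, true_labels, cv=5).mean()

def readout_alignment(hidden_states, true_labels, model_predictions):
    # Does the model's own output agree with an optimal linear readout
    # fit directly on its hidden states?
    probe = LogisticRegression(max_iter=1000).fit(hidden_states, true_labels)
    return (probe.predict(hidden_states) == np.asarray(model_predictions)).mean()
```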
Screenshot from the paper: a figure with 15 scatterplots in a grid, evaluating LLM-as-judge on HELM. In each plot, one model is used as the judge. Each dot is another model; the y-axis is the accuracy inflation (compared to ground truth) from using the given model as the judge, and the x-axis is the model's true accuracy. A vertical red line marks the true accuracy of the judge. Each judge tends to inflate the accuracy of models that are less accurate than itself, especially models from the same provider or family.
yeah! In this ICML '25 paper, we found both directions of this: in LLM-as-judge, using a bigger/more accurate model inflates accuracies because of correlated errors, and using a worse model deflates them, for the reason in the paper above
arxiv.org/abs/2506.07962
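A toy simulation of the mechanism (numbers and setup mine, not the paper's): if the judge shares some of the model's misconceptions, it accepts a chunk of the wrong answers, and the judged accuracy lands above the truth:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
p_model = 0.6                  # model's true accuracy
p_accept_right = 0.95          # judge accepts a correct answer
p_accept_wrong_shared = 0.8    # judge accepts a wrong answer it would also give
p_accept_wrong_other = 0.05    # judge accepts an unrelated wrong answer
p_shared_misconception = 0.4   # how correlated the two models' errors are

model_correct = rng.random(n) < p_model
shared = ~model_correct & (rng.random(n) < p_shared_misconception)

p_accept = np.where(model_correct, p_accept_right,
                    np.where(shared, p_accept_wrong_shared, p_accept_wrong_other))
accepted = rng.random(n) < p_accept

print(f"true accuracy:   {model_correct.mean():.3f}")   # ~0.60
print(f"judged accuracy: {accepted.mean():.3f}")        # ~0.71, inflated by shared errors
```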
we need to coin a new name for such people 🤔🤔🤔
Why donโt neural networks learn all at once, but instead progress from simple to complex solutions? And what does โsimpleโ even mean across different neural network architectures?
Sharing our new paper at @iclr_conf, led by Yedi Zhang with Peter Latham
arxiv.org/abs/2512.20607
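One classic concrete instance of simple-before-complex learning is spectral bias. A small demo (my sketch, not the paper's framework): an MLP fits the low-frequency part of a target well before the high-frequency part:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-1, 1, 256).unsqueeze(1)
low = torch.sin(2 * torch.pi * x)    # simple, low-frequency component
high = torch.sin(16 * torch.pi * x)  # complex, high-frequency component
y = low + high

net = nn.Sequential(nn.Linear(1, 128), nn.Tanh(),
                    nn.Linear(128, 128), nn.Tanh(),
                    nn.Linear(128, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(5001):
    loss = ((net(x) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            pred = net(x)
            # Project the fit onto each component (nearly orthogonal on this grid);
            # the low-frequency coefficient approaches 1 much earlier.
            c_low = (pred * low).sum() / (low * low).sum()
            c_high = (pred * high).sum() / (high * high).sum()
        print(f"step {step}: low-freq learned {c_low:.2f}, high-freq learned {c_high:.2f}")
```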
God of War Ragnarok, Black Myth Wukong (very hard), Witcher 3
1/ How does mixing data from hundreds of languages affect LLM training?
In our new paper "Revisiting Multilingual Data Mixtures in Language Model Pretraining" we revisit core assumptions about multilinguality using 1.1B-3B models trained on up to 400 languages.
🧵👇
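For context, a common mixing heuristic in multilingual pretraining (background, not necessarily what the paper uses) is alpha/temperature sampling, which upweights low-resource languages relative to their natural share:

```python
import numpy as np

def mixture_weights(token_counts, alpha=0.3):
    """alpha=1 reproduces natural proportions; alpha -> 0 approaches uniform."""
    counts = np.asarray(token_counts, dtype=float)
    p = counts ** alpha
    return p / p.sum()

# A high-resource vs. low-resource language (hypothetical token counts):
# the natural split is ~[0.999, 0.001]; alpha=0.3 moves it to ~[0.89, 0.11].
print(mixture_weights([1_000_000_000, 1_000_000]))
```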
So @lchoshen.bsky.social posted a thread on X about how different training runs tend to converge, and I just had to argue with him. Training variation is fascinating, and I think we've kinda cracked it!
we released Olmo 3! Lots of exciting stuff, but I wanna focus on:
• Olmo 3 32B Base, the best fully open base model to date, near Qwen 2.5 & Gemma 3 on diverse evals
• Olmo 3 32B Think, the first fully open reasoning model approaching Qwen 3 levels
• 12 training datasets corresponding to the different training stages
Induction through Compression
I personally loved the relationship between ICL and Kolmogorov complexity that this paper proposed: arxiv.org/pdf/2410.14086
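The prediction-compression link can be made tangible with an off-the-shelf compressor. This is the well-known gzip-distance classifier, shown as an illustration of the general idea rather than as this paper's method:

```python
import gzip

def ncd(a: str, b: str) -> float:
    """Normalized compression distance: strings sharing structure compress
    better together than apart."""
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(test: str, labeled: list[tuple[str, str]]) -> str:
    # Nearest neighbor under NCD, as in "gzip kNN" text classification.
    return min(labeled, key=lambda ex: ncd(test, ex[0]))[1]

examples = [("the team scored a goal in the second half of the match", "sports"),
            ("the appeals court upheld the lower court ruling on the statute", "law")]
print(classify("the striker scored in the final minutes of the match", examples))
```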
Iโm excited to share our Findings of EMNLP paper w/ @cocoscilab.bsky.social , @rtommccoy.bsky.social, and @rdhawkins.bsky.social !
Language models, unlike humans, require large amounts of data, which suggests the need for an inductive bias.
But what kind of inductive biases do we need?
In the year since LRMs ("reasoning models") hit the scene, we have been trying to understand, analyze, and demystify them. Here are our efforts to date, conveniently all in one place 👇
www.linkedin.com/posts/subbar...
A Survey of Reinforcement Learning for Large Reasoning Models
Five sections:
- Foundational Components
- Foundational Problems
- Training Resources
- Applications
- Future Directions
When reading AI reasoning text (aka CoT), we (humans) form a narrative about the underlying computation process, which we take as a transparent explanation of model behavior. But what if our narratives are wrong? We measure this and find that they usually are.
Now on arXiv: arxiv.org/abs/2508.16599
Great interview with @stevenstrogatz.com featuring a lot of discussion of research advising. Parts reminded me of @eegilbert.org's and @informor.bsky.social's (excellent) guides to PhD mentorship, with a big focus on ideation.
Eric's: docs.google.com/document/d/1...
Mor's: s.tech.cornell.edu/phd-syllabus/
We try to avoid self-promoting too much, but we (with @sjgreenwood.bsky.social) built a personalized feed with posts about papers from your network. Many people say it's the closest they can get to old academic twitter, and I hope you enjoy it and share it with others too!
bsky.app/profile/pape...
But wait! There's more! You can check out @shiraamitchell.bsky.social's most recent update on the details of calibration, posted yesterday! statmodeling.stat.columbia.edu/2025/08/12/s...
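For readers who want the mechanics, here is the standard expected calibration error (ECE) computation, as a generic sketch of what "calibration" measures (not code from the linked post):

```python
import numpy as np

def ece(confidences, correct, n_bins=10):
    """Average |accuracy - confidence| over equal-width confidence bins,
    weighted by the fraction of predictions in each bin."""
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return total

# A well-calibrated predictor: 70%-confident predictions are right ~70% of the time.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 10_000)
correct = rng.random(10_000) < conf
print(f"ECE ~ {ece(conf, correct):.3f}")  # close to 0
```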
#acl2025 did anyone get a good quote of Phil Resnik's last comment?
context: (some? all?) panelists and he agree the field needs more deep, careful research on smaller models to do better science. Everyone is frustrated with the impossibility of large-scale pretraining experiments.
@kennyjoseph.bsky.social, Kenny, check this thread out
For active learning, I really liked this paper, which uses LLMs as annotators for knowledge distillation into small LMs: aclanthology.org/2023.emnlp-m...
What are your favorite recent papers on using LMs for annotation (especially in a loop with human annotators), synthetic data for task-specific prediction, active learning, and similar?
Looking for practical methods for settings where human annotations are costly.
A few examples in thread ↴
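For concreteness, this is the kind of loop I mean, with hypothetical llm_annotate / human_annotate stand-ins (names mine): uncertainty sampling where the LLM supplies cheap labels and humans audit only the low-confidence cases:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def llm_annotate(x):
    """Stand-in for an LLM labeling call (hypothetical; swap in a real API).
    Returns (label, self-reported confidence)."""
    return int(x.sum() > 0), 0.8

def human_annotate(x):
    """Stand-in for a (costly) human annotation queue (hypothetical)."""
    return int(x.sum() > 0)

def active_learning_round(X_lab, y_lab, X_pool, budget=20, trust_threshold=0.9):
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    # Least-confidence uncertainty sampling over the unlabeled pool.
    uncertainty = 1.0 - clf.predict_proba(X_pool).max(axis=1)
    picked = np.argsort(-uncertainty)[:budget]
    for i in picked:
        label, conf = llm_annotate(X_pool[i])
        if conf < trust_threshold:          # route low-confidence cases to humans
            label = human_annotate(X_pool[i])
        X_lab = np.vstack([X_lab, X_pool[i]])
        y_lab = np.append(y_lab, label)
    return X_lab, y_lab, np.delete(X_pool, picked, axis=0)
```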
This is so mean!!!
UB's new Department of AI and Society is hiring faculty across ranks (Assistant, Associate, Full Professor). We're looking for transdisciplinary scholars interested in building AI by society, for society. Start dates begin Fall 2025.
More info: www.ubjobs.buffalo.edu/postings/57734