
Posts by Rabiraj Banerjee

2026 is a whirlwind year for AI.

Underlying it all is the greatest scientific mystery of our age. How does a neural network think?

I talked with Oliver Whang in NYTimes Magazine about how AI interpretability is a tangle of structure waiting to be unraveled:

www.nytimes.com/2026/04/15/...

3 days ago

🚨 Big news! #ICWSM '27 is heading to Edinburgh, Scotland 🏰
More details on dates and venue coming soon ✨

❗How submissions work:
• May 15, 2026: accept (→ ICWSM '27), R&R → Sept '26
• Sept 15, 2026: accept (→ ICWSM '27), R&R → Jan '27
• Jan 15, 2027: accept (→ ICWSM '27), R&R → May '27 (→ ICWSM '28)

1 week ago
A conceptual framework for ideology beyond the left and right NLP+CSS work has operationalized ideology almost exclusively on a left/right partisan axis. This approach obscures the fact that people hold interpretations of many different complex and more specific...

I have been thinking for the last few years about ideologies and how they emerge in text.

This paper, with @davidlazer.bsky.social and Kim Williams, reflects some of those thoughts, and how I think we can improve and expand on how we operationalize ideology in discourse.

arxiv.org/abs/2603.18945

1 month ago
19th Conference of the European Chapter of the Association for Computational Linguistics - ACL Anthology

Do you need a weekend read? The proceedings of EACL 2026 and co-located workshops are now online! @eaclmeeting.bsky.social

aclanthology.org/events/eacl-...

1 month ago
🗄 history of NLP and the ACL | Are.na

I'm lecturing about the "History of NLP" this week. What should I include? Any favorite anecdotes, images, people, methods? Slides, books, papers, or talks for inspiration or grounding?

I've been maintaining a small collection here: www.are.na/maria-antoni...

1 month ago
Theoretical Foundations of Conformal Prediction This book is about conformal prediction and related inferential techniques that build on permutation tests and exchangeability. These techniques are useful in a diverse array of tasks, including hypot...

The new conformal prediction book now seems to be final after a bunch of updates: arxiv.org/abs/2411.118...
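
If you haven't seen it, the core split-conformal recipe is tiny. A minimal sketch (mine, not code from the book; plain numpy on a toy 1-D regression):

```python
# Split conformal prediction: calibrate a score quantile on held-out data
# to get finite-sample coverage under exchangeability.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 600)
y = 2.0 * x + rng.normal(0, 1.0, 600)

# Fit on one half, calibrate on the other.
slope, intercept = np.polyfit(x[:300], y[:300], 1)
predict = lambda t: slope * t + intercept

# Conformity scores: absolute residuals on the calibration half.
scores = np.abs(y[300:] - predict(x[300:]))
alpha, n = 0.1, 300
# The ceil((n+1)(1-alpha))/n quantile gives >= 1 - alpha marginal coverage.
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

print(f"90% interval at x=5: [{predict(5.0) - q:.2f}, {predict(5.0) + q:.2f}]")
```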

1 month ago

why do science? it won't make the model Bigger

1 month ago
The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models Decoder-only language models have the ability to dynamically switch between various computational tasks based on input prompts. Despite many successful applications of prompting, there is very limited...

This has a very cool result on in-context-learned classification tasks: they disentangle representational quality (how well-separated the concept labels are) from readout alignment (how well the model can read out its own internal labels). Adding demo examples helps through readout, not representations!
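
A toy version of that distinction (my sketch, not the paper's code): class information can be perfectly linearly decodable from the hidden states while a fixed readout direction is misaligned with it, so accuracy tracks alignment, not representation quality.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

# Hidden states: two well-separated class clusters along true_dir.
labels = rng.integers(0, 2, 2000)
hidden = (labels[:, None] * 2 - 1) * true_dir + 0.5 * rng.normal(size=(2000, d))

def acc(w):  # accuracy of a linear readout direction w
    return np.mean((hidden @ w > 0) == labels.astype(bool))

# Representational quality: the best linear probe (class-mean difference).
probe = hidden[labels == 1].mean(0) - hidden[labels == 0].mean(0)

# A misaligned readout: mostly orthogonal to the class direction.
bad = rng.normal(size=d)
bad -= (bad @ true_dir) * true_dir
bad /= np.linalg.norm(bad)

print("probe (representation is fine):", acc(probe))                         # ~0.98
print("misaligned readout:            ", acc(0.95 * bad + 0.05 * true_dir))  # ~0.55
print("partly realigned readout:      ", acc(0.5 * bad + 0.5 * true_dir))    # ~0.92
```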

1 month ago
Screenshot from the paper: a figure with 15 scatterplots in a grid, evaluating LLM-as-judge on HELM. In each plot, one model is used as the judge. Each dot is another model; the y-axis is the accuracy inflation (compared to ground truth) of using the given model as the judge, and the x-axis is the model's true accuracy. A vertical red line marks the true accuracy of the judge. Each judge tends to inflate the accuracy of models that are less accurate than itself, especially models from the same provider or family.

Yeah! In this ICML 2025 paper, we found both directions of this: in LLM-as-judge, using a bigger/more accurate model as the judge inflates accuracies because of correlated errors, and using a worse model deflates them, for the reason in the paper above.

arxiv.org/abs/2506.07962
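
A back-of-the-envelope version of the mechanism (my toy model, not the paper's setup): a judge that credits the model whenever their answers agree on a binary task. Correlated errors pull the judged accuracy toward the judge's own accuracy.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
acc_judge, acc_model = 0.8, 0.6   # assumed true accuracies

def judged_accuracy(corr):
    u = rng.random(n)                 # shared per-question difficulty
    coupled = rng.random(n) < corr    # questions where outcomes co-move
    right_j = np.where(coupled, u < acc_judge, rng.random(n) < acc_judge)
    right_m = np.where(coupled, u < acc_model, rng.random(n) < acc_model)
    # The judge credits the model when they agree: both right or both wrong.
    return np.mean(right_j == right_m)

print("true model accuracy:       ", acc_model)
print("judged, independent errors:", judged_accuracy(0.0))  # ~0.56, deflated
print("judged, correlated errors: ", judged_accuracy(1.0))  # ~0.80, inflated
```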

2 months ago

we need to coin a new name for such people 😤😤😤

2 months ago

Why don't neural networks learn all at once, but instead progress from simple to complex solutions? And what does "simple" even mean across different neural network architectures?

Sharing our new paper at @iclr_conf, led by Yedi Zhang with Peter Latham

arxiv.org/abs/2512.20607
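
The cleanest toy of that staged picture I know is the deep linear network (in the spirit of Saxe et al. 2014; this is my sketch, not the paper's setup): input-output modes with larger singular values are learned first, so "simple" structure emerges before "complex" structure.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
# Target map with well-separated singular values: large modes = "simple".
U, _ = np.linalg.qr(rng.normal(size=(d, d)))
V, _ = np.linalg.qr(rng.normal(size=(d, d)))
s_true = np.array([4.0, 2.0, 1.0, 0.5, 0, 0, 0, 0])
target = U @ np.diag(s_true) @ V.T

# Two-layer linear net, small init, full-batch gradient descent.
W1 = 0.01 * rng.normal(size=(d, d))
W2 = 0.01 * rng.normal(size=(d, d))
lr = 0.005
for step in range(5001):
    err = W2 @ W1 - target
    W2, W1 = W2 - lr * err @ W1.T, W1 - lr * W2.T @ err
    if step % 1000 == 0:
        # Strength of each target mode in the learned map.
        learned = np.diag(U.T @ (W2 @ W1) @ V)
        print(step, np.round(learned[:4], 2))  # big modes converge first
```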

2 months ago

God of War Ragnarök, Black Myth: Wukong (very hard), The Witcher 3

2 months ago

1/ ๐ŸŒ How does mixing data from hundreds of languages affect LLM training?
In our new paper "Revisiting Multilingual Data Mixtures in Language Model Pretraining" we revisit core assumptions about multilinguality using 1.1B-3B models trained on up to 400 languages.
๐Ÿงต๐Ÿ‘‡
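
For context, the standard baseline in this space is temperature sampling, where language i is weighted proportionally to n_i^(1/T). A quick sketch of that heuristic (the mixtures the paper actually studies may differ):

```python
import numpy as np

def mixture_weights(token_counts, T):
    """Language i gets sampling weight proportional to n_i ** (1/T)."""
    n = np.array(list(token_counts.values()), dtype=float)
    p = n ** (1.0 / T)
    return dict(zip(token_counts, p / p.sum()))

tokens = {"en": 1_000_000, "de": 100_000, "sw": 10_000, "yo": 1_000}
print(mixture_weights(tokens, T=1.0))  # proportional: English dominates
print(mixture_weights(tokens, T=3.0))  # flattened: low-resource upweighted
```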

4 months ago

So @lchoshen.bsky.social posted a thread on X about how different training runs tend to converge, and I just had to argue with him. Training variation is fascinating, and I think we've kinda cracked it!

4 months ago

we released Olmo 3! Lots of exciting stuff, but I wanna focus on:

๐ŸŸOlmo 3 32B Base, the best fully-open base model to-date, near Qwen 2.5 & Gemma 3 on diverse evals
๐Ÿ Olmo 3 32B Think, first fully-open reasoning model approaching Qwen 3 levels
๐Ÿก12 training datasets corresp to different staged training

5 months ago

Induction through Compression
I personally loved the relationship between ICL and Kolmogorov complexity that this paper proposes: arxiv.org/pdf/2410.14086
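
Kolmogorov complexity itself is uncomputable, but a crude gzip proxy makes the intuition concrete (my toy, not the paper's method): the continuation that adds the least to the compressed description of the context is the one that follows the in-context pattern.

```python
import gzip

def compressed_len(s: str) -> int:
    return len(gzip.compress(s.encode()))

context = "ab " * 64
for cand in ["ab ab ab", "ba ba ba", "zq xr pk"]:
    # Marginal description length of the candidate given the context.
    cost = compressed_len(context + cand) - compressed_len(context)
    print(cand, cost)  # the pattern-continuing candidate adds the least
```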

5 months ago

I'm excited to share our Findings of EMNLP paper w/ @cocoscilab.bsky.social, @rtommccoy.bsky.social, and @rdhawkins.bsky.social!

Language models, unlike humans, require large amounts of data, which suggests the need for an inductive bias.
But what kind of inductive biases do we need?

5 months ago
In the year since LRMs ("reasoning models") hit the scene, we have been trying to understand, analyze and demystify them.. Here are our efforts to date--conveniently all in one… | Subbarao K... In the year since LRMs ("reasoning models") hit the scene, we have been trying to understand, analyze and demystify them.. Here are our efforts to date--conveniently all in one place.. (First..) Eval...

In the year since LRMs ("reasoning models") hit the scene, we have been trying to understand, analyze and demystify them.. Here are our efforts to date--conveniently all in one place..👇

www.linkedin.com/posts/subbar...

7 months ago

A Survey of Reinforcement Learning for Large Reasoning Models

Five sections:

- Foundational Components
- Foundational Problems
- Training Resources
- Applications
- Future Directions

7 months ago
Humans Perceive Wrong Narratives from AI Reasoning Texts A new generation of AI models generates step-by-step reasoning text before producing an answer. This text appears to offer a human-readable window into their computation process, and is increasingly r...

When reading AI reasoning text (aka CoT), we (humans) form a narrative about the underlying computation process, which we take as a transparent explanation of model behavior. But what if our narratives are wrong? We measure that, and find that they usually are.

Now on arXiv: arxiv.org/abs/2508.16599

7 months ago

Great interview with @stevenstrogatz.com, with a lot of discussion of research advising. Parts reminded me of @eegilbert.org's and @informor.bsky.social's (excellent) guides to PhD mentorship, with a big focus on ideation.
Eric's: docs.google.com/document/d/1...
Mor's: s.tech.cornell.edu/phd-syllabus/

7 months ago

We try to avoid self-promoting too much, but we (with @sjgreenwood.bsky.social) built a personalized feed with posts about papers from your network. Many people say it's the closest they can get to old academic Twitter, and I hope you enjoy it and share it with others too!

bsky.app/profile/pape...

8 months ago

🤖 But wait! There's more! You can check out @shiraamitchell.bsky.social's most recent update on the details of calibration, posted yesterday! statmodeling.stat.columbia.edu/2025/08/12/s...
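
For anyone who wants the nuts and bolts: the usual binned expected-calibration-error (ECE) recipe fits in a few lines. A generic sketch, not anything specific to the linked post:

```python
import numpy as np

def ece(conf, correct, n_bins=10):
    """Bin by confidence; average |accuracy - mean confidence|, size-weighted."""
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return total

rng = np.random.default_rng(4)
conf = rng.uniform(0.5, 1.0, 10_000)
print(ece(conf, rng.random(10_000) < conf))         # calibrated: ~0
print(ece(conf, rng.random(10_000) < conf - 0.15))  # overconfident: ~0.15
```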

8 months ago

#acl2025 Did anyone get a good quote of Phil Resnik's last comment?

Context: (some? all?) panelists and he agreed that the field needs more deep, careful research on smaller models to do better science. Everyone is frustrated with the impossibility of large-scale pretraining experiments.

8 months ago

@kennyjoseph.bsky.social, Kenny, check this thread out

8 months ago

For active learning, I really liked this paper: aclanthology.org/2023.emnlp-m... It uses LLMs as annotators for knowledge distillation into small LMs.

8 months ago

What are your favorite recent papers on using LMs for annotation (especially in a loop with human annotators), synthetic data for task-specific prediction, active learning, and similar?

Looking for practical methods for settings where human annotations are costly.

A few examples in thread ↴
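
To make "in a loop with human annotators" concrete, here's the shape of loop I have in mind: margin-based uncertainty sampling where the most uncertain pool items go to an annotator. (`llm_annotate` is a hypothetical stand-in for whatever LLM or human labeling step you use, not a real API.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def llm_annotate(texts):
    """Hypothetical placeholder for an LLM (or human) labeling call."""
    raise NotImplementedError

def active_learning_round(X_lab, y_lab, X_pool, pool_texts, budget=50):
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    probs = clf.predict_proba(X_pool)
    # Margin sampling: small gap between top-2 class probs = most uncertain.
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    query = np.argsort(margin)[:budget]          # most uncertain first
    new_labels = llm_annotate([pool_texts[i] for i in query])
    return query, new_labels
```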

9 months ago

This is so mean!!!

9 months ago
Assistant, Associate or Full Professor, AI & Society The Department of AI and Society (AIS) at the University at Buffalo (UB) invites candidates to apply for multiple positions as Assistant Professor, Associate Professor, or Full Professor. The new AIS ...

UB's new Department of AI and Society is hiring faculty across ranks (Assistant, Associate, Full Professor). We're looking for transdisciplinary scholars interested in building AI by society, for society. Start dates begin Fall 2025.

More info: www.ubjobs.buffalo.edu/postings/57734

9 months ago