Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year's MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.
Posts by Itay Itzhak @ COLM
Had a blast at CoLM! It really was as good as everyone says, congrats to the organizers!
This week I'll be in New York giving talks at NYU, Yale, and Cornell Tech.
If you're around and want to chat about LLM behavior, safety, interpretability, or just say hi - DM me!
Thrilled to be part of this work led by @adisimhi.bsky.social!
ManagerBench reveals a critical problem:
✅ LLMs can recognize harm
❌ But often choose it anyway to meet goals
🤔 Or overcorrect and become ineffective
We need better balance!
A must-read for safety folks!
Traveling to #COLM2025 this week, and here's some work from our group and collaborators:
Cognitive biases, hidden knowledge, CoT faithfulness, model editing, and LM4Science
See the thread for details and reach out if you'd like to discuss more!
At #ACL2025 and not sure what to do next? GEM² is the place to be for awesome talks on the future of LLM evaluation. Come hear @GabiStanovsky, @EliyaHabba, @LChoshen and others rethink what it means to actually evaluate LLMs beyond accuracy and vibes. Thursday @ Hall C!
In Vienna for #ACL2025, and already had my first (vegan) Austrian sausage!
Now hungry for discussing:
- LLM behavior
- Interpretability
- Biases & hallucinations
- Why eval is so hard (but so fun)
Come say hi if that's your vibe too!
@boknilev.bsky.social @gabistanovsky.bsky.social
Huge thanks to my co-authors
@boknilev @GabiStanovsky!
Preprint: arxiv.org/abs/2507.07186
Webpage: itay1itzhak.github.io/planted-in-...
We'd love your thoughts, critiques, and ideas!
Let's talk about building more interpretable and trustworthy LLMs!
#NLProc #Bias #CognitiveAI
Takeaway:
Cognitive biases are not introduced during instruction tuning.
They're planted in pretraining and only surfaced by finetuning.
If we want fairer models, we need to look deeper into the pretraining pipeline.
Step 2: Cross-tuning.
We swap instruction datasets between models with different pretraining.
Result: Biases follow the pretrained model!
PCA clearly shows models group by pretraining base, not by instruction.
The bias "signature" stays intact, no matter the finetuning!
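The cross-tuning check above can be sketched in a few lines. This is an illustrative reconstruction with synthetic numbers, not the paper's actual code: each finetuned model is summarized by a vector of bias scores, and PCA on those vectors should cluster models by pretraining base rather than by instruction dataset.

```python
# Sketch of the cross-tuning analysis (synthetic data, for illustration only).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical bias-score "signatures" (one score per bias benchmark) for
# two pretraining bases; each base is finetuned on two instruction datasets.
base_signature = {
    "base_A": rng.normal(0.0, 1.0, 8),
    "base_B": rng.normal(2.0, 1.0, 8),
}

models, labels = [], []
for base, sig in base_signature.items():
    for instr in ["instr_X", "instr_Y"]:
        # Finetuning only perturbs the base signature slightly.
        models.append(sig + rng.normal(0.0, 0.1, 8))
        labels.append(base)

proj = PCA(n_components=2).fit_transform(np.array(models))

# If biases follow pretraining, models sharing a base sit close on PC1.
for base in base_signature:
    pts = proj[[i for i, lbl in enumerate(labels) if lbl == base]]
    print(base, pts[:, 0].round(2))
```

Under this toy setup, the within-base distances in the PCA projection come out far smaller than the between-base distances, mirroring the "models group by pretraining base" result.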
Step 1: Training randomness.
We finetune the same model 3Γ with different seeds.
Result: Some variation in bias scores, but behavior patterns stay stable compared to MMLU variance.
Aggregating across seeds reveals consistent trends.
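The seed experiment boils down to simple aggregation. A minimal sketch with made-up bias scores (not the paper's numbers): finetune with several seeds, then report the mean and spread per bias.

```python
# Illustrative seed-aggregation sketch with hypothetical bias scores
# (one score per finetuning seed; not the paper's actual numbers).
import statistics

bias_scores = {
    "certainty_bias": [0.61, 0.58, 0.64],  # seeds 1-3
    "framing_bias":   [0.42, 0.45, 0.40],
}

for bias, scores in bias_scores.items():
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    print(f"{bias}: mean={mean:.2f}, sd={sd:.2f}")
```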
We introduce a two-step causal framework to disentangle the effects of:
- Pretraining
- Instruction tuning
- Training randomness
Bottom line: pretraining is the origin of bias. Finetuning? Just the messenger.
#CausalInference #TrustworthyAI #NLP
New paper alert!
Instruction-tuned LLMs show amplified cognitive biases, but are these new behaviors, or pretraining ghosts resurfacing?
Excited to share our new paper, accepted to CoLM 2025!
See thread below.
#BiasInAI #LLMs #MachineLearning #NLProc
Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their steps (CoT) aren't necessarily revealing their true reasoning. Spoiler: the transparency can be an illusion. (1/9)
Are you recovering from your @colmweb.org abstract submission? GEM has a non-archival track that lets you submit a two-page abstract in parallel!
Our workshop deadline is soon, please consider submitting your evaluation paper!
You can find our call for papers at gem-benchmark.com/workshop
New paper alert!
Curious how small prompt tweaks impact LLM accuracy but don't want to run endless inferences? We got you. Meet DOVE - a dataset built to uncover these sensitivities.
Use DOVE for your analysis or contribute samples - we're growing and welcome you aboard!
1/13 LLM circuits tell us where the computation happens inside the model, but the computation varies by token position, a key detail often ignored!
We propose a method to automatically find position-aware circuits, improving faithfulness while keeping circuits compact.
Super interesting! Have you tested how LAP handles more diverse paraphrasing? For example, do you think it would also work for code functions with similar roles?
New preprint!
Ever wonder whether verbalized CoTs correspond to the internal reasoning process of the model?
We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from the model parameters to assess CoT faithfulness.
arxiv.org/abs/2502.14829
We usually blame hallucinations on uncertainty or missing knowledge. But what if I told you that LLMs hallucinate even when they *know* the correct answer - and they do it with *high certainty*?
Check out our new paper that challenges assumptions on AI trustworthiness!
GEM is so back! Our workshop for Generation, Evaluation, and Metrics is coming to an ACL near you.
Evaluation in the world of GenAI is more important than ever, so please consider submitting your amazing work.
CfP can be found at gem-benchmark.com/workshop
Why not try the straightforward approach: label high-quality texts and train an LM to classify them? Of course this should be done separately for different types of texts - a great scientific paper ≠ a great novel.
(Similar to how Llama 3 pretraining used quality scores from Llama 2 and RoBERTa)
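A minimal sketch of that straightforward approach, assuming a tf-idf + logistic-regression classifier (an illustration, not what Llama actually used): label a handful of texts as high or low quality and fit a classifier; in practice you would train one per text type, with far more data.

```python
# Toy quality classifier: tf-idf features + logistic regression.
# Texts and labels are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "We present a rigorous evaluation across twelve benchmarks.",  # high quality
    "In this work we derive a closed-form bound on the error.",    # high quality
    "click here free prize winner claim now!!!",                   # low quality
    "buy cheap followers best deal limited offer",                 # low quality
]
labels = [1, 1, 0, 0]  # 1 = high quality, 0 = low quality

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# Score an unseen text; higher class means higher estimated quality.
print(clf.predict(["We analyze convergence of the proposed estimator."]))
```

The same pipeline could then score pretraining documents in bulk, keeping only those above a quality threshold.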