How can we better understand how models make predictions and which components of a training dataset are shaping their behaviors? In April we introduced OLMoTrace, a feature that lets you trace the outputs of language models back to their full training data in real time. 🧵
As we've been working towards training a new version of OLMo, we wanted to improve our methods for measuring the Critical Batch Size (CBS) of a training run to unlock greater efficiency, but we found gaps between the methods in the literature and our practical needs for training OLMo. 🧵
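For context, here is a minimal sketch of the kind of CBS estimator the literature offers: the gradient noise scale of McCandlish et al. (2018), computed from gradient norms measured at two probe batch sizes. It illustrates the baseline idea only, not the improved method this thread is about; `model`, `loss_fn`, and the probe batches are hypothetical placeholders.

```python
# Sketch: the "simple noise scale" estimate of critical batch size from
# McCandlish et al. (2018). Illustrative only; not Ai2's improved method.
import torch

def grad_sq_norm(model, loss_fn, batch):
    """Squared L2 norm of the gradient estimated on a single batch."""
    model.zero_grad()
    loss_fn(model, batch).backward()
    return sum(p.grad.pow(2).sum().item()
               for p in model.parameters() if p.grad is not None)

def critical_batch_size(model, loss_fn, small_batch, big_batch, b_small, b_big):
    # Gradient norms measured at two batch sizes let us solve for the true
    # gradient norm |G|^2 and the per-example gradient variance tr(Sigma).
    g_small = grad_sq_norm(model, loss_fn, small_batch)
    g_big = grad_sq_norm(model, loss_fn, big_batch)
    g_true_sq = (b_big * g_big - b_small * g_small) / (b_big - b_small)
    trace_sigma = (g_small - g_big) / (1.0 / b_small - 1.0 / b_big)
    return trace_sigma / g_true_sq  # B_simple, a proxy for the CBS
```

In practice both measurements are noisy, so they are typically averaged over many training steps before forming the ratio.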
Congratulations to #UWAllen Ph.D. grads Ashish Sharma and @sewonm.bsky.social, 2024 @acm.org Doctoral Dissertation Award honorees! Sharma won for #AI tools for mental health; Min received honorable mention for efficient, flexible language models. #ThisIsUW news.cs.washington.edu/2025/06/04/a...
I'm thrilled to share RewardBench 2! We created a new multi-domain reward model evaluation that is substantially harder than RewardBench, we trained and released 70 reward models, and we gained insights about reward modeling benchmarks and downstream performance!
📢 We're taking your questions now on Reddit for tomorrow's AMA!
Ask us anything about OLMo, our family of fully-open language models. Our researchers will be on hand to answer them Thursday, May 8, at 8 am PT.
The story of OLMo, our Open Language Model, goes back to February 2023 when a group of researchers gathered at Ai2 and started planning. What if we made a language model with state-of-the-art performance, but we did it completely in the open? 🧵
A bar graph comparing average performance (10 Tasks) across OLMo 2 1B, SmolLM2 1.7B, Gemma 3 1B, Llama 3.2 1B, and Qwen 2.5 1.5B. The highest performance is 42.7, achieved by OLMo 2 1B.
We're excited to round out the OLMo 2 family with its smallest member, OLMo 2 1B, surpassing peer models like Gemma 3 1B or Llama 3.2 1B. The 1B model should enable rapid iteration for researchers, more local development, and a more complete picture of how our recipe scales.
Ask Us Anything about our Open Language Model, OLMo
Have questions? We're an open book!
We're excited to host an AMA to answer your Qs about OLMo, our family of open language models.
🗓️ When: May 8, 8-10 am PT
📍 Where: r/huggingface
🧠 Why: Gain insights from our expert researchers
Chat soon!
"With OLMoTrace, weโre actually bringing accessibility to openness, enabling everybody to start looking into the inner workings of the relationships between the input and output of these models." - Ali Farhadi, Ai2 CEO
Last week we released OLMoTrace as part of #GoogleCloudNext
Ai2 launched a new tool where your responses from OLMo get mapped back to related training data. We're using this actively to improve our post-training data and hope many others will use it for understanding and transparency around leading language models!
Some musings:
Lead OLMoTrace researcher Jiacheng Liu at Ai2's Google Cloud Next booth.
The entrance to the Vertex AI Model Garden at Google Cloud Next.
A QR code leading to the story of Google Cloud and Ai2's partnership sitting near a faux fire pit.
Ai2 COO Sophie Lebrecht talks to visitors at Ai2's booth at Google Cloud Next.
Coming to you live from #GoogleCloudNext Day 2!
📍 Find us at the Vertex AI Model Garden inside the Google Cloud Showcase - try out OLMoTrace, and take a step inside our fully open AI ecosystem.
Ali Farhadi speaking on stage at a fireside chat
"OLMoTrace is a breakthrough in AI development, setting a new standard for transparency and trust. We hope it will empower researchers, developers, and users to build with confidenceโon models they can understand and trust." - CEO Ali Farhadi at tonight's chat with Karen Dahut #GoogleCloudNext
OLMoTrace is powered by my previous work infini-gram, with some innovative algorithmic twists. Really proud to turn an academic research project into a real LLM product; it's been a truly amazing experience.
Check out infini-gram: infini-gram.io
Try OLMoTrace in Ai2 Playground with our OLMo 2 models: playground.allenai.org
If OLMoTrace gives you new insight into how LLMs behave, we'd love you to share your use case! 💡 Take a screenshot, post the thread link if you like, and don't forget to tag
@allen_ai
Today we're unveiling OLMoTrace, a tool that enables everyone to understand the outputs of LLMs by connecting them back to their training data.
We do this at unprecedented scale and in real time: finding matching text between model outputs and 4 trillion training tokens within seconds. ✨
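A rough sketch of the underlying idea only; OLMoTrace's real pipeline is built on infini-gram with additional algorithmic twists for speed and span selection. The index name, word-level windows, and window size below are simplifying assumptions; the actual indexes and request format are documented at infini-gram.io/api_doc.

```python
# Sketch: for each span of the model's output, ask the public infini-gram API
# how often it occurs verbatim in an indexed training corpus, and keep spans
# with at least one exact match. Index name and window size are placeholders.
import requests

API_URL = "https://api.infini-gram.io"
INDEX = "v4_rpj_llama_s4"  # example public index; OLMo indexes are listed in the API docs

def matched_spans(output_text: str, window: int = 8):
    words = output_text.split()
    hits = []
    for i in range(len(words) - window + 1):
        span = " ".join(words[i:i + window])
        resp = requests.post(API_URL, json={
            "index": INDEX,
            "query_type": "count",
            "query": span,
        }).json()
        if resp.get("count", 0) > 0:  # span appears verbatim in the corpus
            hits.append((span, resp["count"]))
    return hits
```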
For years it's been an open question: how much is a language model learning and synthesizing information, and how much is it just memorizing and reciting?
Introducing OLMoTrace, a new feature in the Ai2 Playground that begins to shed some light. 🔦
📰 Google Cloud moves deeper into open source AI with Ai2 partnership:
"Many were wary of using AI models unless they had full transparency into models' training data and could customize the models completely. Ai2's models allow that."
(4/4) Searching in OLMo 2's training data is now available in both our web interface and the API endpoint.
Plus, OLMo 2 32B Instruct is a very strong model. Let's do real science with it 🧪
(3/4) We know the pain point in LLM research in academia: We don't know what's in the training data of these LLMs (GPT, Llama, etc) and what's not; we can only speculate.
So we made the full training data of OLMo 2 and OLMoE searchable, including pre-training and post-training.
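As a rough illustration of what "searchable" means here (the index name below is a placeholder; the actual OLMo 2 and OLMoE index names, and the full request and response format, are documented at infini-gram.io/api_doc):

```python
# Sketch: retrieve training documents containing a phrase via the public
# infini-gram API's document-search query type. Field names follow the public
# API docs; the index name is a placeholder.
import requests

payload = {
    "index": "v4_rpj_llama_s4",      # placeholder; substitute an OLMo 2 index
    "query_type": "search_docs",
    "query": "critical batch size",  # phrase to look for in the training data
    "maxnum": 5,                     # number of matching documents to return
}
resp = requests.post("https://api.infini-gram.io", json=payload).json()
for doc in resp.get("documents", []):
    print(str(doc)[:200])  # each entry carries the matched document text/metadata
```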
(2/4) Check out the source code of infini-gram here: github.com/liujch1998/infini-gram
If you are new to infini-gram, you might want to start by exploring our web interface infini-gram.io/demo and API endpoint infini-gram.io/api_doc
As infini-gram surpasses 500 million API calls, today we're announcing two exciting updates:
1. Infini-gram is now open-source under Apache 2.0!
2. We indexed the training data of OLMo 2 models. Now you can search in the training data of these strong, fully-open LLMs.
🧵 (1/4)
Stay tuned... Wednesday, at #GoogleCloudNext and online!
A list of paper authors for 2 OLMo 2 Furious.
Buckle your seatbelt: we've released the OLMo 2 paper to kick off 2025 🔥. Including 50+ pages on 4 crucial components of the LLM development pipeline.
kicking off 2025 with our OLMo 2 tech report while paying homage to the sequelest of sequels 🫡
2 OLMo 2 Furious 🔥 is everything we learned since OLMo 1, with deep dives into:
• stable pretrain recipe
• lr anneal 🤝 data curricula 🤝 soups (souping sketched below)
• tulu post-train recipe
• compute infra setup
🧵
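On the "soups" item above, a minimal sketch of what checkpoint souping means in general: uniform weight averaging of compatible checkpoints. This assumes PyTorch state_dicts and is the generic technique, not necessarily the exact souping recipe described in the report.

```python
# Sketch: uniformly average the parameters of several checkpoints that share
# an architecture (e.g. multiple LR-anneal runs) into a single "souped" model.
import torch

def soup(checkpoint_paths):
    avg = None
    for path in checkpoint_paths:
        sd = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.float().clone() for k, v in sd.items()}
        else:
            for k in avg:
                avg[k] += sd[k].float()
    return {k: v / len(checkpoint_paths) for k, v in avg.items()}

# Hypothetical usage:
# model.load_state_dict(soup(["anneal_a.pt", "anneal_b.pt", "anneal_c.pt"]))
```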
Yes, we've read your paper and there are so many interesting findings! Let's grab coffee at NeurIPS
(8/n) ... and senior authors @soldni.bsky.social @nlpnoah.bsky.social @mechanicaldirk.bsky.social Pang Wei Koh, Jesse Dodge, Hanna Hajishirzi
(7/n) This work wouldn't have been possible without my awesome co-first author @akshitab.bsky.social, wonderful colleagues @awettig.bsky.social @davidheineman.com @oyvind-t.bsky.social @ananyahjha93.bsky.social ...
(6/n) Compared to existing work, our method accurately predicts performance on individual tasks, is designed to work in arbitrarily overtrained regimes, and is compute-efficient.
Paper link: arxiv.org/abs/2412.04403
(5/n) We have loads of interesting analyses! Check out our paper to find out:
* variance analysis of task accuracy and its impact on prediction error
* impact of using even less compute to make predictions
* ablating the many design choices in our method and exploring alternatives
(4/n) We can predict the task accuracy of OLMo 2 7B and 13B (after pretraining and before mid-training) within an absolute error of 2 points on four tasks: MMLU, HellaSwag, PIQA, and Social IQa. Error on other tasks is a bit higher, and we aim to reduce it in future work.
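A schematic of the general two-step recipe behind this kind of prediction: fit task loss as a function of parameters N and tokens D on small "ladder" runs, then map loss to accuracy with a sigmoidal link, and extrapolate both to the target scale. The functional forms, starting values, and numbers below are illustrative assumptions, not the paper's exact parameterization or data.

```python
# Sketch of a two-step task scaling-law fit (all numbers are hypothetical).
import numpy as np
from scipy.optimize import curve_fit

def task_loss(nd, A, alpha, B, beta, E):
    # Chinchilla-style form: loss as a function of parameters N and tokens D.
    N, D = nd
    return A / N**alpha + B / D**beta + E

def loss_to_acc(L, a, k, L0, b):
    # Sigmoidal link from task loss to task accuracy.
    return a / (1.0 + np.exp(-k * (L - L0))) + b

# Measurements from small "ladder" models (hypothetical values).
ladder_N = np.array([190e6, 370e6, 600e6, 760e6, 1.0e9, 1.3e9])
ladder_D = np.array([3.8e9, 7.4e9, 12e9, 15e9, 20e9, 26e9])
ladder_loss = np.array([1.30, 1.18, 1.12, 1.07, 1.03, 0.99])
ladder_acc = np.array([0.38, 0.44, 0.48, 0.52, 0.55, 0.58])

p1, _ = curve_fit(task_loss, (ladder_N, ladder_D), ladder_loss,
                  p0=[400, 0.35, 800, 0.35, 0.7], maxfev=20000)
p2, _ = curve_fit(loss_to_acc, ladder_loss, ladder_acc,
                  p0=[-0.6, 5.0, 1.1, 0.9], maxfev=20000)

# Extrapolate to a hypothetical target model (7B params, 4T tokens).
pred_loss = task_loss((7e9, 4e12), *p1)
pred_acc = loss_to_acc(pred_loss, *p2)
print(f"predicted task loss {pred_loss:.3f}, accuracy {pred_acc:.3f}")
```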