oh lol ppl have been submitting wout reviewing forever, TIL it was boycotting all along
Posts by Kyle Lo
kinda out of the loop, ppl are submitting to neurips but not reviewing?
thanks for the support!
thanks maria! glad i got to share a fun office and collaborate during s2 days! appreciate that we can chat abt both difficult research problems and peak-taste tv shows w u. will be in touch!!
Today I'm saying farewell to @ai2.bsky.social.
I'm so proud of our team & grateful to have shared fully-open Olmo, Dolma, olmOCR, Molmo, etc with the world
I know the team is more committed than ever to advancing open-source & open-science. Forever rooting for my dear friends 🫶
cs peer review atm feels like im in a user study that forgot to get irb review
lololol I subscribe to the @mariaa.bsky.social school of cozy figures
for figs/diagrams, ive found nano banana generates images a bit too cringe-tech for me; have had some success w committing to doing all images in matplotlib code, one script per fig
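a minimal sketch of the one-script-per-figure pattern described above; the data, labels, and output filename are all made up for illustration:

```python
# One standalone matplotlib script per figure: each figure can be
# regenerated (or handed to an LM to edit) in isolation.
# All names and data here are hypothetical.
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt

def main() -> str:
    tokens = [50, 100, 150, 200]      # hypothetical x-axis (B tokens)
    mmlu = [0.42, 0.51, 0.56, 0.58]   # hypothetical scores

    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot(tokens, mmlu, marker="o")
    ax.set_xlabel("training tokens (B)")
    ax.set_ylabel("MMLU")
    ax.set_title("hypothetical training curve")
    fig.tight_layout()

    out = "fig_training_curve.pdf"  # one script -> one output file
    fig.savefig(out)
    plt.close(fig)
    return out

if __name__ == "__main__":
    print(main())
```

keeping each figure fully self-contained like this also makes it easy to regenerate just one figure when a reviewer asks for a tweak.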
nice post! will need to check out reveal. some of my colleagues and i have a similar workflow using markdown instead of html, but the idea of some structured doc that is in-distribution for LMs seems like the right path
big congrats to @lambdaviking.bsky.social for leading this project & core contributors Yanghong Li
@tylerromero.bsky.social
@anejsvete.bsky.social
Caia Costello
blog: allenai.org/blog/olmohyb...
paper: allenai.org/papers/olmo-...
hf collection: huggingface.co/collections/...
our new Olmo Hybrid model combines attention with linear RNN layers
📣 training efficiency is crazy good. the model reaches the same MMLU score as Olmo 3 in 50% of the tokens. we see this on many other tasks too
as always: weights, data, ckpts, training code, etc. all fully open
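a toy numpy sketch of the general hybrid idea, interleaving attention layers with linear-RNN layers. this is NOT the Olmo Hybrid architecture; the shapes, decay gating, and 3:1 layer pattern here are illustrative assumptions only:

```python
# Toy hybrid stack: mostly linear-RNN layers (O(T) time, O(d) state per
# step) with periodic softmax-attention layers (O(T^2)).
# Everything here is a simplified assumption for illustration.
import numpy as np

def attention_layer(x: np.ndarray) -> np.ndarray:
    # single-head causal self-attention; x: (T, d)
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def linear_rnn_layer(x: np.ndarray, decay: float = 0.9) -> np.ndarray:
    # elementwise linear recurrence h_t = decay * h_{t-1} + x_t:
    # constant memory per step, no attention matrix
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + x[t]
        out[t] = h
    return out

def hybrid_forward(x, pattern=("rnn", "rnn", "rnn", "attn")):
    # assumed mostly-RNN stack with an occasional attention layer,
    # each applied with a residual connection
    for kind in pattern:
        x = x + (attention_layer(x) if kind == "attn" else linear_rnn_layer(x))
    return x
```

the efficiency story comes from the recurrence: most layers avoid the quadratic attention matrix entirely, while the occasional attention layer retains precise token-to-token lookup.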
DrawEduMath is our benchmark testing VLM understanding of K-12 student math work, which is prerequisite for their use in educational contexts
one year later, VLMs are strong math solvers, but they still underperform on our bench, esp for the students who need the most help
this work was led by our intern Mayee Chen and was one of the new ideas we adopted into Olmo 3!
blog post: allenai.org/blog/olmix
arxiv paper: arxiv.org/abs/2602.12237
one of my favorite topics is dealing with data constraints!
what if your proposed mix is 30% code but you don't have enough code? we can repeat our data until we hit target proportions, but too much is risky
we view data mixing as (data) constrained optimization
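a toy sketch of the "data mixing as constrained optimization" framing: pick mixture weights to maximize an (assumed linear) predicted-performance model, subject to availability constraints, since you can't sample a domain beyond some max number of repetitions of its available tokens. all numbers are made up; this is not the Olmix formulation:

```python
# Toy LP version of constrained data mixing. The per-domain "predicted
# gain" vector stands in for a fitted regression; real formulations are
# richer. Domain names and all numbers below are hypothetical.
from scipy.optimize import linprog

def solve_mix(pred_gain, avail_tokens, total_tokens, max_epochs=2.0):
    n = len(pred_gain)
    # linprog minimizes, so negate the predicted gains
    c = [-g for g in pred_gain]
    # availability: w_i * total_tokens <= max_epochs * avail_tokens_i
    A_ub = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    b_ub = [max_epochs * a / total_tokens for a in avail_tokens]
    # mixture weights form a distribution
    A_eq, b_eq = [[1.0] * n], [1.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, 1.0)] * n)
    return res.x

# e.g. code is the best domain per token, but only 20B tokens exist for a
# 100B budget: at 2 epochs max, code is capped at 40% even though it's
# preferred, and the rest spills into the next-best domain
weights = solve_mix(pred_gain=[1.0, 0.6, 0.4],   # code, web, math (made up)
                    avail_tokens=[20e9, 500e9, 50e9],
                    total_tokens=100e9)
```

this is exactly the "30% code but not enough code" situation: the optimizer wants more code than exists, and the repetition cap forces the mix to rebalance.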
our paper on data mixing for LMs is out!
while building Olmo 3, we saw gaps between data mixing literature and real practice
- choosing proxy size, # runs, sampling, regression, constraints...
- data shifts during LM dev: can we reuse past experiments?
Olmix tackles them all!
literally all the time 😮‍💨 this was yesterday
learning how to do something is a first-order use case for LMs; the development bottleneck has been collecting data covering a wide diversity of topics, until now
incredibly fun project led by our intern yapei chang
we mined the web for thousands of real-world "how to do X" step-by-step instructions and turned them into a dataset, synth data training procedure, eval suite, etc.
lol rip 😮‍💨
It's like a score calculated against gold reference citations in the generated lit review, so even humans don't score high. i think the eval is saturated cuz there's so much subjectivity in what counts as an appropriate citation. better phrasing is maybe that the citations are sensible up to some X
they're separate, poorly named systems lol. Separate projects approaching the same problem from different angles: Scholar QA approaches from agentic system design, use whatever model; OpenScholar approaches from model-first, very light on system. The teams are working together to fuse ideas
our open model proving out specialized RAG LMs over scientific literature has been published in Nature
congrats to our lead @akariasai.bsky.social & team of students and Ai2 researchers/engineers
www.nature.com/articles/s41...
0 days since last mixup of eval results between "copa" (choice of plausible alternatives) & "coqa" (conversational QA) tasks
The 5th Generation, Evaluation, and Metrics (GEM) Workshop will be at #ACL2026!
Call for papers is out. Topics include:
- LMs as evaluators
- Living benchmarks
- Eval with humans
and more
New for 2026: Opinion & Statement Papers!
Full CFP: gem-workshop.com/call-for-pap...
mm yea i think that's always the case w productivity tools.
imo ability to adopt new tools is a core part of the job. just like the transitions from plain text editors to IDEs, from sending files via FTP to using git for collab, from ad hoc Makefiles to package managers, etc. AI is just the latest thing
my concern is the growing pool of "unknown unknowns" as i interact less with code directly.
imo that's probably why i've subconsciously been leaning toward cursor over claude code or similar agents, even if the latter has a higher code-to-keystrokes ratio
i dont feel worse at this even if im not writing papers from scratch as much as during my early career
but coding feels different due to the mismatch between what i express to the system (english) and what the system returns (code). i've already noticed some gaps in libraries I used to know well.
whether my ability to review code will degrade as I offload increasingly larger workloads to AI
of course, this shift is present in other forms of generation, like paper writing, where my role has shifted to reviewing/editing (student's) drafts.
some thoughts about skill degradation w/ AI coding
im on board w the views that "english is the new programming language" & that "software engineering", translating ambiguous goals into technical specs/execution, is still a skill.
im more concerned w shift from my role as a writer to a reviewer and
lucky to chat w sen. patty murray about olmo & importance of fully open AI
using opus to extract research topics from papers & it was giving me useless words like "training", "datasets", and "evaluation"
kept prompting it w examples of more informative topics and it ended up with "LLM training", "LLM datasets", and "LLM evaluation"
thx