it’s OCR week! learn how we use verifiable rewards against unit tests to improve olmOCR’s PDF understanding
state of the art OCR, fully open model:
Posts by Luca Soldaini 🎀
yea i was gonna link 🤣 rough guidelines I’ve heard for multilingual are around 600B+, which high level matches yuval’s findings.
best commute on earth
which one of you im gonna have the pleasure to see at COLM???
my keystrokes go through light-up starry cable
OF COURSE my code is better than yours
12+ years in this country, first time I get to wear this sticker 🗳️
wearing italian camo* at ICML
*ordering iced lattes rather than espressos at coffee shops
new @ai2.bsky.social office has something for everyone: stunning views for the outdoorsy kind, 2.5 Gbps connection at every desk for the indoor nerds
Waymo is cool but BART from SFO to downtown SF is cooler
101 can be as dark red as you want on google maps!
babyyyyy
text classification at scale, works great on 70TB of text
scales just fine to 70TB of text, supports subword embedding, someone made rust bindings 😌
no reason to switch just because the software is no longer updated. compile from scratch, works great!
2025 AI hot take: everyone should use FastText more. Word embeddings are awesome.
congratulations!!
Reddit also has deals with OpenAI and GDM. Maybe negotiation stalled with Anthropic.
they are a joy to type with our loud mechanical keyboards
today might be rainy, but PNW summer is already here
if soldering skills become critical i’m gonna be soon out of a job 😅
I've silenced all notifications on all my devices and it's truly the best thing ever
...I am considering allowing calendar notifications tho cuz I almost missed 3 meetings already 😅
two weeks traveling and I miss my mechanical keyboard so much
MANGO SMOOTHIE
don’t forget da smoothie 🤤
when someone says they wanna bring me to their favorite italian restaurant
I am still perpetually in awe that skill emergence exists in language models
a million caveats but we have models that pick up capabilities from plain text???
it's so magical, I can't believe we got such a treat
congrats!!! amazing news 🥰
mittens!
PAWS!
bluesky deserves to know we’ve adopted a cat and he’s the most handsome boy
Summary of the recommendation we submitted to the White House to ensure the success of open & transparent AI
As a meta point, I’m very grateful to be in a position where I can put my technical expertise in the service of policy needs 🥰
with so many good vision models out there, choosing a starting model with a friendlier license is better ☺️
Gemma is a great generalist model, but any competent VLM is a good starting point for olmOCR