But I fear I'm far too old and slow to be this guy. I don't even *like* Red Bull.
Posts by Jeff Smith
I really am better suited to be this guy. I have the sweater vest and everything.
So, you don't feel like Ludwig when you're done. You just feel like you played as fast as you could but the game can still be played so much faster and there's probably a teenager in Korea who could mop the floor with you.
But with agentic coding, basically being a dev is just Starcraft with shittier graphics. You barely even notice all of the wins, because the model is solving all of the puzzles. You just wind up typing as fast as you can. It's pure video games.
Back in the day, being a dev was like being Ludwig from that David Mitchell show. You'd just pace and whiteboard and ponder until you solved the puzzle. And you felt really smart when you did. It was a big part of the appeal for a lot of us.
The basic idea of hash tables is that “the universe is a big place, but it’s mostly empty.”
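That “mostly empty” intuition in miniature: a vast key universe gets hashed down into a small table, and because the table is mostly empty, collisions are rare enough that chaining handles them cheaply. A minimal sketch (class name and bucket count are illustrative):

```python
class HashTable:
    """Toy chained hash table: huge key space, tiny mostly-empty table."""

    def __init__(self, n_buckets=64):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # hash() maps an astronomically large key universe to an int;
        # the modulo folds it into our small, mostly empty table.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default


t = HashTable()
t.put("kelp", "fact")
print(t.get("kelp"))  # prints fact
```

Lookups stay O(1) on average precisely because almost every bucket is empty or nearly so.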
Hash tables, or how to leverage the sinking feeling of loneliness you get when you look at the sky
The new “This could have been an email” is when someone does interesting research and the only way to consume it is on YouTube
“This could have been a blog post!”
Next #ATmosphereConf should be held in Europe
Required reading for everyone following @kissane.myatproto.social’s awesome #ATmosphereConf keynote.
Loved every moment of @kissane.myatproto.social’s keynote at #ATmosphereConf.
I need HOLD FAST merch stat. Sweatshirt, totes, and best of all: gloves.
#KelpFacts
Which is why, although all John Luther Adams music will put you to sleep, all John Adams music will disturb your rest forever.
John Adams music is like if you put a car factory on the back of a semi. And then the driver had intrusive thoughts.
For the ARC-AGI-3 benchmark test, the developers made interactive puzzle games sherwood.news/tech/the-tou...
Claude is soooo slowwwwwwww when America is awake
go back to sleep, y'all
DeepMind's RL team is hiring a research scientist: if you're passionate about RL, come work with us!
And if you know people who might be interested, please share:
job-boards.greenhouse.io/deepmind/job...
@martin.kleppmann.com talking to @qconferences.com London about @bsky.app and #ATproto.
I'll be at #QCon London tomorrow talking about this. Come find me if you're working on open source review tooling or contributor trust. #oss #genAI #codingAI
We're also working on the cold-start problem. Scoring new contributors LOW is accurate but not useful. The next step is tooling that helps first-time contributors understand a project's expectations before they submit.
Where we're headed: contributor scoring tells you who someone is. The harder question is whether a specific PR fits the target repo. We've seen strong signal in repo-specific fingerprinting and we're building tools around it.
We also pulled account age out of the score and into a separate advisory. The score now means one thing. Account age is context alongside it, not blended in.
New default: one ratio. Directly interpretable. If a contributor has a 78% merge rate, that's the score. No graph construction, no regression coefficients.
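The one-ratio default described above can be sketched in a few lines. The function names and tier cutoffs here are illustrative assumptions, not Good Egg's actual code; the post only says the merge rate itself is the score:

```python
def merge_rate_score(merged_prs: int, total_prs: int) -> float:
    """The whole model: merged PRs over total PRs."""
    if total_prs == 0:
        return 0.0  # cold start: no history, no signal
    return merged_prs / total_prs


def tier(score: float) -> str:
    """Bucket a score into a label. Cutoffs are hypothetical."""
    if score >= 0.7:
        return "HIGH"
    if score >= 0.4:
        return "MEDIUM"
    return "LOW"


# A contributor with a 78% merge rate scores 0.78, full stop.
print(tier(merge_rate_score(78, 100)))  # prints HIGH
```

The appeal is exactly what the post claims: no graph construction, no regression coefficients, and anyone can recompute the score from two integers.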
That pushed us to question the scoring model. The graph score (the most complex part of the system) actively hurt predictions for unknown contributors. Merge rate alone outperforms the full model at every tier.
We tried to detect suspended GitHub accounts from behavioral signals. LLMs, network analysis, title patterns. None of it worked on contributors who'd gotten code through review. They look like everyone else. The merge process itself is the filter.
We rebuilt Good Egg's scoring model from the ground up. Simpler, faster, more accurate for the contributors who actually need scoring. Here's what we learned and where we're headed. 🧵
Full methodology, all scoring data, and the failures are published alongside the successes. github.com/2ndSetAI/goo...
v1 and v2 have identical AUC (0.647). We shipped v2 anyway because merge rate corrects survivorship bias and account age stabilizes sparse graphs. Both carry confirmed statistical signal. The flat AUC just means the graph already captures most ranking information.
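For readers skimming the thread: the AUC above is the probability that a randomly chosen merged PR outscores a randomly chosen unmerged one, with ties counting half. A tiny sketch of the metric itself (not the thread's evaluation code):

```python
def auc(scores_pos, scores_neg):
    """Pairwise AUC: P(positive score > negative score), ties = 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Three merged-PR scores vs. two unmerged-PR scores: 5 of 6 pairs rank
# correctly, so AUC = 5/6.
print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))  # ≈ 0.83
```

An identical AUC for v1 and v2 means neither ranks pairs better, which is consistent with the thread's point: v2 was shipped for calibration and stability reasons, not ranking.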
We tested seven features on 5,129 PRs across 49 repos. Three survived. Most interesting failure: text similarity between PR descriptions and project READMEs. Higher similarity predicted lower merge rates. We think low-effort PRs parrot project language.
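The post doesn't say how the text similarity was computed; one plausible way is bag-of-words cosine similarity, sketched here purely to make the tested feature concrete (this is a guess at the method, not Good Egg's actual feature code):

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercased word counts; a crude bag-of-words vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two bag-of-words vectors, in [0, 1]."""
    va, vb = tokenize(a), tokenize(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


readme = "A fast, minimal web framework for building APIs."
pr_desc = "This PR adds a fast, minimal API endpoint to the framework."
print(cosine_similarity(readme, pr_desc))
```

Under the thread's hypothesis, a low-effort PR that parrots the README would score high on exactly this kind of measure, which is why high similarity ended up predicting *lower* merge rates.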
The case that motivated it: Guillermo Rauch scores MEDIUM against his own company's Next.js repo. Zero merged PRs in Next.js itself. v2 factors in his 17.7-year account and 78% merge rate, pushing him to HIGH.