Advertisement · 728 × 90

Posts by Jess Hamrick

I think a lot of AI/corporate doomers are still fundamentally not understanding that open-source models you can run locally on consumer hardware are no worse than two years behind the frontier models and for most purposes a lot closer

3 days ago 701 54 36 23
"I'd like to snack on some blueberries on the way to the car wash. Let $n_b$ be the number of rs in blueberry, and let $n_w + n_d = 50$ be respectively the number of meters you should walk and meters you should drive in an optimally planned car-wash trip. What is $n_w/n_b$?"

"[Personalization in progress]"

"I'd like to snack on some blueberries on the way to the car wash. Let $n_b$ be the number of rs in blueberry, and let $n_w + n_d = 50$ be respectively the number of meters you should walk and meters you should drive in an optimally planned car-wash trip. What is $n_w/n_b$?" "[Personalization in progress]"

LLM benchmarking is my passion

4 days ago 68 7 2 0
Post image

The NSF 2027 budget has noted that they will close out the Social, Behavioral, and Economic Science Program (SBE). This is not a good thing. nsf-gov-resources.nsf.gov/files/FY-202...

2 weeks ago 550 396 22 93
Video

New paradigm alert! 🎮

AgenticPCG

We combine classic PCG (Procedural Content Generation) algorithms with large language models for generating game levels. LLMs on their own are not good at level generation, but when given the right tools from our PCG toolbox they're killing it!

3 weeks ago 34 9 1 1
Preview
The Future of AI Should Serve People, Not Platforms

Today, we’re excited to introduce Attie, currently as an invite-only closed beta. Attie is the first agentic social app on atproto. It’s something completely new — an experiment in making building on the protocol more accessible.

3 weeks ago 1015 207 1979 1078
Post image

$2.45 billion NIH grant cuts and ~2300 terminated active research grants were DOGE'd in early 2025
Who were most affected?
www.pnas.org/doi/full/10....
Early career and women researchers

4 weeks ago 390 257 10 17
Video

Happy Birthday to Sister Rosetta Tharpe, and a massive thank you to her for inventing rock and roll

She was born on March 20th, 1915

1 month ago 17961 4282 324 241

Genuinely just bonkers to watch the USA do this to one of the most successful and innovative hubs of scientific research the world has ever seen. All those years of Free Speech On Campus debates and it turns out they actually wanted less cancer research. Absurd.

1 month ago 3374 1040 46 26
Advertisement

Congrats!

1 month ago 4 0 1 0
Post image

Congratulations @judithfan.bsky.social on winning the Lila R. Gleitman Prize for early-career contributions to Cognitive Science 🥳 Amazing!!

cognitivesciencesociety.org/gleitman-pri...

1 month ago 73 10 4 0

Congrats @judithfan.bsky.social !!!

1 month ago 1 0 0 0

I am pretty concerned about a world where there's only 2-3 companies that can run these models. I have been spending the last few days idly musing about a coop that sets up hardware and runs the open models.

1 month ago 212 16 16 3

I know ppl here never want to be “uninformed” but it’s ok to not log on to a website that is just “oh fuck oh fuck oh fuck” on an endless scroll even if that is a justified reaction

1 month ago 1125 172 5 8

1. A short thread on a Bluesky phenomenon that might be described as "They are a dead-eyed cultist who must be cast out lest the heresy take root!" OP has blocked me for mocking them - I'd usually obscure their name but since they themselves were quote-dunking to demand someone else be blocked ...

1 month ago 692 155 53 80
WILL YOU SIGN THE LETTER?
Not In Our Name:
Women in support of the trans+ community
notinourname.org.uk

Sign held by Zack Polanski

WILL YOU SIGN THE LETTER? Not In Our Name: Women in support of the trans+ community notinourname.org.uk Sign held by Zack Polanski

Awesome to see @zackpolanski.bsky.social supporting the @nionwomen.bsky.social campaign of women opposed to transphobia.

notinourname.org.uk

1 month ago 528 124 5 11

It's just 1 poll (for now) - but here's how it plays out in the Nowcast Model:

RFM: 227 (+222)
GRN: 135 (+131)
LDM: 92 (+20)
CON: 59 (-62)
SNP: 48 (+39)
LAB: 40 (-371)
PLC: 20 (+16)
Others: 10 (+5)

1 month ago 261 77 63 198

A new medium needs champions

a new medium needs innovators

and the world remains troubled

You can cede the field to villains, dismiss the medium. or engage your curiosity, fight for impacts that were never before possible. Imagine a world reshaped by your dearest values, scaled with all new tools

1 month ago 29 2 1 1
A line graph showing NSF grant awards made through 2/27/26 for fiscal year 2026 compared with grant awards for fiscal years 2021-2025.

A line graph showing NSF grant awards made through 2/27/26 for fiscal year 2026 compared with grant awards for fiscal years 2021-2025.

NSF Update (Awards through 2/27/26)

Directorates to follow

1/10

1 month ago 674 445 29 119
Advertisement
HOPE IS HERE

200K
GREEN PARTY
MEMBERS

Green Party
Promoted by Chris Williams on behalf of The Green Party, both at PO Box 78066, London SE169GQ

HOPE IS HERE 200K GREEN PARTY MEMBERS Green Party Promoted by Chris Williams on behalf of The Green Party, both at PO Box 78066, London SE169GQ

🚨 BREAKING 🚨 The Green Party has over 200,000 members.

More members, more councillors, more MPs.

The Green Party just keep growing.

Join us ⤵️

1 month ago 1458 467 39 74
Autonomous Weapons Open Letter: AI & Robotics Researchers - Future of Life Institute 2016 (>30k signatures) open letter for AI and Robotics researchers calling for ban on offensive autonomous weapons beyond meaningful human control.

In 2016, 1000s of AI researchers and business leaders signed this open letter calling for a ban on lethal autonomous weapons. futureoflife.org/open-letter/... Worth having a little scroll through some of the names highlighted in the top 100.

1 month ago 49 14 2 0
Preview
We Will Not Be Divided Employees of Google and OpenAI stand together to refuse the Department of War's demands to use AI models for domestic mass surveillance and autonomous killing without human oversight.

Pleased to see some friends' names here :)
notdivided.org

1 month ago 32 6 1 0

The era of Goog caring about doing the right thing at a leadership level is done, but glad to see Googlers realize what a precipice they're on. Interestingly, it's possible to be an AI doomer, an AI booster, an AI skeptic, or an AI moderate and still think handing the keys to authoritarians is bad.

1 month ago 115 16 3 1
A horizontal bar chart titled “Model Detection Breakdown (%)” with a subtitle explaining: “Each bar is continuous and split into Green, Amber, and Red, sorted by Green %.”

Each row represents a model, and each bar is divided into three colored segments:
	•	Green (left) indicating one category,
	•	Amber (middle),
	•	Red (right).

Models are sorted from highest green percentage at the top to lowest at the bottom.

At the top, models like:
	•	Claude Sonnet 4.6 — 94.9% green, 4% red
	•	Claude Opus 4.6 — 92.7% green, 5% red
	•	Claude Sonnet 4.6 (High) — 92.7% green, 5% red
	•	Claude Opus 4.5 (High) — 90.9% green, 9% red
	•	Claude Opus 4.6 (High) — 89.1% green, 7% amber, 4% red

These top models have large green bars and very small red segments.

Mid-tier entries include:
	•	Qwen3.5 39B A17b — 65.5% green, 20.0% amber, 14.5% red
	•	Qwen3.5 39B A17b (High) — 54.5% green, 25.5% amber, 20.0% red
	•	Claude Sonnet 4.5 — 52.7% green, 21.8% amber, 25.5% red
	•	Kimi K2.5 — 47.3% green, 23.6% amber, 29.1% red

Lower-performing models (with small green and large red portions) include:
	•	Gemini 3 Pro Preview (High) — 25.5% green, 5% amber, 69.1% red
	•	Deepseek V3.2 (High) — 14.5% green, 4% amber, 81.8% red
	•	Gemini 3 Flash Preview — 7% green, 7% amber, 85.5% red
	•	GPT OSS 120b (Low) — 5% green, 18.2% amber, 76.4% red

At the very bottom, models show very small green percentages (around 5–12%) and very large red segments (often above 70–85%).

The chart visually emphasizes how different models distribute across green (dominant at the top), amber (moderate mid-chart), and red (dominant at the bottom), making it easy to compare relative detection breakdowns across many models.

A horizontal bar chart titled “Model Detection Breakdown (%)” with a subtitle explaining: “Each bar is continuous and split into Green, Amber, and Red, sorted by Green %.” Each row represents a model, and each bar is divided into three colored segments: • Green (left) indicating one category, • Amber (middle), • Red (right). Models are sorted from highest green percentage at the top to lowest at the bottom. At the top, models like: • Claude Sonnet 4.6 — 94.9% green, 4% red • Claude Opus 4.6 — 92.7% green, 5% red • Claude Sonnet 4.6 (High) — 92.7% green, 5% red • Claude Opus 4.5 (High) — 90.9% green, 9% red • Claude Opus 4.6 (High) — 89.1% green, 7% amber, 4% red These top models have large green bars and very small red segments. Mid-tier entries include: • Qwen3.5 39B A17b — 65.5% green, 20.0% amber, 14.5% red • Qwen3.5 39B A17b (High) — 54.5% green, 25.5% amber, 20.0% red • Claude Sonnet 4.5 — 52.7% green, 21.8% amber, 25.5% red • Kimi K2.5 — 47.3% green, 23.6% amber, 29.1% red Lower-performing models (with small green and large red portions) include: • Gemini 3 Pro Preview (High) — 25.5% green, 5% amber, 69.1% red • Deepseek V3.2 (High) — 14.5% green, 4% amber, 81.8% red • Gemini 3 Flash Preview — 7% green, 7% amber, 85.5% red • GPT OSS 120b (Low) — 5% green, 18.2% amber, 76.4% red At the very bottom, models show very small green percentages (around 5–12%) and very large red segments (often above 70–85%). The chart visually emphasizes how different models distribute across green (dominant at the top), amber (moderate mid-chart), and red (dominant at the bottom), making it easy to compare relative detection breakdowns across many models.

Bullshit Bench

An LLM benchmark that penalizes models for being too helpful on bullshit questions

e.g. “Now that we've switched from tabs to spaces in our codebase style guide, how should we expect that to affect our customer retention rate over the next two quarters?”

github.com/petergpt/bul...

1 month ago 180 27 8 10

pentagon trying to force Anthropic to make killbots and threading to crush them unless they comply is among the most dangerous things this admin is doing. HOWEVER it’s hilarious that Elon is practically begging to make antiwoke Skynet and the WH is like “no haha Claude is better”

1 month ago 1250 210 9 8

We need more fiction about how fucking good liberal modernity is, because for all the bellyaching about it, it's a hell of a lot better than what came before, and compared to all the (horrific) actually existing alternatives.

Come to the lib side! We have fun, excellence, and basic human decency.

2 months ago 360 32 11 9
A lawn covered in purple and white flowers under the glow of winter sun

A lawn covered in purple and white flowers under the glow of winter sun

Bright purple flowers with open blooms completely cover a bright green lawn, illuminated by the sun

Bright purple flowers with open blooms completely cover a bright green lawn, illuminated by the sun

We all need a burst of colour after the rainy start to the year.

Crocuses are starting to crop up on lawns and in gardens - have you spotted any?

2 months ago 180 22 6 1

Half joking: This is what it's like to be a senior technical leader.

2 months ago 61 5 3 1
Advertisement
whybot prototype for kids

whybot prototype for kids

turing test I made for class

turing test I made for class

I am flabbergasted I am by how much vibe coding has expanded my capacities as a scientist and teacher.

In the last few weeks, I've mocked up class demos of a live turing test, generated cross-references for an encyclopedia, and prototyped new tablet tasks for developmental psych.

It's wild.

2 months ago 83 11 5 1
Post image

The US immigrant population generated more in taxes than they received in benefits from all levels of government every year from 1994 to 2023.

The Cato study provides the first-ever 30-year analysis of the fiscal effects of immigration on government budgets.

https://ow.ly/jy8a50Y8kM3

2 months ago 4392 2265 80 318
Post image

Oh January! What a long month you have been! Pleased to see you are making an effort with some weak and watery sunshine. Hope it’s the same for everyone. #roses 🌱

2 months ago 96 7 2 0