
Posts by Active Site

Thank you to the Frontier Model Forum, Sentinel Bio, and @packardfdn.bsky.social for supporting our work and to our advisory board.

2 months ago 1 0 0 0

Shout out to Shen Zhou Hong, @alex-kleinman.bsky.social, Alyssa Mathiowetz, @adamhowes.bsky.social, @xrg.bsky.social, @lucarighetti.bsky.social, Joe Torres, Julian Cohen, Suveer Ganta, Deepika Pahari, Alex Letizia

Measuring Mid-2025 LLM-Assistance on Novice Performance in Biology

Large language models (LLMs) perform strongly on biological benchmarks, raising concerns that they may help novice actors acquire dual-use laboratory skills. Yet, whether this translates to...

You can read more here:

๐Ÿ“ Blog post: activesite.substack.com/p/rct
๐Ÿ“„ arXiv Preprint: arxiv.org/abs/2602.16703
๐Ÿ”ฎ Predictions from @research-fri.bsky.social: forecastingresearch.substack.com/p/how-well-...

Active Site Jobs

We're actively hiring for scientists and operators!

We especially want to find a Head of Ops to help build an engine to repeat this study regularly and develop entirely new ones.

jobs.ashbyhq.com/activesite


Importantly: this is a snapshot of mid-2025 novice and LLM performance.

Results could change as new LLMs become more capable and easier to use in the lab, and as average elicitation skill improves.

As models evolve, we aim to continue tracking how people use frontier AI in biology.


How good were participants at using LLMs?

~40% of participants never uploaded images to LLMs.

Interestingly, participants in both arms most often cited YouTube as a helpful resource.


How reliable were LLMs in the hands of novices?

LLM transcripts revealed that models can still make mistakes, especially in molecular cloning.

LLMs led participants to move more quickly (Panel A), but often not with the correct materials (Panel B).


It's hard to compress all that into a single statistic.

But one way is by using a Bayesian model, which suggests LLMs give a ~1.4x boost on a "typical" wet-lab task.

Fundamentally, we're confident that there wasn't a large LLM slow-down or speed-up (95% CrI: 0.7x–2.6x).
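A multiplicative effect like "~1.4x with a 95% CrI of 0.7x–2.6x" is commonly summarized on the log scale. The sketch below is purely illustrative and not the study's actual model: it assumes a Normal posterior over log-speedup with parameters chosen to roughly reproduce the reported numbers, then recovers the median and credible interval by exponentiating quantiles.

```python
import numpy as np

# Illustrative only: assumed posterior over log-speedup, with mu and
# sigma picked so the median is ~1.4x and the 95% interval is ~0.7x-2.6x.
rng = np.random.default_rng(0)
mu, sigma = np.log(1.4), 0.33
samples = rng.normal(mu, sigma, size=100_000)

speedup = np.exp(samples)
median = np.median(speedup)
# A 95% credible interval on the multiplicative scale comes from
# exponentiating the 2.5% and 97.5% quantiles of log-speedup.
lo, hi = np.exp(np.quantile(samples, [0.025, 0.975]))

print(f"median speedup ~{median:.1f}x, 95% CrI [{lo:.1f}x, {hi:.1f}x]")
```

Working on the log scale keeps "2x faster" and "2x slower" symmetric around no effect, which is why the interval (0.7x–2.6x) is asymmetric around 1.4x on the raw scale.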


But there are some signs LLMs were useful.

LLM participants had higher success on 4 out of 5 tasks, most notably in cell culture (69% vs. 55%; P = 0.06).

LLM participants also advanced further within a task even if they didn't finish within the study period (odds >80%).


Our primary outcome: were LLM users more likely to complete all three of the core tasks *together*?

Only ~5% of the LLM arm and ~7% of the Internet arm completed all three.

No significant difference – and far lower than experts predicted.
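To see why ~5% vs. ~7% in a trial of this size is not a significant difference, here is a two-proportion z-test sketch. The counts are assumed from the reported percentages and the 153-person sample split roughly in half; the study's exact numbers and choice of test may differ.

```python
import math

# Assumed counts (illustrative, back-computed from ~5% vs ~7% of ~76-77
# participants per arm; not the study's exact figures).
x1, n1 = 4, 76   # LLM arm: completed all three core tasks
x2, n2 = 5, 77   # Internet-only arm

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Two-sided p-value from the standard normal CDF via math.erf
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

With so few completions in either arm, the standard error swamps the ~2 percentage-point gap, so the p-value is far from significance (a small-sample exact test would reach the same conclusion).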


The study was the largest and longest of its kind: 153 participants with minimal lab experience over 8 weeks, randomized to an LLM arm or an Internet-only arm.

They tried 5 laboratory tasks, 3 of which are central to a viral reverse genetics workflow. No protocols were given, just an objective.


We ran a randomized controlled trial to see if LLMs can help novices perform molecular biology in a wet-lab.

The results: LLMs may help in some respects, but we found no significant increase in completing the core tasks end-to-end. That's lower than what experts predicted.

Our findings 🧵
