Peter Bull (@peter.drivendata.org) Bsky

Excited to be speaking at Good Tech Summit in DC on April 7: www.goodtechtogether.org/summit

We’ll share a program focused on K-12 education and talk about investing in the foundations of AI: data, models, and benchmarks. We'll explore how these shape AI development in a field. Join us!

2 weeks ago

🎉 Excited to launch this challenge! 🎉 We spent over a year on data collection, curation, and annotation to produce a first-of-its-kind dataset.

Help us build speech models that understand 2-5 year olds. $120k in prizes and huge impact!

kidsasr.drivendata.org

2 months ago

#SeattleAIWeek 2025 · Events Calendar View and subscribe to events from #SeattleAIWeek 2025 on Luma. Showcasing the PNW as the best place to be in AI. Community-driven. Future-focused. Submit your event now using the + button.

Great set of events for #SeattleAIWeek this week! Definitely join some if you are in town and let me know if you want to catch up luma.com/Seattle-AI-W...

5 months ago

🚀 New release: cloudpathlib v0.23.0

🥧 Now with Python 3.14 (π) support!
📁 New copy & move methods mean you can reduce usage of shutil 🎉

Check out the full release and docs here:
👉 cloudpathlib.drivendata.org/stable/

6 months ago

Super interesting work on a newly proposed columnar data file format called F3, with an embedded wasm binary to decode the data 🤯 (which obviates the need for 3rd party library support). Favorable comparisons on compression, throughput, and random reads versus existing formats.

db.cs.cmu.edu/papers/2025/...

6 months ago

Very cool to see Wikimedia embracing LLM tools and launching a hybrid similarity search API and open source embeddings for Wikipedia! Also supports Q&A style queries.
www.wikidata.org/wiki/Wikidat...

6 months ago

Interesting to see empirical research coming out for LLMs as education aids. In this study, active use of LLMs helped CS students debug compiler errors. Removing LLM access demonstrated no lasting learning benefit from having had access to it...

learninganalytics.upenn.edu/ryanbaker/IC...

6 months ago

AI for Conservation Boston Meetup. Join us for an iNat bioblitz! September 27th, 9am-12pm. Meet at the UMass Boston Quad at 9. Register here: https://www.eventbrite.com/e/umass-boston-bioblitz-tickets-1626791971579?aff=oddtdtcreator

Are you interested in #AIforConservation #AIforBiodiversity #AIforWildlife or #AIforNature?? Are you located in the Boston Area?

If so, come join us!! The AI for Conservation Slack community is doing our first local-area Boston meetup, partnering with iNaturalist and TEDx Boston!

7 months ago

We just shipped two major features for cloudpathlib ✨📦✨! First, http support: treat a URL like any other path (open, read_text, join). Second, compatibility with the open and os Python built-ins for seamless transition of legacy code and third-party library support.

cloudpathlib.drivendata.org

6 months ago

Job Bulletin State of North Carolina

Great opportunity to work on AI in conservation and biodiversity with Roland Kays! In-person in NC, check it out now since it is only open for a week:
www.governmentjobs.com/careers/%7B0...

6 months ago

Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task

Exemplary FAQ for "Your Brain on ChatGPT: Accumulation of Cognitive Debt" www.brainonllm.com/faq

I'd love to see more authors who are explicit about what NOT to claim based on a study, including wording for lay audiences that is not appropriate.

7 months ago

Thought I would spot check an application someone was posting about 100% vibecoding. Can you spot the issue?

Kudos to the LLM, this is verbatim from the FastAPI docs. Sometimes verbatim from the docs is not what you want for your application though....

8 months ago

Interesting announcement on a product from Astral! Similar model to one of the core @anacondainc.bsky.social lines of business.

8 months ago

Enthusiastic to build on this generation of earth observation foundation embeddings like DeepMind's AlphaEarth (and more)! We already see some promising crop type (cereals vs. orchards) results and are exploring other use cases in climate resilience. deepmind.google/discover/blo...

8 months ago

File Browser - marimo The next generation of Python notebooks

Very cool to see that marimo supports our cloudpathlib library for their file browser UI! Browse your S3, GCS, Azure buckets from your notebooks! docs.marimo.io/api/inputs/f...

8 months ago

✨ 📦 ✨ Just released new Cookiecutter Data Science version with support for pixi and poetry as environment managers! Some of our top requested features ever. Upgrade and check it out now.

cookiecutter-data-science.drivendata.org

8 months ago

Now getting organic inbound for www.zambacloud.com, our wildlife imagery processing platform, from ChatGPT! 😲

8 months ago

Just in case you thought speech-to-text worked for children, the third column is what Whisper does. Somehow in the third example it accesses my inner monologue... I guess that's why we're excited about our upcoming challenge! kidsasr.drivendata.org

9 months ago

How are people managing code review for their AI coding agents? I do a first glance and it is obviously bad (e.g., didn't refactor repeated code), and now I've got half a dozen AI diffs for things that aren't good enough cluttering up my todo list with things to respond to....

9 months ago

Time is On My Side: Dynamics of Talk-Time Sharing in Video-chat Conversations An intrinsic aspect of every conversation is the way talk-time is shared between multiple speakers. Conversations can be balanced, with each speaker claiming a similar amount of talk-time, or…

New research based on the CANDOR corpus shows that people enjoy conversations where they alternate longer turns better than short turns or one person dominating. Cool!

arxiv.org/html/2506.20...

9 months ago

The best shortcut to how many experienced software engineers feel about AI is listening to ThePrimeagen's takes. Balanced perspectives on what's actually new, determinism, security, system complexity, what's promising, and what's not: www.youtube.com/watch?v=vDWa...

9 months ago

Maldito ChatGPT Camilo · Maldito ChatGPT · Song · 2025

"Damn ChatGPT": your new summer jam about using ChatGPT as a therapist open.spotify.com/track/4umq06...

9 months ago

How Long Contexts Fail Taking care of your context is the key to building successful agents. Just because there’s a 1 million token context window doesn’t mean you should fill it.

Great article on the challenge of surfacing only the right info to LLMs and editing out what is not needed. If you've used a coding copilot or agent, you've seen this firsthand many times. Output iterations are often polluted with code that came before.

www.dbreunig.com/2025/06/22/h...

9 months ago

BioCLIP2 looks like a stellar improvement! I'm excited to think about integrating it into Zamba for open-ended classification tasks run at scale on camera trap imagery. It definitely has the potential to dramatically improve CT image utility. imageomics.github.io/bioclip-2/

9 months ago

Inside the AI Prompts DOGE Used to “Munch” Contracts Related to Veterans’ Health Experts who reviewed the code for ProPublica found numerous and troubling flaws in the system, providing a disturbing glimpse into how the Trump administration is allowing artificial intelligence to…

"Munchable" is GenZ cringe. www.propublica.org/article/insi...

9 months ago

We've built so many low-fidelity prototypes in our HCD work. IMO vibecoding changes the feel of those prototypes, but doesn't change the process. Ask any designer—they'll tell you high-fidelity first iterations are often more distracting to clients than helpful.

www.semafor.com/article/06/0...

9 months ago

Check out this LLM circuit trace for the text: "The statement 'this statement is false' is". It goes through a logical contradictions node, but still outputs either "true" or "false" with the highest probabilities... www.anthropic.com/research/ope...

9 months ago

A new preprint shows anonymization techniques for voices make transcription accuracy substantially worse for children versus adults. This is going to be a big challenge as we work on ASR for educational settings where we emphatically need both privacy and accuracy. arxiv.org/pdf/2506.00100

9 months ago

30 minutes with a stranger Watch hundreds of strangers talk for 30 minutes, and track how their moods change

😍 Incredible data storytelling about the power of conversation and human connection. Worth a read for good vibes! Based on the CANDOR corpus that we worked on. pudding.cool/2025/06/hell...

10 months ago

The gap between LLM prototype and production strikes again... in the worst possible place. www.propublica.org/article/trum...

10 months ago

Posts by Peter Bull