Excited to be speaking at the Good Tech Summit in DC on April 7! www.goodtechtogether.org/summit
We’ll share a program focused on K-12 education and talk about investing in the foundations of AI: data, models, and benchmarks. We'll explore how these shape AI development within a field. Join us!
Posts by DrivenData
The performance gap in children’s ASR is real — and solvable!
With 1 week until the April 6 deadline, we’re inviting the global ML community to help close it.
Compete for $120K and contribute to better speech technology for kids.
kidsasr.drivendata.org
Children’s speech remains one of ASR’s toughest challenges — and the leaderboard is moving!
3 weeks left to compete for $120K in the On Top of Pasketti Challenge.
Submit your model:
kidsasr.drivendata.org
10 years ago, "data science for social good" was just an idea. Today, it's a global movement.
Our 10-Year Impact Report reflects on a decade of responsible, real-world AI.
Take a look back with us: s3.amazonaws.com/drivendata-p...
Competing in the On Top of Pasketti Word Track? We've published a reference tutorial walking through how to build a model for children's speech recognition — covering data exploration, model training, and submission packaging.
Get started: drivendata.co/blog/child-a...
From Bogotá to Almaty, our community continues to impress!
IGCPHARMA built prize-winning early dementia prediction models in PREPARE.
Kirill Brodt has competed in 23 competitions, with 8 top-10 and 6 top-3 finishes.
Two new Community Spotlights:
drivendata.co/blog/communi...
drivendata.co/blog/communi...
Automatic speech recognition struggles with children’s speech. That gap matters!
The $120K Children’s Speech Recognition Challenge is driving progress toward models that truly understand kids.
Be part of the breakthrough! Submit your solution by April 6.
kidsasr.drivendata.org
A machine learning competition for NASA sparked something bigger.
CyFi (cyanobacteria finder) is now an open-source tool using Sentinel-2 data to monitor harmful algal blooms worldwide, with local validation underway.
How we got here: drivendata.co/blog/cyfi-sm...
What happens when AI agents enter ML competitions?
Spoiler: Humans 1, Agents 0.25–0.93.
The top of the leaderboard — where “good” becomes “great” — still looks very human.
How might that change?
See the results:
drivendata.co/blog/ai-agen...
We’re launching our first benchmark competition.
The SNOMED CT Entity Linking Benchmark evaluates how well models structure clinical notes with the SNOMED CT nomenclature, using a large, de-identified dataset of annotated records.
Help set the baseline: www.drivendata.org/benchmarks/3...
Building data pipelines is deceptively complex: cold starts, unstable inputs, shifting requirements, and delivery trade-offs create friction at every step.
To ease the pain, we examine five recurring challenges and suggest practical improvements: drivendata.co/blog/pipelin...
ML competitions moved fast in 2025, from 512-GPU training runs to benchmark-style challenges.
The new State of Machine Learning Competitions report explores what winning teams used.
We're grateful to be part of such a dynamic ML community.
Read:
mlcontests.com/state-of-mac...
The gap between public data and usable data is the “last-mile data problem.”
We’ve all seen it: confusing CSVs, messy schemas, opaque data dictionaries.
We’re testing a “baked data” approach to solve it, with promising results.
See our recipe:
drivendata.co/blog/last-mi...
We gave AI agents 24 hours on the leaderboard.
Claude Code (Opus 4.5) and Codex (GPT 5.2) produced dozens of submissions. Some hit the top 20. Others overfit or plateaued.
We identified 9 obstacles and 6 open questions.
Do they match your experience?
drivendata.co/blog/ai-agen...
Voice-based tools have the potential to support learning, accessibility, and early literacy, but only if they work for children. In the $120k On Top of Pasketti Children’s Speech Recognition Challenge, solvers will build ASR systems that understand kids. kidsasr.drivendata.org
Hang out with us at #SciPy2026 this summer! Senior Data Scientist Katie Wetstone is co-chairing the Environmental, Earth, and Climate Sciences track, which you can submit to by February 25.
We're excited to be a part of the K-12 AI Infrastructure Program, advancing open datasets, models, and benchmarks for AI in teaching & learning. The first RFP is now open ($50K–$250K). Apply now! k12-ai-infrastructure.org/rfp-due-marc...
Kids learn through voice, but today's ASR tech can barely understand them. In a new data science challenge, solvers will develop models that work with children's unique speech patterns and compete for a share of the $120k prize pool. kidsasr.drivendata.org
🚨 New opportunity: Help build open-source speech recognition AI 🎙️📚
@drivendata.org is hosting a data science competition to advance automatic speech recognition (ASR) for early education. Two tracks, real impact, and $120K in prizes.
Learn more & compete: kidsasr.drivendata.org
DrivenData's Katie Wetstone will be co-chairing the climate sciences track at @scipyconf.bsky.social, where you can share YOUR work in environmental data science! Submit a #SciPy2026 talk, tutorial, or poster by February 25. See you there!
The Poverty Prediction Challenge, sponsored by The World Bank with a $10K prize pool, tackles a major question in development research: how do you estimate current poverty rates without recent household expenditure data? Submissions are open through midnight UTC on February 4, 2026.
In our newest machine learning challenge, solvers will use survey data to help uncover imputation methods for monitoring poverty trends. The top teams will take home a share of the $10,000 prize, provided by The World Bank. Learn more and submit predictions here: www.drivendata.org/competitions...
Throwback to our "Hateful Memes" challenge where teams detected harmful content combining text and images. Critical work for online safety! 🛡️💻 #ContentModeration #AI drivendata.org/competitions/64/
Hot topic in #DataScience: multimodal learning is bridging text, images, and audio. Our blog explores practical applications beyond the hype! 🎭🔗 #MultiModal #AI drivendata.co/blog.html
Our latest blog dives into "Causal Inference for Data Scientists" - moving beyond correlation to understand what actually drives outcomes! 🔗💡 #CausalInference #DataScience drivendata.co/blog.html
Ever wonder how AI helps track endangered species? Our "Pri-matrix Factorization" competition used camera trap data to identify primates in the wild! 🐒📷 #WildlifeConservation #ComputerVision drivendata.org/competitions/49/
Community Spotlight: Kirill Brodt
The Community Spotlight features fantastic members from our DrivenData community. Kirill Brodt, a researcher in computer graphics at the University of Montreal, talks animation, pose estimation, and data science challenges.
Want to build ML systems that actually work in production? Download our free ebook "The 10 Rules of Reliable Data Science" and learn the essentials from our years of real-world experience. Game-changing insights await! 📊🔬 #DataScience #MachineLearning
Check out our "Open AI Caribbean Challenge" winners! Teams used satellite imagery to map building footprints for disaster preparedness. Amazing work applying #ComputerVision to humanitarian needs! 🛰️🏠 #DataForGood drivendata.org/competitions/58/
📢 We're Hiring: Child Speech Transcriber (Remote, Part-Time)
Join DrivenData in making speech technology more accessible for children! We're looking for a Child Speech Orthographic Transcriber to help advance Automatic Speech Recognition (ASR) for young learners.
docs.google.com/forms/d/e/1F...