Understanding, not correction.
#speechrecognition #fine-tuning #Downsyndrome #worderrorrate #accessibility
That result matters not because fine-tuning is surprising — it isn't — but because of what it proves. The speech was always intelligible. The model just hadn't learned how to listen to it yet. All I did was teach it.
Same architecture, different training distribution. One run, a few hours later: 12.1% word error rate. That is a 66% relative reduction in errors.
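For anyone checking the arithmetic, here is a minimal sketch of how a relative reduction like that is computed. It assumes the 66% is measured against the 35.7% mixed-speaker figure quoted in this post, since those two numbers are the only ones given. The function name is made up for illustration. The Down syndrome-only baseline is not stated here, so the reduction against that number cannot be reproduced from this post.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline word error rate removed by the new model."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Figures quoted in the post: 35.7% before fine-tuning, 12.1% after.
reduction = relative_wer_reduction(0.357, 0.121)
absolute_drop = 0.357 - 0.121

print(f"relative reduction: {reduction:.1%}")            # 66.1%
print(f"absolute drop: {absolute_drop * 100:.1f} points")  # 23.6 points
```

The distinction matters when reading the headline: the error rate fell by 23.6 percentage points, which is about two thirds of the errors the baseline made.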
It's undertrained on this kind of speech because this kind of speech is underrepresented in every dataset that ever went into it. That's not a model failure — it's a data failure upstream of the model.
So I fine-tuned.
Averaging in easier cases flatters the metric and hides the real gap.
The real gap was the point. Whisper does not struggle with this speech because it was built carelessly.
I added a clarifying commit almost immediately, because if you're building something for a specific population, your baseline has to be honest about that population.
Then I read my own measurement more carefully. That number included non-DS speakers in the mix. Strip those out and look at DS speech alone, and the picture gets worse. The headline was misleading.
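To make the pooling problem concrete, here is a small sketch of word error rate computed per speaker group next to the pooled figure. It is not my actual evaluation code. The transcripts, the group labels `ds` and `non_ds`, and the function names are all invented for illustration, and the text normalization (lowercase plus whitespace split) is far simpler than what a real evaluation would use.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Return (pooled WER, {group: WER}) for (group, reference, hypothesis) triples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in samples:
        ref_tokens = ref.lower().split()
        errors[group] += edit_distance(ref_tokens, hyp.lower().split())
        words[group] += len(ref_tokens)
    per_group = {g: errors[g] / words[g] for g in words}
    pooled = sum(errors.values()) / sum(words.values())
    return pooled, per_group


# Hypothetical toy transcripts, not taken from the real dataset.
samples = [
    ("ds", "i want to go home", "i want go on"),
    ("ds", "please open the door", "please open door"),
    ("non_ds", "turn on the kitchen light", "turn on the kitchen light"),
    ("non_ds", "what time is it", "what time is it"),
]

pooled, per_group = wer_by_group(samples)
print(f"pooled: {pooled:.1%}")
for group, wer in sorted(per_group.items()):
    print(f"{group}: {wer:.1%}")
```

In this toy set the pooled number is half the `ds` number, purely because the error-free `non_ds` utterances are averaged in. That is the same effect, in miniature, that made my first headline figure look better than it was.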
So I ran vanilla Whisper — one of the best general-purpose speech recognition models in the world — against a curated dataset of Down syndrome speech. The word error rate came back at 35.7%.
66% improvement in one training run — and why the baseline number was a lie
Before you can make something better, you need to know how bad it actually is.