The Hidden Audio Bias Inside Audio-Visual Speech Recognition
Shapley analysis reveals why AVSR models keep trusting corrupted audio, exposing a hidden bias in multimodal speech recognition.
Telegram AI Digest
#ai #news #speechrecognition
Understanding, not correction.
#speechrecognition #fine-tuning #Downsyndrome #worderrorrate #accessibility
That result matters not because fine-tuning is surprising — it isn't — but because of what it proves. The speech was always intelligible. The model just hadn't learned how to listen to it yet. All I did was teach it.
#speechrecognition #fine-tuning #Downsyndrome #worderrorrate #accessibility
Same architecture, different training distribution. One run, a few hours later: 12.1% word error rate. A 66% improvement.
#speechrecognition #fine-tuning #Downsyndrome #worderrorrate #accessibility
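A quick arithmetic check on the 66% figure. This is a minimal sketch that assumes the improvement is measured relative to the 35.7% word error rate quoted elsewhere in this thread; the thread does not state which baseline was used, and the function name is only illustrative.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which the error rate dropped, relative to the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Figures quoted in the thread (percent WER). Treating 35.7 as the
# comparison baseline is an assumption, not something the author confirms.
baseline = 35.7
fine_tuned = 12.1

reduction = relative_wer_reduction(baseline, fine_tuned)
absolute_drop = baseline - fine_tuned

print(f"relative reduction: {reduction:.1%}")        # relative reduction: 66.1%
print(f"absolute drop: {absolute_drop:.1f} points")  # absolute drop: 23.6 points
```

The relative reduction (about 66%) and the absolute drop (23.6 percentage points) describe the same result; the post quotes the relative one.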
It's undertrained on this kind of speech because this kind of speech is underrepresented in every dataset that ever went into it. That's not a model failure — it's a data failure upstream of the model.
So I fine-tuned.
#speechrecognition #fine-tuning #Downsyndrome #worderrorrate #accessibility
Averaging in easier cases flatters the metric and hides the real gap.
The real gap was the point. Whisper isn't bad because it was built carelessly.
#speechrecognition #fine-tuning #Downsyndrome #worderrorrate #accessibility
I added a clarifying commit almost immediately, because if you're building something for a specific population, your baseline has to be honest about that population.
#speechrecognition #fine-tuning #Downsyndrome #worderrorrate #accessibility
Then I read my own measurement more carefully. That number included non-DS speakers in the mix. Strip those out and look at DS speech alone, and the picture gets worse. The headline was misleading.
#speechrecognition #fine-tuning #Downsyndrome #worderrorrate #accessibility
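To see how mixing in easier speakers can flatter a word error rate, here is a small sketch that scores the same transcripts pooled and per group. The transcripts, group labels, and function names are made up for illustration; they are not the author's data or evaluation code.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(samples):
    """samples: (group, reference, hypothesis) triples -> {group: WER}, plus 'pooled'."""
    errors, words = defaultdict(int), defaultdict(int)
    for group, ref, hyp in samples:
        e, n = word_errors(ref, hyp)
        for key in (group, "pooled"):
            errors[key] += e
            words[key] += n
    return {key: errors[key] / words[key] for key in errors}


# Hypothetical toy transcripts, chosen only to show the effect.
samples = [
    ("target", "i want to go home now", "i want go home"),
    ("target", "please open the door", "please open door"),
    ("other", "the weather is nice today", "the weather is nice today"),
    ("other", "see you tomorrow morning", "see you tomorrow morning"),
]

scores = wer_by_group(samples)
for group, wer in scores.items():
    print(f"{group}: {wer:.1%}")
```

On this toy data the pooled score is roughly half the target group's score, which is the same kind of gap the post describes: the pooled number is accurate for the mix, but not for the population the system is meant to serve.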
So I ran vanilla Whisper — one of the best general-purpose speech recognition models in the world — against a curated dataset of Down syndrome speech. The word error rate came back at 35.7%.
#speechrecognition #fine-tuning #Downsyndrome #worderrorrate #accessibility
66% improvement in one training run — and why the baseline number was a lie
Before you can make something better, you need to know how bad it actually is.
#speechrecognition #fine-tuning #Downsyndrome #worderrorrate #accessibility
Cohere Transcribe: Speech Recognition
Telegram AI Digest
#ai #cohere #speechrecognition
Speech Recognition Market is Transforming the Industry Landscape www.marketresearchfuture.com/reports/spee...
#SpeechRecognition #AI #VoiceTech #NaturalLanguageProcessing #SmartAssistants #Automation #Innovation #TechMarket
Mistral Ships Voxtral - Open-Weights Voice AI Platform
awesomeagents.ai/news/mistral-voxtral-ope...
#Mistral #OpenSource #SpeechRecognition
Simplified, intuitive, and packed with new features, our brand new mobile app for Apple and Android is coming soon.
Intelligent workflows, ambient voice technology, medical speech recognition, and text messaging on the go.
#workflows #ambientAI #speechrecognition #digitaltransformation #medtech
winbuzzer.com/2026/03/27/c...
Cohere's Open-Source Transcribe Model Tops ASR Leaderboard
#AI #Cohere #CohereTranscribe #SpeechRecognition #AITranscription #OpenSourceAI #HuggingFace #MultimodalAI
Cohere's lean transcription AI challenges the proprietary model
#AI #OpenSource #SpeechRecognition #AusNews
thedailyperspective.org/article/2026-03-27-coher...
One powerful solution for AI scribing, speech recognition and text/email messaging.
Visit our website and find out why NHS organisations have been choosing Lexacom for more than 25 years. Link in bio.
#nhs #digitaltransformation #speechrecognition #workflows #healthtech
The Best Medical Speech Recognition Software and APIs in 2026
Medical speech recognition tools are transforming healthcare by reducing documentation time, improving workflow efficiency, and lowering physician burnout. This guide compares top APIs an…
Telegram AI Digest
#ai #news #speechrecognition
Lexacom has been supporting healthcare professionals across the NHS for more than 25 years.
- Intelligent workflows
- Ambient voice technology
- Medical speech recognition
- Digital dictation
- Text messaging
#nhs #medtech #workflows #nhsdigital #speechrecognition #nhsAI #AIinhealthcare
Scottish GP with dyslexia hails the benefits of Lexacom in helping his practice cope with a rise in administrative demands and growing clinical complexity.
➡️ Read the full case study on our website > Resources. Link in bio.
#nhs #speechrecognition #nhsAI #digitaltransformation
winbuzzer.com/2026/03/16/i...
IBM Granite 4.0 1B Speech Tops OpenASR Leaderboard
#AI #AIModels #IBM #SpeechRecognition #OpenSourceAI #EnterpriseAI #EdgeComputing #AITranslation #OpenASRLeaderboard
#Human #brain and #AI #speechrecognition decode #speech in similar step-by-step stages, study finds
techxplore.com/news/2026-03...
Speech Recognition Market Share, Growth Drivers, and Competitive Analysis www.marketresearchfuture.com/reports/spee...
#SpeechRecognition #VoiceAI #VoiceTechnology
IBM Granite 4.0 1B Speech Tops OpenASR Leaderboard
awesomeagents.ai/news/ibm-granite-4-speec...
#Ibm #Granite #SpeechRecognition
IBM taps Deepgram to add real-time speech to watsonx Orchestrate #Technology #SoftwareandApps #Other #IBM #WatsonX #SpeechRecognition
siliconangle.com/2026/02/24/ibm-taps-deep...