
Posts by Shikhar

Shikhar Bharadwaj, Chin-Jou Li, Yoonjae Kim, Kwanghee Choi, Eunjung Yeo, Ryan Soh-Eun Shim, Hanyu Zhou, Brendon Boldt, Karen Rosero Jacome, Kalvin Chang, Darsh Agrawal, Keer Xu, ...
PRiSM: Benchmarking Phone Realization in Speech Models
https://arxiv.org/abs/2601.14046

3 months ago 0 1 0 0

Bharadwaj, Li, Kim, Choi, Yeo, Shim, Zhou, Boldt, Jacome, Chang, Agrawal, Xu, Yang, Zhu, Watanabe, Mortensen: PRiSM: Benchmarking Phone Realization in Speech Models https://arxiv.org/abs/2601.14046 https://arxiv.org/pdf/2601.14046 https://arxiv.org/html/2601.14046

3 months ago 0 2 0 0

Can we make discrete speech units lightweight🪶 and streamable🏎? Excited to share our new #Interspeech2025 paper: On-device Streaming Discrete Speech Units arxiv.org/abs/2506.01845 (1/n)

8 months ago 1 1 2 0

Meows, music, murmurs and more - we trained a general purpose audio encoder and open sourced the code, checkpoint and evaluation toolkit.

8 months ago 3 0 0 0

📢 We've open-sourced NatureLM-audio, the first audio-language foundation model for #bioacoustics.

Trained on large-scale animal vocalization, human speech & music datasets, the model enables zero-shot classification, detection & querying across diverse species & environments 👇🏽

11 months ago 27 12 2 0

🔗 Resources for ESPnet-SDS:
📂 Codebase (part of ESPnet): github.com/espnet/espnet
📖 README & User Guide: github.com/espnet/espne...
🎥 Demo Video: www.youtube.com/watch?v=kI_D...

1 year ago 1 1 0 0

New #NAACL2025 demo! Excited to introduce ESPnet-SDS, a new open-source toolkit for building unified web interfaces for both cascaded & end-to-end spoken dialogue systems, providing real-time evaluation, and more!
📜: arxiv.org/abs/2503.08533
Live Demo: huggingface.co/spaces/Siddh...

1 year ago 7 5 1 0

🚀 New #ICLR2025 Paper Alert! 🚀

Can Audio Foundation Models like Moshi and GPT-4o truly engage in natural conversations? 🗣️🔊

We benchmark their turn-taking abilities and uncover major gaps in conversational AI. 🧵👇

📜: arxiv.org/abs/2503.01174

1 year ago 9 6 1 0

Wait I thought the rock was named Dwayne Johnson

1 year ago 0 0 0 0

gpu poverty is real

1 year ago 2 0 1 0

Happy New Year

1 year ago 23796 4468 386 313

Philip Whittington, Gregor Bachmann, Tiago Pimentel
Tokenisation is NP-Complete
https://arxiv.org/abs/2412.15210

1 year ago 2 1 0 0

Today, we're introducing NatureLM-audio: the first large audio-language model tailored for understanding animal sounds. arxiv.org/abs/2411.07186 🧵👇

1 year ago 15 8 2 4

Announcing 🥂 FineWeb2: A sparkling update with 1000s of 🗣️ languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other datasets.

1 year ago 76 19 1 0
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment The video-language (VL) pretraining has achieved remarkable improvement in multiple downstream tasks. However, the current VL pretraining framework is hard to extend to multiple modalities (N modaliti...

LanguageBind arxiv.org/abs/2310.01852
Uses language as the pivot modality instead of images, with a different training dataset.

1 year ago 2 0 1 0

WAVLab is up in bsky!

1 year ago 8 2 0 0

We are excited to announce the launch of ML SUPERB 2.0 (multilingual.superbbenchmark.org) as part of the Interspeech 2024 official challenge! We hope this upgraded version of ML SUPERB advances universal access to speech processing worldwide. Please join!

#Interspeech2025

1 year ago 20 9 1 1

🙋‍♂️

1 year ago 0 0 0 0

I've started putting together a starter pack with people working on Speech Technology and Speech Science: go.bsky.app/BQ7mbkA

(Self-)nominations welcome!

1 year ago 82 34 44 3
Examples from dataset: a world map surrounded by spectrograms showing animal sounds from different regions of the world

Scatter plot where points are sound datasets, x axis is the number of categories in a dataset and y axis is the duration of the dataset in hours; iNatSounds is shown as the largest dataset on both axes

iNatSounds: new dataset from folks @inaturalist.bsky.social & co-authors; looks to be one of the largest public datasets of animal sounds

openreview.net/forum?id=QCY...

github.com/visipedia/in...

#prattle 💬
#bioacoustics

1 year ago 30 14 1 5

🙋‍♂️🙏

1 year ago 1 0 0 0

🙋‍♂️🙏

1 year ago 0 0 0 0

🙋‍♂️

1 year ago 0 0 0 0

We're here too now! 🥳

1 year ago 8 6 0 0

Me (shikharb@bsky.social) and our lab bsky.app/profile/wavl...

1 year ago 1 0 0 0