Hey,
I added some longer generation examples by enforcing `min_new_tokens`. The model can definitely lose itself a bit more, but the results are still pretty decent I think :)
Check it out:
pages.cs.huji.ac.il/adiyoss-lab/...
And feel free to generate anything with a single line of code:
github.com/slp-rl/slamkit
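If you want to force longer outputs yourself, here's a minimal sketch of the `min_new_tokens` trick using plain HF transformers (the checkpoint id and prompt are placeholders, not the exact slamkit API):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoint id - grab a real SLM from the repo linked above.
model = AutoModelForCausalLM.from_pretrained("slp-rl/your-slm-checkpoint")

# Hypothetical prompt of speech units from your speech tokeniser.
prompt_units = torch.tensor([[12, 305, 88, 47]])

generated = model.generate(
    prompt_units,
    min_new_tokens=750,   # block the "end" token until 750 new units are out
    max_new_tokens=1000,  # hard cap on generation length
    do_sample=True,
)
```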
@gallilmaimon.bsky.social and his team trained a Speech Language Model on 1x A5000 GPU in 24 hours
I love papers that make ML training accessible with consumer GPUs. A great example: "Slamming: Training a Speech Language Model on One GPU in a Day", released 3 days ago. The full code and training data are available, and the results are reproducible on a 24GB RTX 3090.
- arxiv.org/abs/2502.15814
We generated samples with a max length, but the model can predict an "end" token earlier. One could play with the sampling params to make the model keep talking :)
I will try to find time to generate longer samples, but I also encourage everyone to play around themselves. We tried to make it relatively easy🙏 (a small sketch of those sampling knobs is below)
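For anyone playing around, a hedged sketch of the kind of sampling knobs meant above (the checkpoint id, prompt and values are illustrative, not our exact setup):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("slp-rl/your-slm-checkpoint")  # placeholder id
prompt_units = torch.tensor([[12, 305, 88, 47]])  # hypothetical speech-unit prompt
eos_id = model.generation_config.eos_token_id     # the model's "end" unit

generated = model.generate(
    prompt_units,
    do_sample=True,
    temperature=1.0,           # flatter distribution, so "end" dominates less
    top_p=0.95,                # nucleus sampling
    suppress_tokens=[eos_id],  # never sample "end" at all (or use min_new_tokens to just delay it)
    max_new_tokens=1000,
)
```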
And about this - yes!
We are accepting PRs to add more tokenisers, better optimisers, efficient attention implementations and anything that seems relevant :)
Feel free to reach out 💪
Hey!
Really pleased you liked our work :) I think with the help of the open-source community we can push results even further.
About generation length - the model context is 1024 tokens, which at the 25Hz token rate comes out to ~40 seconds of audio, but we used a setup like TWIST for evaluation. Definitely worth testing longer generations!
🔜🗣️It was shown to be really useful for training SpeechLMs. We are working on some stuff now to hopefully make it even easier. More to come soon!💪
🚨Attention #speech @hf.co people🤗💬
We added official support for mhubert-25hz from TWIST in transformers. We also converted it from fairseq to HF!
Check it out✌️
huggingface.co/slprl/mhuber...
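A minimal usage sketch (the repo id here is a placeholder for the truncated link above, and the layer index for TWIST-style units is an assumption - check the model card):

```python
import torch
from transformers import AutoModel, AutoFeatureExtractor

repo = "slprl/mhubert-25hz"  # placeholder - use the exact id from the link above
model = AutoModel.from_pretrained(repo, output_hidden_states=True)
extractor = AutoFeatureExtractor.from_pretrained(repo)

wav = torch.randn(16000).numpy()  # 1 second of dummy 16kHz audio
inputs = extractor(wav, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# TWIST-style units come from an intermediate layer; 11 is an assumption here.
feats = out.hidden_states[11]
print(feats.shape)  # (1, num_frames at the 25Hz rate, hidden_dim)
```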
I am thrilled to share that SALMon🍣 got accepted to #ICASSP25
For code, data, preprint and the live leaderboard check out - pages.cs.huji.ac.il/adiyoss-lab/...
w/ Amit Roth and Yossi Adi
For instance, in this example it feels unlikely to me that people would use stress to convey these meanings. Happy for any and all suggestions and insights :)
#Speech people: I am looking for examples (or resources) where stress or emphasis on a phrase changes the meaning of a sentence. This is part of a study on intonation in SpeechLMs.
I've attached a decent ChatGPT answer below, but many of its suggestions weren't great...
🥇Project page (+leaderboard) - pages.cs.huji.ac.il/adiyoss-lab/...
📜Paper - arxiv.org/abs/2409.07437
💻Code - github.com/slp-rl/salmon
🤗 Data - huggingface.co/datasets/slp...
🪙 I assume sentiment improved because of the style tokens (as also shown by the STSP metric from SpiritLM). I wonder what is limiting performance - data? modelling? tokens? We welcome suggestions and new SLMs!
We added SpiritLM to the SALMon🍣 leaderboard! Nice jump in emotion consistency, but still no improvement in jointly modelling text content and acoustics🥲
Think your SLM can do better?💪
links👇
I've started putting together a starter pack with people working on Speech Technology and Speech Science: go.bsky.app/BQ7mbkA
(Self-)nominations welcome!
Great list! I’d be happy to join as well :)