
Posts by Gallil Maimon

Slamming: Training a Speech Language Model on One GPU in a Day. Slam is a recipe for training high-quality SLMs on one GPU in 24 hours.

Hey,

I added some longer generation examples, by enforcing `min_new_tokens`. Definitely can lose itself a bit more but still pretty decent I think :)

Check it out:
pages.cs.huji.ac.il/adiyoss-lab/...

And feel free to generate anything with a single line of code:
github.com/slp-rl/slamkit

1 year ago 1 0 1 0
Post image

@gallilmaimon.bsky.social and his team trained a Speech Language Model on a single A5000 GPU in 24 hours

1 year ago 3 1 1 0
Slamming: Training a Speech Language Model on One GPU in a Day. We introduce Slam, a recipe for training high-quality Speech Language Models (SLMs) on a single academic GPU in 24 hours. We do so through empirical analysis of model initialisation and architecture, ...

I love papers that make ML training accessible with consumer GPUs. Great example: "Slamming: Training a Speech Language Model on One GPU in a Day" released 3 days ago. The full code and training data are available and reproducible using a 24GB RTX 3090.

- arxiv.org/abs/2502.15814

1 year ago 12 1 2 0

We generated samples with a max length, but the model can predict an "end" token before reaching it. One could play with the sampling params to make the model keep talking :)

I will try to find time to generate longer samples, but I also encourage everyone to play around themselves. We tried to make it relatively easy 🙏
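To make the "end token before max length" point concrete, here is a toy sketch of how a minimum length can keep generation going. It does not call any real library or the Slam model: `toy_logits`, `generate` and the `EOS` id are hypothetical stand-ins, and the only idea borrowed from the thread is a `min_new_tokens`-style setting that blocks the end token until enough tokens have been produced.

```python
EOS = 0  # hypothetical id of the "end" token


def toy_logits(vocab_size=8):
    # Hypothetical stand-in for a model that always prefers to stop.
    logits = [0.0] * vocab_size
    logits[EOS] = 5.0
    return logits


def generate(max_new_tokens, min_new_tokens=0):
    tokens = []
    for _ in range(max_new_tokens):
        logits = toy_logits()
        if len(tokens) < min_new_tokens:
            # Block the end token until the minimum length is reached.
            logits[EOS] = float("-inf")
        # Greedy pick, to keep the toy example deterministic.
        next_token = max(range(len(logits)), key=logits.__getitem__)
        if next_token == EOS:
            break
        tokens.append(next_token)
    return tokens


print(len(generate(10)))                    # stops immediately
print(len(generate(10, min_new_tokens=5)))  # forced to keep going
```

With sampling instead of greedy picking, the masking step would stay the same; only the token choice changes.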

1 year ago 2 0 0 0

And about this - yes!
We are accepting PRs to add more tokenisers, better optimisers, efficient attention implementations and anything that seems relevant :)

Feel free to reach out 💪

1 year ago 1 1 0 0

Hey!
Really pleased you liked our work:) I think with the help of the open source community we can push results even further.

About generation length - the model context is 1024 tokens (~40 seconds of audio), but we used a setup like TWIST for evaluation. Definitely worth testing longer generations!
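A quick back-of-the-envelope check of the "1024 ≈ 40 seconds" figure. This is a sketch under two assumptions that are not stated in the post: a fixed rate of 25 tokens per second (suggested by the name of the mhubert-25hz tokeniser) and no merging of repeated units. The helper `tokens_to_seconds` is hypothetical.

```python
def tokens_to_seconds(num_tokens, tokens_per_second=25.0):
    # Assumes one token per frame at a fixed rate (an assumption, see above).
    return num_tokens / tokens_per_second


context_tokens = 1024
print(f"{tokens_to_seconds(context_tokens):.2f} s")  # 40.96 s
```

If repeated units are merged before modelling, the same 1024 tokens would cover more audio, so this is best read as a lower bound under that setup.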

1 year ago 2 0 1 0

🔜🗣️It was shown to be really useful for training SpeechLMs. We are working on some stuff now to hopefully make it even easier. More to come soon!💪

1 year ago 0 0 0 0
slprl/mhubert-base-25hz · Hugging Face. We’re on a journey to advance and democratize artificial intelligence through open source and open science.

🚨Attention #speech @hf.co people🤗💬
We added official support for mhubert-25hz from TWIST in transformers. We also converted it from fairseq to HF!

Check it out✌️
huggingface.co/slprl/mhuber...

1 year ago 0 0 1 0

I am thrilled to share that SALMon🍣 got accepted to #ICASSP25

For code, data, preprint and live leaderboard, check out pages.cs.huji.ac.il/adiyoss-lab/...

w/ Amit Roth and Yossi Adi

1 year ago 1 0 0 0
Post image

For instance, in my opinion, in this example it feels unlikely that people would use stress to convey these meanings. Happy for all and any suggestions and insights :)

1 year ago 0 0 0 0
Post image

#Speech people: I am looking for examples (or resources) where stress or emphasis on a phrase changes the meaning of a sentence. This is part of a study on intonation in SpeechLMs.

I gave a decent ChatGPT answer below, but many weren't great...

1 year ago 1 0 1 0
SALMon: Suite for Acoustic Language Model evaluation. SALMon is a suite of benchmarks for evaluating Speech Language Models' ability to model acoustics.

🥇Project page (+leaderboard) - pages.cs.huji.ac.il/adiyoss-lab/...
📜Paper - arxiv.org/abs/2409.07437
💻Code - github.com/slp-rl/salmon
🤗 Data - huggingface.co/datasets/slp...

1 year ago 1 0 0 0
Post image

🪙 I assume sentiment improved because of style tokens (also shown in STSP metric from SpiritLM). I wonder what is limiting performance - data? modelling? tokens? We welcome suggestions and new SLMs!

1 year ago 0 0 1 0
Post image

We added SpiritLM to the SALMon🍣 leaderboard! Nice jump in emotion consistency, but still no improvement in jointly modelling text content and acoustics🥲
Think your SLM can do better?💪
links👇

1 year ago 2 0 1 1

I've started putting together a starter pack with people working on Speech Technology and Speech Science: go.bsky.app/BQ7mbkA

(Self-)nominations welcome!

1 year ago 82 34 44 3

Great list! I’d be happy to join as well :)

1 year ago 1 0 0 0