New paper out!
Diffusion Timbre Transfer via Mutual Information Guided Inpainting
Training-free timbre transfer with diffusion models: preserve melody & rhythm, edit timbre at inference time using MI-guided noise and clamping.
Paper: arxiv.org/abs/2601.01294
#DiffusionModels #AudioML #GenAI #MIR
Posts by Stefan Lattner
New ISMIR 2025 paper!
Autoregressive Diffusion Models estimate musical surprisal more effectively than GIVT, capturing pitch expectations & segment boundaries.
Paper: arxiv.org/abs/2508.05306
#ListenerModels #Diffusion #ISMIR2025 @sonycsl-paris.bsky.social
New paper alert!
Do AI audio embeddings *hear* timbre like we do?
We benchmarked 18 representations against 2.6K human ratings across 21 datasets.
Style embeddings from CLAP & our sound-matching model are best aligned!
Paper: arxiv.org/abs/2507.07764
#ISMIR2025 #MIR #AudioAI #SonyCSLMusic
Visit our talks and posters at #ICASSP2025!
Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures
A. Riou, S. Lattner, A. Gagneré, G. Hadjeres, G. Peeters
Tuesday, April 8 ( pm): Music analysis I
Hybrid Losses for Hierarchical Embedding Learning
H. Tian, S. Lattner, B. McFee, C. Saitis
Tuesday, April 8 ( pm): Music analysis I
Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems
M. Grachten, J. Nistal
Friday, April 11 ( am): Applied Signal Processing Systems
Estimating Musical Surprisal in Audio
M. Bjare, G. Cantisani, S. Lattner and G. Widmer
Wednesday, April 9 ( am): Music analysis II
Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding
M. Pasini, S. Lattner, G. Fazekas
Wednesday, April 9 ( pm): Deep generative models I
From our "@ieeeICASSP paper released" series, we announce that "Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures" is online!
Paper: arxiv.org/pdf/2411.19806
Thanks to my colleagues Alain Riou, Geoffroy Peeters, Gaëtan Hadjeres, and Antonin Gagneré!
SonyCSLMusic
Our #ICASSP paper "Hybrid Losses for Hierarchical Embedding Learning" by Haokun Tian et al. is now online!
We assess how different losses, and combinations of them, organize a hierarchical embedding space, and we improve on the SOTA.
Paper: arxiv.org/pdf/2501.12796
#SonyCSLParis
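As a rough illustration of what a "hybrid" hierarchical embedding loss can look like: a weighted sum of per-level metric losses. This is a generic sketch, not the paper's formulation; the triplet form, level structure, and weights are illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, pos, neg, margin=0.2):
    """Margin-based triplet loss on Euclidean distances."""
    d_pos = np.linalg.norm(anchor - pos, axis=-1)
    d_neg = np.linalg.norm(anchor - neg, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

def hybrid_hierarchical_loss(anchor, pos_per_level, neg_per_level, level_weights):
    """Weighted sum of per-level triplet losses, e.g. one level for
    instrument family and one for the specific instrument."""
    total = 0.0
    for pos, neg, w in zip(pos_per_level, neg_per_level, level_weights):
        total += w * triplet_loss(anchor, pos, neg).mean()
    return float(total)
```

Combining levels this way lets coarse categories (easy positives/negatives) and fine ones (hard positives/negatives) shape the same embedding space.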
Recently, I had the honour of giving a keynote speech on Audio Representation Learning and Generation at the DMRN+ workshop at @c4dm, Queen Mary University of London.
Recording:
echo360.org.uk/media/f037dc...
More info:
www.qmul.ac.uk/dmrn/dmrn19/
We also show that our IC estimates can help predict EEG measurements.
Surprisal can be used for segment boundary detection and to simulate the information processing of a listener.
Link to the paper: arxiv.org/pdf/2501.07474
Model weights are coming soon!
#SonyCSLMusic
Our #ICASSP paper "Estimating Musical Surprisal in Audio" is now online.
Great work by Mathias Bjare and Giorgia Cantisani!
We use an autoregressive transformer and Gaussian mixture models to estimate the information content in music2latent representations.
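In essence, the surprisal (information content) of a latent frame is its negative log-likelihood under the model's predictive distribution. A minimal numpy sketch for a diagonal-covariance Gaussian mixture; in the paper the mixture parameters would come from the autoregressive transformer, so everything here is illustrative.

```python
import numpy as np

def gmm_surprisal(x, weights, means, variances):
    """Surprisal (information content, in nats) of a vector x under a
    diagonal-covariance Gaussian mixture: -log p(x)."""
    x = np.asarray(x, dtype=float)
    log_comps = []
    for w, mu, var in zip(weights, means, variances):
        # log of w * N(x; mu, diag(var))
        ll = np.log(w) - 0.5 * np.sum(
            np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var
        )
        log_comps.append(ll)
    m = max(log_comps)  # log-sum-exp for numerical stability
    return -(m + np.log(sum(np.exp(l - m) for l in log_comps)))
```

Frames that fall far from all mixture components get high surprisal, which is what makes peaks in this quantity usable for segment boundary detection.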
3/ Results show:
- Higher fidelity (20% lower FAD)
- Better adherence to text & audio prompts (higher APA)
- Faster generation with 5-step inference!
AI-assisted music production. Let us know your thoughts!
Congrats to the authors Javier Nistal and Marco Pasini!
#AI #MusicGeneration #Transformers
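For context, FAD (Fréchet Audio Distance) compares Gaussian fits of embedding distributions for generated vs. reference audio; lower is better. A rough numpy sketch with diagonal covariances; the real metric uses full covariances with a matrix square root, and embeddings from a pretrained audio model.

```python
import numpy as np

def fad_diagonal(emb_ref, emb_gen):
    """Frechet distance between Gaussian fits of two embedding sets,
    simplified to diagonal covariances. The full FAD replaces
    sqrt(v1 * v2) with a matrix square root of the covariance product."""
    mu1, mu2 = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    v1, v2 = emb_ref.var(axis=0), emb_gen.var(axis=0)
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(v1 + v2 - 2.0 * np.sqrt(v1 * v2))
    return float(mean_term + cov_term)
```

Identical embedding sets score (near) zero; shifting one set moves the score by the squared mean offset.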
2/ What's new?
- Stereo output with superior fidelity
- Bridging the gap in text-to-audio CLAP embeddings
- Faster inference using a consistency framework
Audio examples: sonycslparis.github.io/improved_dar/
1/ Building on Diff-A-Riff, we've upgraded to a stereo-capable autoencoder and replaced the U-Net with a Diffusion Transformer (DiT) to improve quality, diversity, and control. Plus, our model generates high-quality audio with fewer denoising steps.
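For readers unfamiliar with how consistency-style models get away with so few denoising steps, here is a generic multistep sampling loop in numpy. The function `toy_f`, the sigma schedule, and `sigma_min` are placeholders for illustration, not the trained DiT from the paper.

```python
import numpy as np

def consistency_multistep_sample(f, shape, sigmas, sigma_min=0.002, seed=0):
    """Multistep consistency sampling: start from pure noise, map it to a
    clean estimate in one call of f, then alternate re-noising at
    decreasing noise levels with further one-step denoising."""
    rng = np.random.default_rng(seed)
    x = sigmas[0] * rng.standard_normal(shape)
    x = f(x, sigmas[0])  # one-step estimate from pure noise
    for sigma in sigmas[1:]:
        z = rng.standard_normal(shape)
        x = x + np.sqrt(max(sigma ** 2 - sigma_min ** 2, 0.0)) * z  # re-noise
        x = f(x, sigma)  # denoise again at a lower noise level
    return x

def toy_f(x, sigma):
    """Placeholder consistency function: plain shrinkage toward zero."""
    return x / (1.0 + sigma ** 2)
```

Because each call of `f` jumps straight to a clean estimate, a handful of noise levels (e.g. 5) can replace the dozens of steps a standard diffusion sampler needs.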
New Paper Announcement!
We present "Improving Musical Accompaniment Co-creation via Diffusion Transformers", a study advancing our Diff-A-Riff stem generator through improved quality, efficiency, and control.
Read the full paper here: arxiv.org/pdf/2410.23005
Our #ISMIR Conference Tutorial "Deep Learning 101 for Audio-based MIR" provides a broad introduction to music audio processing, analysis, and generation.
The book and Jupyter notebooks:
geoffroypeeters.github.io/deeplearning...
The recording of the tutorial:
us02web.zoom.us/rec/share/Qz...
Accepted #ICASSP papers of Sony CSL Music Team:
Hybrid Losses for Hierarchical Embedding Learning
H. Tian, S. Lattner, B. McFee, C. Saitis
Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding
M. Pasini, S. Lattner, G. Fazekas
Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures
A. Riou, S. Lattner, A. Gagneré, G. Hadjeres, G. Peeters
Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems
M. Grachten, J. Nistal
Estimating Musical Surprisal in Audio
M. Bjare, G. Cantisani, S. Lattner and G. Widmer
Congrats to the authors!