
Posts by Stefan Lattner

Diffusion Timbre Transfer Via Mutual Information Guided Inpainting We study timbre transfer as an inference-time editing problem for music audio. Starting from a strong pre-trained latent diffusion model, we introduce a lightweight procedure that requires no addition...

🎶 New paper out!
Diffusion Timbre Transfer via Mutual Information Guided Inpainting

Training-free timbre transfer with diffusion models: preserve melody & rhythm, edit timbre at inference time using MI-guided noise and clamping.

📄 arxiv.org/abs/2601.01294

#DiffusionModels #AudioML #GenAI #MIR

2 months ago 1 0 0 0

🎉 New ISMIR 2025 paper!

Autoregressive Diffusion Models estimate musical surprisal more effectively than GIVT, capturing pitch expectations & segment boundaries 🎶

📜 arxiv.org/abs/2508.05306

#ListenerModels #Diffusion #ISMIR2025 @sonycsl-paris.bsky.social

8 months ago 4 1 0 0
Assessing the Alignment of Audio Representations with Timbre Similarity Ratings Psychoacoustical so-called "timbre spaces" map perceptual similarity ratings of instrument sounds onto low-dimensional embeddings via multidimensional scaling, but suffer from scalability issues and a...

🎶 New paper alert!
Do AI audio embeddings *hear* timbre like we do?
➡️ Benchmarked 18 reps vs 2.6K human ratings (21 datasets)
🏅 Style embeddings from CLAP & our sound-matching model are best aligned!
Paper: arxiv.org/abs/2507.07764
#ISMIR2025 #MIR #AudioAI #SonyCSLMusic

9 months ago 3 0 1 1
DrumGAN DrumGAN is able to generate audio content from scratch, or make variations of a user’s content.

As Sony Techhub went offline, here is the direct link to DrumGAN:

drumgan.csl.sony.fr

11 months ago 1 0 0 0

Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures
A. Riou, A. Gagneré, G. Hadjeres, S. Lattner, G. Peeters
Tuesday, April 8 (pm): Music analysis I

1 year ago 0 0 0 0

Hybrid Losses for Hierarchical Embedding Learning
H. Tian, S. Lattner, B. McFee, C. Saitis
Tuesday, April 8 (pm): Music analysis I

1 year ago 0 0 1 0

Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems
M. Grachten, J. Nistal
Friday, April 11 (am): Applied Signal Processing Systems

Estimating Musical Surprisal in Audio
M. Bjare, G. Cantisani, S. Lattner and G. Widmer
Wednesday, April 9 (am): Music analysis II

1 year ago 0 0 1 0

🔥 Visit our talks and posters at #ICASSP2025! 👀

Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding
M. Pasini, S. Lattner, G. Fazekas
Wednesday, April 9 (pm): Deep generative models I

1 year ago 2 0 1 0

🤩 From our series "@ieeeICASSP paper released", we announce that "Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures" is online!

📜 Paper: arxiv.org/pdf/2411.19806

Thx to my colleagues Alain Riou, Geoffroy Peeters, Gaetan Hadjeres and Antonin Gagneré!

🎶 SonyCSLMusic 🎶

1 year ago 0 0 0 0

Our #ICASSP paper "Hybrid Losses for Hierarchical Embedding Learning" by Haokun Tian et al. is now online! 💫

We assess the organization of a hierarchical embedding space using different (combinations of) losses and improve on the SOTA.

📜 Paper: arxiv.org/pdf/2501.12796

#SonyCSLParis

1 year ago 2 0 0 0

Recently, I had the honour of giving a keynote speech on Audio Representation Learning and Generation at the DMRN+ workshop at @c4dm at Queen Mary University. 💫

🎬🎙️ Recording:
echo360.org.uk/media/f037dc...

🎶 More Info:
www.qmul.ac.uk/dmrn/dmrn19/

1 year ago 3 0 0 0

We also show that our IC estimates can help predict EEG measurements. 💆‍♀️

Surprisal can be used for segment boundary detection and to simulate the information processing of a listener. 🎶 🧠

📜 Link to the paper: arxiv.org/pdf/2501.07474

Model weights are soon to come! 🏋️

💫✨ #SonyCSLMusic 💫✨

1 year ago 0 0 0 0

Our #ICASSP paper "Estimating Musical Surprisal in Audio" is now online. 😯 <- surprised 😁

Great work by Mathias Bjare and Giorgia Cantisani! 👏

We use an autoregressive transformer and Gaussian mixture models to estimate the information content in music2latent representations. 🧵👇

1 year ago 1 0 1 0

3/ Results show:

- Higher fidelity (FAD ↓ by 20%)
- Better adherence to text & audio prompts (APA ↑)
- Faster generation with 5-step inference!

AI-assisted music production. 🎼💡 Let us know your thoughts!

Congrats to the authors Javier Nistal and Marco Pasini!

#AI #MusicGeneration #Transformers

1 year ago 0 0 1 0
Improving Musical Accompaniment Co-creation via Diffusion Transformers

2/ 🎤 What’s new?

- Stereo output with superior fidelity
- Bridging the gap in Text-to-audio CLAP embeddings 📝🎵
- Faster inference using a consistency framework ⚡

Audio examples: sonycslparis.github.io/improved_dar/ 🎶👂

1 year ago 0 0 0 0

1/ Building on Diff-A-Riff, we’ve upgraded to a stereo-capable autoencoder & replaced the U-Net with a Diffusion Transformer (DiT) to improve quality, diversity, and control. 🎧📈 Plus, our model generates high-quality audio with fewer denoising steps. 🚀

1 year ago 0 0 0 0

🎶✨ New Paper Announcement! ✨🎶
We present "Improving Musical Accompaniment Co-creation via Diffusion Transformers" 🎹🎸, a study advancing our Diff-A-Riff stem generator through improved quality, efficiency, and control.

📜 Read the full paper here: arxiv.org/pdf/2410.23005 🧵👇

1 year ago 7 2 3 0
Deep Learning 101 for Audio-based MIR

🧑‍🎓 Our #ISMIR Conference Tutorial "Deep Learning 101 for Audio-based MIR" provides a broad introduction to music audio processing, analysis, and generation.

📘 The book and jupyter notebooks:
geoffroypeeters.github.io/deeplearning...

🎥 The recording of the tutorial:
us02web.zoom.us/rec/share/Qz...

1 year ago 6 2 0 0

Hybrid Losses for Hierarchical Embedding Learning
H. Tian, S. Lattner, B. McFee, C. Saitis

Congrats to the authors!

1 year ago 1 0 0 0

Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding
M. Pasini, S. Lattner, G. Fazekas

Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures
A. Riou, A. Gagneré, G. Hadjeres, S. Lattner, G. Peeters

1 year ago 2 0 1 0

😃 Accepted #ICASSP papers of Sony CSL Music Team:

Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems
M. Grachten, J. Nistal

Estimating Musical Surprisal in Audio
M. Bjare, G. Cantisani, S. Lattner and G. Widmer

1 year ago 8 1 2 0