New paper out!
Diffusion Timbre Transfer via Mutual Information Guided Inpainting
Training-free timbre transfer with diffusion models: preserve melody & rhythm, edit timbre at inference time using MI-guided noise and clamping.
Paper: arxiv.org/abs/2601.01294
#DiffusionModels #AudioML #GenAI #MIR
Posts by Stefan Lattner
New ISMIR 2025 paper!
Autoregressive Diffusion Models estimate musical surprisal more effectively than GIVT, capturing pitch expectations & segment boundaries.
Paper: arxiv.org/abs/2508.05306
#ListenerModels #Diffusion #ISMIR2025 @sonycsl-paris.bsky.social
New paper alert!
Do AI audio embeddings *hear* timbre like we do?
We benchmarked 18 representations against 2.6K human ratings across 21 datasets.
Style embeddings from CLAP & our sound-matching model are best aligned!
Paper: arxiv.org/abs/2507.07764
#ISMIR2025 #MIR #AudioAI #SonyCSLMusic
Visit our talks and posters at #ICASSP2025!
Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures
A. Riou, S. Lattner, A. Gagneré, G. Hadjeres, G. Peeters
Tuesday, April 8 ( pm): Music analysis I
Hybrid Losses for Hierarchical Embedding Learning
H. Tian, S. Lattner, B. McFee, C. Saitis
Tuesday, April 8 ( pm): Music analysis I
Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems
M. Grachten, J. Nistal
Friday, April 11 ( am): Applied Signal Processing Systems
Estimating Musical Surprisal in Audio
M. Bjare, G. Cantisani, S. Lattner and G. Widmer
Wednesday, April 9 ( am): Music analysis II
Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding
M. Pasini, S. Lattner, G. Fazekas
Wednesday, April 9 ( pm): Deep generative models I
From our "@ieeeICASSP paper released" series, we announce that "Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures" is online!
Paper: arxiv.org/pdf/2411.19806
Thanks to my colleagues Alain Riou, Geoffroy Peeters, Gaëtan Hadjeres, and Antonin Gagneré!
SonyCSLMusic
Our #ICASSP paper "Hybrid Losses for Hierarchical Embedding Learning" by Haokun Tian et al. is now online!
We assess how different losses, and combinations of them, organize a hierarchical embedding space, and we improve on the SOTA.
Paper: arxiv.org/pdf/2501.12796
#SonyCSLParis
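As a rough illustration of what a "hybrid" hierarchical embedding loss can look like: a weighted sum of per-level metric losses. This is a generic sketch, not the paper's formulation; the triplet form, level structure, and weights are illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, pos, neg, margin=0.2):
    """Margin-based triplet loss on Euclidean distances."""
    d_pos = np.linalg.norm(anchor - pos, axis=-1)
    d_neg = np.linalg.norm(anchor - neg, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

def hybrid_hierarchical_loss(anchor, pos_per_level, neg_per_level, level_weights):
    """Weighted sum of per-level triplet losses, e.g. one level for
    instrument family and one for the specific instrument."""
    total = 0.0
    for pos, neg, w in zip(pos_per_level, neg_per_level, level_weights):
        total += w * triplet_loss(anchor, pos, neg).mean()
    return float(total)
```

Combining levels this way lets coarse categories (easy positives/negatives) and fine ones (hard positives/negatives) shape the same embedding space.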
Recently, I had the honour of giving a keynote speech on Audio Representation Learning and Generation at the DMRN+ workshop at @c4dm, Queen Mary University of London.
Recording:
echo360.org.uk/media/f037dc...
More info:
www.qmul.ac.uk/dmrn/dmrn19/
We also show that our IC estimates can help predict EEG measurements.
Surprisal can be used for segment boundary detection and to simulate the information processing of a listener.
Link to the paper: arxiv.org/pdf/2501.07474
Model weights are coming soon!
#SonyCSLMusic
Our #ICASSP paper "Estimating Musical Surprisal in Audio" is now online.
Great work by Mathias Bjare and Giorgia Cantisani!
We use an autoregressive transformer and Gaussian mixture models to estimate the information content in music2latent representations.
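In essence, the surprisal (information content) of a latent frame is its negative log-likelihood under the model's predictive distribution. A minimal numpy sketch for a diagonal-covariance Gaussian mixture; in the paper the mixture parameters would come from the autoregressive transformer, so everything here is illustrative.

```python
import numpy as np

def gmm_surprisal(x, weights, means, variances):
    """Surprisal (information content, in nats) of a vector x under a
    diagonal-covariance Gaussian mixture: -log p(x)."""
    x = np.asarray(x, dtype=float)
    log_comps = []
    for w, mu, var in zip(weights, means, variances):
        # log of w * N(x; mu, diag(var))
        ll = np.log(w) - 0.5 * np.sum(
            np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var
        )
        log_comps.append(ll)
    m = max(log_comps)  # log-sum-exp for numerical stability
    return -(m + np.log(sum(np.exp(l - m) for l in log_comps)))
```

Frames that fall far from all mixture components get high surprisal, which is what makes peaks in this quantity usable for segment boundary detection.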
3/ Results show:
- Higher fidelity (20% lower FAD)
- Better adherence to text & audio prompts (higher APA)
- Faster generation with 5-step inference!
AI-assisted music production. Let us know your thoughts!
Congrats to the authors Javier Nistal and Marco Pasini!
#AI #MusicGeneration #Transformers
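For context, FAD (Fréchet Audio Distance) compares Gaussian fits of embedding distributions for generated vs. reference audio; lower is better. A rough numpy sketch with diagonal covariances; the real metric uses full covariances with a matrix square root, and embeddings from a pretrained audio model.

```python
import numpy as np

def fad_diagonal(emb_ref, emb_gen):
    """Frechet distance between Gaussian fits of two embedding sets,
    simplified to diagonal covariances. The full FAD replaces
    sqrt(v1 * v2) with a matrix square root of the covariance product."""
    mu1, mu2 = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    v1, v2 = emb_ref.var(axis=0), emb_gen.var(axis=0)
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(v1 + v2 - 2.0 * np.sqrt(v1 * v2))
    return float(mean_term + cov_term)
```

Identical embedding sets score (near) zero; shifting one set moves the score by the squared mean offset.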
2/ What's new?
- Stereo output with superior fidelity
- Bridging the gap in text-to-audio CLAP embeddings
- Faster inference using a consistency framework
Audio examples: sonycslparis.github.io/improved_dar/
1/ Building on Diff-A-Riff, we've upgraded to a stereo-capable autoencoder and replaced the U-Net with a Diffusion Transformer (DiT) to improve quality, diversity, and control. Plus, our model generates high-quality audio with fewer denoising steps.
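For readers unfamiliar with how consistency-style models get away with so few denoising steps, here is a generic multistep sampling loop in numpy. The function `toy_f`, the sigma schedule, and `sigma_min` are placeholders for illustration, not the trained DiT from the paper.

```python
import numpy as np

def consistency_multistep_sample(f, shape, sigmas, sigma_min=0.002, seed=0):
    """Multistep consistency sampling: start from pure noise, map it to a
    clean estimate in one call of f, then alternate re-noising at
    decreasing noise levels with further one-step denoising."""
    rng = np.random.default_rng(seed)
    x = sigmas[0] * rng.standard_normal(shape)
    x = f(x, sigmas[0])  # one-step estimate from pure noise
    for sigma in sigmas[1:]:
        z = rng.standard_normal(shape)
        x = x + np.sqrt(max(sigma ** 2 - sigma_min ** 2, 0.0)) * z  # re-noise
        x = f(x, sigma)  # denoise again at a lower noise level
    return x

def toy_f(x, sigma):
    """Placeholder consistency function: plain shrinkage toward zero."""
    return x / (1.0 + sigma ** 2)
```

Because each call of `f` jumps straight to a clean estimate, a handful of noise levels (e.g. 5) can replace the dozens of steps a standard diffusion sampler needs.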
New Paper Announcement!
We present "Improving Musical Accompaniment Co-creation via Diffusion Transformers", a study advancing our Diff-A-Riff stem generator through improved quality, efficiency, and control.
Read the full paper here: arxiv.org/pdf/2410.23005
Our #ISMIR Conference Tutorial "Deep Learning 101 for Audio-based MIR" provides a broad introduction to music audio processing, analysis, and generation.
The book and Jupyter notebooks:
geoffroypeeters.github.io/deeplearning...
The recording of the tutorial:
us02web.zoom.us/rec/share/Qz...
Accepted #ICASSP papers of Sony CSL Music Team:
Hybrid Losses for Hierarchical Embedding Learning
H. Tian, S. Lattner, B. McFee, C. Saitis
Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding
M. Pasini, S. Lattner, G. Fazekas
Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures
A. Riou, S. Lattner, A. Gagneré, G. Hadjeres, G. Peeters
Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems
M. Grachten, J. Nistal
Estimating Musical Surprisal in Audio
M. Bjare, G. Cantisani, S. Lattner and G. Widmer
Congrats to the authors!