
Posts by Ben Hayes


🔊 Follow the links above for audio examples, full training code, and the arXiv pre-print.

10 months ago

🏆 We then apply this method to a dataset of sounds sampled from Surge XT — a feature-rich software synthesizer — and find that it dramatically outperforms state-of-the-art baselines on audio reconstruction.

10 months ago

🤔 However, in the case of real synthesizers, we may not know the appropriate symmetries a priori. To allow them to be discovered adaptively, we introduce a technique called Param2Tok, which learns a mapping from synthesizer parameters to model tokens.

10 months ago

🗺️ We can further improve performance by designing a model with equivariance to the appropriate symmetry.

10 months ago

📈 We design a toy task that isolates this phenomenon and find that the presence of permutation symmetry degrades the performance of conventional methods. We then show that a generative approach, which can assign predictive weight to multiple possible solutions, performs considerably better.
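A minimal numpy sketch of this failure mode (a hypothetical stand-in for the actual toy task, with made-up values): when a target has two equally valid, permuted parameter sets, the MSE-optimal point estimate collapses to their average, which is not itself a valid solution.

```python
import numpy as np

# Hypothetical toy: each "sound" is produced equally often by the parameter
# set (0.2, 0.8) or its permutation (0.8, 0.2).
valid_params = np.array([[0.2, 0.8],
                         [0.8, 0.2]])

# A conventional regressor trained with MSE converges to the conditional mean
# of the targets — the average of the two modes:
mse_optimum = valid_params.mean(axis=0)
print(mse_optimum)  # [0.5 0.5] — not a valid parameter set at all

# A generative model can instead place probability mass on both modes,
# so any sample it draws is an exact solution:
rng = np.random.default_rng(0)
samples = valid_params[rng.integers(0, 2, size=1000)]
assert all((s == valid_params).all(axis=1).any() for s in samples)
```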

10 months ago

‼️ In this work, we argue that the problem is ill-posed: there are multiple sets of parameters that produce any given sound. Further, we show that many of these equivalent solutions are due to intrinsic symmetries of the synthesizer!
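As an illustration (my own sketch, not an example from the paper): in a hypothetical two-oscillator additive synth, swapping the settings of the two interchangeable oscillators changes the parameter vector but not the audio — a permutation symmetry.

```python
import numpy as np

def two_osc_synth(params, sr=16000, dur=0.1):
    """Toy additive synth: two sine oscillators, each with (frequency, amplitude)."""
    t = np.arange(int(sr * dur)) / sr
    (f1, a1), (f2, a2) = params
    return a1 * np.sin(2 * np.pi * f1 * t) + a2 * np.sin(2 * np.pi * f2 * t)

p = [(220.0, 0.5), (440.0, 0.3)]
p_swapped = [(440.0, 0.3), (220.0, 0.5)]  # permute the two oscillators

# Two distinct parameter vectors, one sound.
assert np.allclose(two_osc_synth(p), two_osc_synth(p_swapped))
```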

10 months ago

🧑‍🔬 Previous approaches have struggled to scale to the full complexity of synthesizers used in modern audio production. Why?

10 months ago

🎛️ Programming synthesizers is a fiddly business, and so a line of work known as "sound matching" has, over the last few decades, sought to answer the question: given an audio signal and a synthesizer, which configuration of parameters best approximates the signal?

10 months ago

🎹 Audio synthesizers are diverse and complex beasts, combining a variety of techniques to produce sounds ranging from familiar to entirely alien.

10 months ago

TL;DR: Predicting synthesizer parameters from audio is hard because multiple parameter configurations can produce the same sound. We design a model that accounts for this and find that it dramatically outperforms previous approaches, and works on production-grade, feature-rich VST synthesizers.

10 months ago

Very excited to share that our latest work, "Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching", has been accepted to ISMIR 2025 in Daejeon, Korea!

Paper: arxiv.org/abs/2506.07199
Audio: benhayes.net/synth-perm/
Code: github.com/ben-hayes/sy...

🧵

10 months ago

going to Korea, baby! 🇰🇷 #ISMIR2025

10 months ago
DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Ben Hayes, Wei-Hsiang Liao, György Fazekas, Yuki Mitsufuji

DiffVox integrates differentiable vocal effects; analysis reveals parameter correlations and connections to McAdams' timbre dimensions; parameter distributions non-Gaussian; code and datasets available.

11 months ago
Generative modelling in latent space: Latent representations for generative models.

wake up, babe. new @sedielem.bsky.social just dropped

sander.ai/2025/04/15/l...

1 year ago

amazing how the soothing beep of stolen Lime bikes has so naturally woven itself into the London soundscape

1 year ago
hard drive clear out 2016-2020, by Ben Hayes (21-track album)

turned on an old computer and found some old unfinished music gathering dust. uploading it so it at least lives somewhere.

1 year ago

the best ones combine two or more

1 year ago

realised tonight there are only 3 red hot chili peppers songs:

1. california
2. zoop di blamp
3. heroin, but it's a woman

1 year ago
Designing Neural Synthesizers for Low Latency Interaction Franco Caspe, Jordie Shier, Mark Sandler, Charalampos Saitis, Andrew McPherson

A low-latency neural audio synthesizer (BRAVE) was designed by analyzing latency sources in existing models (RAVE); BRAVE improved pitch and loudness replication while maintaining timbre modification capabilities, implemented in a specialized inference framework.

1 year ago

negative \vspace season approaches 😈

1 year ago
NablAFx: A Framework for Differentiable Black-box and Gray-box Modeling of Audio Effects Marco Comunità, Christian J. Steinmetz, Joshua D. Reiss

NablAFx, an open-source PyTorch framework, supports differentiable black-box and gray-box modeling of audio effects; it includes model architectures, datasets, training features, and plotting functions.

1 year ago
Deep Learning 101 for Audio-based MIR

Two excellent recent resources:

1. (not strictly a paper) This tutorial from the last ISMIR, courtesy of: geoffroypeeters.github.io/deeplearning...
2. This overview of model-based deep learning for MIR: arxiv.org/abs/2406.11540

1 year ago
Equivariant flow matching Normalizing flows are a class of deep generative models that are especially interesting for modeling probability distributions in physics, where the exact likelihood of flows allows reweighting to kno...

I look at it as squeezing a *slightly* better coupling out of the batch.

they do something related here (arxiv.org/abs/2306.15030) with the Kabsch algorithm, but they transform the target samples as they're specifically trying to learn a rotation invariant distribution with an equivariant flow.

1 year ago

haven't crunched through it on paper but my hunch is this works because of the spherical symmetry of the Gaussian dist, so any orthogonal transformation of the batch is exactly as probable (should work for any O(d) invariant distribution if true)
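A quick numerical check of that hunch (my own sketch, not from the thread): rotating or reflecting a batch of standard-Gaussian samples by any orthogonal matrix leaves every sample's log-density unchanged, since the standard Gaussian is O(d)-invariant.

```python
import numpy as np
from scipy.stats import multivariate_normal, ortho_group

rng = np.random.default_rng(0)
d, n = 4, 256

x = rng.standard_normal((n, d))          # batch from a standard Gaussian
Q = ortho_group.rvs(d, random_state=0)   # random orthogonal matrix in O(d)

mvn = multivariate_normal(mean=np.zeros(d))
# Orthogonal transforms preserve norms, and the standard Gaussian density
# depends only on the norm — so per-sample log-densities are unchanged.
assert np.allclose(mvn.logpdf(x), mvn.logpdf(x @ Q.T))
```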

1 year ago

very anecdotally, I've found that when using a normal source distribution, performing orthogonal Procrustes on the source samples (to match the target samples) after minibatch coupling by exact linear assignment (Hungarian algorithm) seems to speed up convergence by a noticeable amount.
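A sketch of that recipe, assuming a standard-normal source and squared-Euclidean assignment costs (batch sizes and names are illustrative, not from the post):

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n, d = 64, 8

x0 = rng.standard_normal((n, d))         # source (Gaussian) samples
x1 = rng.standard_normal((n, d)) + 3.0   # target samples

# 1. Minibatch coupling: exact linear assignment (Hungarian) on pairwise costs.
cost = cdist(x0, x1, metric="sqeuclidean")
row, col = linear_sum_assignment(cost)
x0, x1 = x0[row], x1[col]

# 2. Orthogonal Procrustes: best rotation/reflection of the source batch
#    toward its assigned targets.
R, _ = orthogonal_procrustes(x0, x1)
x0_aligned = x0 @ R

# The identity is a feasible R, so alignment can only shrink (or keep)
# the total squared transport cost of the coupling.
before = np.sum((x0 - x1) ** 2)
after = np.sum((x0_aligned - x1) ** 2)
assert after <= before + 1e-9
```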

1 year ago

amazing, @drscotthawley.bsky.social! I've been recommending this post to everyone recently.

1 year ago

🎶✨ New Paper Announcement! ✨🎶
We present "Improving Musical Accompaniment Co-creation via Diffusion Transformers" 🎹🎸—a study advancing our Diff-A-Riff stem generator through improved quality, efficiency, and control.

📜Read the full paper here: arxiv.org/pdf/2410.23005 🧵👇

1 year ago

This seems to be where ML-facing config libraries (hydra, gin, jsonargparse, etc) converge, and is what I grudgingly end up doing. It makes me wince, though, because it seems to lead invariably to non-trivial and untested instantiation logic being encoded in the relationships between config files.

1 year ago

1. this is excellent work
2. your vocal imitations are everything ❤️

1 year ago

speaking at Akademie der Bildenden Künste in Munich on Dec 16th

"Phantasmagoria: Sound Synthesis after the Turing Test"

about the methodological, ethical, and environmental implications of Generative AI for audio

by invitation from Florian Hecker

hal.science/hal-04650754

1 year ago