If you are at #IJCNN in Rome, see you tomorrow (Thu July 3) at 4:30pm in the Position Papers session, Leonardo da Vinci hall!
More in the paper and code: github.com/chrispla/syn.... Big thanks to my collaborators @juj-guinot.bsky.social, George Fazekas, Elio Quinton, @emmanouilb.bsky.social, and Johan Pauwels!
I'll be at IJCNN 2025 in Rome in a month to present this - see you there!
We argue that downstream task evaluation cannot easily uncover these behaviors, and that equivariance, invariance, and disentanglement are critical properties enabling a variety of real-world applications like retrieval, generation, and style transfer.
Additionally, scaling the model and the data affects the equivariance, invariance, and disentanglement of concepts differently across models!
Importantly, different pretraining paradigms and architectures behave differently along these axes, suggesting that the mechanisms underlying their downstream performance differ.
However, we show that representations differ in other important aspects, such as their equivariance (how predictably they change under input transformations), invariance (how stable they remain under them), and disentanglement (how separated different concepts are within them).
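Not part of the paper's framework, but a minimal runnable sketch of what probing invariance can look like, with a random projection standing in for a pretrained encoder and additive noise standing in for an input transformation (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 1024))  # stand-in "encoder": a fixed random projection

def embed(x):
    # placeholder for a pretrained representation model
    return W @ x

def transform(x, strength):
    # placeholder transformation; for audio, think pitch shift or added noise
    return x + strength * rng.normal(size=x.shape)

def invariance(x, strengths):
    """Mean cosine similarity between embeddings of x and its transformed
    versions; values near 1 mean the representation is invariant."""
    z = embed(x)
    sims = []
    for s in strengths:
        zt = embed(transform(x, s))
        sims.append(z @ zt / (np.linalg.norm(z) * np.linalg.norm(zt)))
    return float(np.mean(sims))

x = rng.normal(size=1024)
print(invariance(x, strengths=[0.1, 0.5, 1.0]))
```

Equivariance could be probed analogously, e.g., by checking how well the transformation parameter can be regressed from embedding differences.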
There's widespread suspicion that as model and data scale increases, model representations converge (e.g., the Platonic Representation Hypothesis, arxiv.org/abs/2405.07987), in part supported by different architectures and pretraining paradigms reaching similar downstream performance.
Excited to share our new paper: arxiv.org/abs/2505.06224! We introduce a unified framework for evaluating model representations beyond downstream tasks, and use it to uncover some interesting insights about the structure of representations that challenge conventional wisdom 👇🧵
more at the audio coding and modeling poster sessions (AASP in poster area 2B) tomorrow (Friday) 8:30am to 10am!
paper link: tinyurl.com/4seckd2c
finally, we found that regardless of pretraining data scale, the robustness of representations to relevant data perturbations remains concerningly low (rough sketch of such a check below)
(MusiCNN 🟠, VGG 🔵, AST 🟢, CLMR 🔴, TMAE 🟣, MFCC 🩶)
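a rough sketch of this kind of check, on toy data rather than our setup; for brevity the perturbation is applied in feature space here, whereas in practice you'd perturb the audio and re-extract representations:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# toy stand-ins for frozen model representations and labels
Z = rng.normal(size=(500, 64))
y = (Z[:, 0] > 0).astype(int)

Ztr, Zte, ytr, yte = train_test_split(Z, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(Ztr, ytr)

def perturb(z, strength=0.5):
    # feature-space stand-in; real perturbations (gain, noise, pitch shift)
    # would be applied to the audio before re-extracting representations
    return z + strength * rng.normal(size=z.shape)

print("clean accuracy:    ", probe.score(Zte, yte))
print("perturbed accuracy:", probe.score(perturb(Zte), yte))
```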
we also saw that handcrafted features still outperform learned representations on some tasks (extraction sketch below)
(MFCCs 🩶, Chroma 🖤)
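for context, handcrafted baselines like these are cheap to compute; a minimal librosa sketch (the time-pooling choices here are illustrative, not necessarily what we used):

```python
import librosa
import numpy as np

# any audio clip works; librosa ships a few examples
y, sr = librosa.load(librosa.example("trumpet"))

# frame-level handcrafted features
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # shape (20, n_frames)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)    # shape (12, n_frames)

# pool over time into fixed-size clip-level features for a downstream probe
mfcc_feat = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
chroma_feat = chroma.mean(axis=1)
```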
this could be partly attributable to larger-capacity downstream models (a 2-layer MLP) recovering more relevant information from a “bad” representation, hinting that more data may make relevant information more accessible (toy probe comparison below)
(MusiCNN 🟠, VGG 🔵)
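a toy sklearn sketch of that capacity comparison, with logistic regression standing in for the SLP and synthetic features standing in for frozen representations:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# synthetic features standing in for frozen model representations
X, y = make_classification(n_samples=2000, n_features=64, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

slp = LogisticRegression(max_iter=1000).fit(Xtr, ytr)    # linear probe
mlp = MLPClassifier(hidden_layer_sizes=(256, 256),       # 2-layer MLP probe
                    max_iter=500, random_state=0).fit(Xtr, ytr)

print("linear probe:", slp.score(Xte, yte))
print("MLP probe:   ", mlp.score(Xte, yte))
```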
we found that representations from some untrained models perform surprisingly well in tagging, and that the tagging performance gap across pretraining scales isn't very large
(MusiCNN 🟠, VGG 🔵, AST 🟢, CLMR 🔴, TMAE 🟣, MFCC 🩶)
we created subsets of MTAT (MagnaTagATune) ranging from 5 to 8,000 minutes of audio and pretrained MusiCNN, a VGG, AST, CLMR, and a transformer-based MAE. we then extracted representations and trained single-layer perceptrons (SLPs) and MLPs on music tagging, instrument recognition, and key detection
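a minimal sketch of one way to build such duration-based subsets (illustrative, not necessarily our exact sampling protocol):

```python
import numpy as np

def duration_subset(track_ids, durations_sec, budget_min, seed=0):
    """Randomly add tracks (without replacement) until the total
    duration reaches the pretraining budget, in minutes."""
    rng = np.random.default_rng(seed)
    subset, total = [], 0.0
    for i in rng.permutation(len(track_ids)):
        if total >= budget_min * 60:
            break
        subset.append(track_ids[i])
        total += durations_sec[i]
    return subset

# e.g., pretraining budgets in minutes
budgets = [5, 50, 500, 8000]
```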
how much music data do we actually need to pretrain effective music representation learning models? we decided to systematically investigate this in our latest #ICASSP2025 paper with @emmanouilb.bsky.social and Johan Pauwels
As part of the UPF-BMAT Chair on AI and Music, we are offering 2 PhD positions on the development of AI models to support the music sector, with an emphasis on ethical approaches.
🔹 Application deadline: April 21st, 2025
🔹 Starting date: October 2025
www.upf.edu/web/mtg/home...
obsessed with this
www.youtube.com/watch?v=FZeE...
🎶 5 years ago...
We've created our first Starter Pack, collecting accounts from the brilliant researchers and staff who are working at the School of Electronic Engineering and Computer Science.
We'll keep adding to this list as our Bluesky community grows: go.bsky.app/45VsoiD
#EduSky #AI #ComputerSkyence
#ismir2025 already has a website! isn't it crazy?
ismir2025.ismir.net
We're here too now! 🥳