Many thanks to @phillipisola.bsky.social for the thought-provoking hypothesis, and for openly discussing and engaging with our disagreements -- a rare kind of intellectual generosity.
9/9
with Daniil Zverev, @shiryginosar.bsky.social, and Alexei A. Efros.
Berkeley AI Research, @munichcenterml.bsky.social, @tuebingen-ai.bsky.social, @tticconnect.bsky.social
8/9
Takeaway: Models, like organisms, may perceive the world within their Umwelt (check von Uexküll: en.wikipedia.org/wiki/Jakob_J...). We suspect future evidence will favor von Uexküll over Plato. Different models may learn rich representations of the world, just not the same one.
7/9
Observation 3: Real data is many-to-many (e.g. many images can fit the same caption). When we relax the 1-to-1 evaluation constraint to account for this, alignment drops even further.
6/9
Observation 2: Coarse agreement persists, but fine-grained agreement does not. In controlled settings, vision and language models reliably retrieve neighbors of the correct class but rarely agree on the same instance (a toy sketch of this check follows below).
5/9
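To make the coarse-vs-fine distinction above concrete, here is a hedged toy sketch, assuming paired image/text embeddings plus class labels: coarse agreement asks whether each modality retrieves a neighbor of the query's own class, fine-grained agreement whether both modalities point at the exact same neighbor instance. All names, shapes, and data below are illustrative, not the paper's evaluation code.

```python
import numpy as np

def nearest_neighbor(X):
    # Index of each row's single nearest neighbor (cosine), excluding itself.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)  # never count a point as its own neighbor
    return sims.argmax(axis=1)

def coarse_vs_fine(img_emb, txt_emb, labels):
    img_nn = nearest_neighbor(img_emb)   # each image's nearest image
    txt_nn = nearest_neighbor(txt_emb)   # each caption's nearest caption
    # Coarse: does each modality retrieve a neighbor of the query's own class?
    img_class_acc = np.mean(labels[img_nn] == labels)
    txt_class_acc = np.mean(labels[txt_nn] == labels)
    # Fine: do the two modalities agree on the exact same neighbor instance?
    instance_agreement = np.mean(img_nn == txt_nn)
    return float(img_class_acc), float(txt_class_acc), float(instance_agreement)

# Toy usage: 1,000 paired samples, 20 classes, 256-d embeddings (all made up).
rng = np.random.default_rng(0)
labels = rng.integers(0, 20, size=1000)
print(coarse_vs_fine(rng.normal(size=(1000, 256)),
                     rng.normal(size=(1000, 256)), labels))
```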
Observation 1: On small datasets, neighbors are sparse, so models 'agree' simply because there aren't many options to choose from. At scale, neighbors become denser and more specialized within each modality (e.g. the pose of a car vs. the car's name).
4/9
This is usually tested on ~1K samples. We scaled the evaluation to 15M samples and found that alignment drops significantly.
3/9
Most experimental evidence for convergence comes from checking whether an image embedding and its caption's embedding share the same nearest neighbors, i.e. whether the two representations are aligned (a minimal sketch of this check is below).
2/9
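For the curious, here is a minimal sketch of that kind of alignment check: a mutual k-NN overlap between paired image and caption embeddings. The array names, shapes, cosine similarity, and choice of k are assumptions for illustration, not the exact evaluation code.

```python
import numpy as np

def knn_indices(X, k):
    # Top-k nearest neighbors (cosine) for each row of X, excluding itself.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)           # never count a point as its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]   # indices of the k most similar rows

def mutual_knn_alignment(img_emb, txt_emb, k=10):
    # For each paired (image_i, caption_i), compare the image's within-modality
    # neighbors with the caption's within-modality neighbors and average the overlap.
    img_nn = knn_indices(img_emb, k)
    txt_nn = knn_indices(txt_emb, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(img_nn, txt_nn)]
    return float(np.mean(overlaps))

# Toy usage: 1,000 paired samples with 512-d embeddings per modality.
rng = np.random.default_rng(0)
score = mutual_knn_alignment(rng.normal(size=(1000, 512)),
                             rng.normal(size=(1000, 512)), k=10)
print(f"alignment = {score:.3f}")  # near chance level for random embeddings
```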
New paper: Back into Plato’s Cave
Are vision and language models converging to the same representation of reality? The Platonic Representation Hypothesis says yes. BUT we find the evidence for this is more fragile than it looks.
Project page: akoepke.github.io/cave_umwelten/
1/9
Thanks to Daniil Zverev*, @thwiedemer.bsky.social*, @bayesiankitten.bsky.social, Matthias Bethge (@bethgelab.bsky.social), and @wielandbrendel.bsky.social for making VGGSound sounder! 🙌 🎉 🐗
📊 With VGGSounder, we show that existing models don't always benefit from multimodal input, and that performance sometimes even degrades.
Code and data: vggsounder.github.io
VGGSounder is a new video classification benchmark for audio-visual foundation models:
We provide:
📢 Re-annotated VGGSound test set
📢 Modality-specific manual labels
📢 A modality confusion metric to diagnose when models misuse modalities
Paper: arxiv.org/pdf/2508.08237
🎉 Excited to present our paper VGGSounder: Audio‑Visual Evaluations for Foundation Models today at #ICCV2025!
🕦 Poster Session 1 | 11:30–13:30
📍 Poster #88
Come by if you're into audio-visual learning and want to know whether multiple modalities actually help or hurt.
Thanks to @munichcenterml.bsky.social for supporting the workshop with a best paper award (announced at 2.50pm CDT)!
We have fantastic speakers, including @saining.bsky.social, @aidanematzadeh.bsky.social, @ranjaykrishna.bsky.social, Ludwig Schmidt, @lisadunlap.bsky.social, and Ishan Misra.
Our #CVPR2025 workshop on Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo) is taking place this afternoon (1-6pm) in room 210.
Workshop schedule: sites.google.com/view/eval-fo...
[Image: screenshot of the "Emergent Visual Abilities and Limits of Foundation Models" workshop website at CVPR 2025]
Our paper submission deadline for the EVAL-FoMo workshop @cvprconference.bsky.social has been extended to March 19th!
sites.google.com/view/eval-fo...
We welcome submissions (incl. published papers) on the analysis of emerging capabilities / limits in visual foundation models. #CVPR2025
Our 2nd Workshop on Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo) is accepting submissions. We are looking forward to talks by our amazing speakers that include @saining.bsky.social, @aidanematzadeh.bsky.social, @lisadunlap.bsky.social, and @yukimasano.bsky.social. #CVPR2025
Upcoming 𝗠𝘂𝗻𝗶𝗰𝗵 𝗔𝗜 𝗟𝗲𝗰𝘁𝘂𝗿𝗲 featuring Prof. Franca Hoffmann from California Institute of Technology and Prof. Holger Hoos from RWTH Aachen University: munichlectures.ai
🗓️ December 17, 2024
🕙 16:00 CET
🏫 Senatssaal, #LMU Munich
Kicking off our TUM AI Lecture Series tomorrow with none other than Jiaming Song, CSO @LumaLabsAI.
He'll be talking about "Dream Machine: Emergent Capabilities from Video Foundation Models".
Live stream: youtu.be/oilWwsXZamA
7pm GMT+1 / 10am PST (Mon Dec 2nd)