My lab is hiring a software engineer to support our #NeuroAI research: careers.epfl.ch/job/Lausanne.... Please consider applying if you want to build out the infrastructure enabling models of the human brain & mind (e.g., www.Brain-Score.org). We will start screening applications this week.
Posts by Martin Schrimpf
DNN models of the brain are getting bigger. Are we replacing one complicated system in vivo with another in silico?
In new work, we seek the *smallest* DNN models of visual cortex, balancing prediction with parsimony.
It turns out the resulting models are surprisingly small!
rdcu.be/e5H8G
I believe the results in your and Ebrahim's paper, but I do not understand why this particular configuration is so important. If you agree with point 2, then that is much more general than what we did in 2021 (more models & data) and with a more stringent metric -- and the core claim stands.
2. NWP task performance is correlated with brain alignment in a larger set of models and datasets (going beyond our 2021 set).
I understand your pushback to be that the NWP-correlates-with-brain-alignment result does _not_ hold when using the *exact* 2021 models and datasets, *but* with a different metric.
(maybe more as a personal summary from this, don't feel obliged to respond.)
I believe we agree on two things:
1. The results from Schrimpf et al. 2021 with the exact same specifications (datasets, metrics, models) are perfectly reproducible from the open-source code.
I personally see Brain-Score as an evolving set of benchmarks that is improved over time (and not as a static goalpost). Indeed our community is updating it with more rigorous alignment tests and better models. I hope you will consider contributing!
In vision, Yamins & Hong et al 2014 first established a correspondence between object classification accuracy and ventral stream alignment on a dataset that is very easy by today's standards; which has now been extended to ImageNet, larger and more diverse neural data etc. See Brain-Score.org/vision
I guess what you mean is whether we should move past the particular methodologies we used in the 2021 paper by testing alignment more stringently and building even better brain models -- I absolutely think so!
The core claim you mean is "Models that perform better at predicting the next word in a sequence also better predict brain measurements" -- and yes, that indeed has been validated and extended by many follow-up studies. As you said yourself, the results can also be perfectly replicated.
The Re-Align Challenge is now LIVE!
We're inviting you to explore what properties of vision models and data lead to convergences and divergences in representational alignment.
Get started: huggingface.co/spaces/repre...
What's your sense as to why that is? Our intuition from the 2025 EMNLP paper is that more scaled models develop a lot more capabilities beyond formal "core" language processing; I'm curious if you agree
correction: the original implementation was incorrect and @kartikpradeepan.bsky.social updated the model PR thanks to @ebrahimfeghhi.bsky.social linking the open source code. Updates here: bsky.app/profile/msch...
hi Ebrahim, I responded in this thread: bsky.app/profile/msch.... Happy to discuss more
Either way I'm glad the OASM model is now part of the open-source community platform, this will be a great reference point. With the new benchmarks soon on Brain-Score, we can encourage the development of models that generalize much better than what we did 5 years ago
Regarding the original claims: Badr & others reproduced the correlation to NWP performance with the new benchmarks and newer models, so I see no reason to consider the 2021 claims invalid. These new benchmarks enforce stronger generalization (great!) but that doesn't mean the old ones were wrong.
The AlKhamissi et al 2025 benchmarks are the most stringent afaict since they split on stories instead of contiguous k-folds, which prevents within-story temporal autocorrelation from leaking into the test set. (L)LMs indeed score much higher than OASM here. I'm glad OASM is now integrated in Brain-Score as a useful reference!
Thanks @kartikpradeepan.bsky.social for confirming that this model indeed scores highly on the earlier benchmarks with part-of-sentence-splits! Building on Feghhi & Hadidi et al 2024, AlKhamissi et al 2025 had identified the most stringent benchmarks. We should have merged this PR sooner. 1/
Some re-mapping is necessary even for predicting one brain's activity from another, esp. in higher areas. Linear regression is one of the more restrictive ways to achieve this between two brains so we use the same for models. @neuranna.bsky.social wrote about this here: arxiv.org/abs/2208.10668
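For concreteness, here's a minimal sketch of the linear-predictivity idea with synthetic data: fit a cross-validated linear mapping from model features to brain responses and score held-out predictions by Pearson correlation. This is a plain NumPy illustration, not the actual Brain-Score implementation, which differs in details like regularization and ceiling normalization.

```python
import numpy as np

def linear_predictivity(model_features, brain_responses, n_folds=5, seed=0):
    """Cross-validated linear mapping from model features to brain responses.

    Fits ordinary least squares on the training folds and reports the Pearson
    correlation between predicted and held-out responses, averaged over voxels
    and folds.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(model_features.shape[0])
    folds = np.array_split(order, n_folds)
    scores = []
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        X_tr, X_te = model_features[train_idx], model_features[test_idx]
        Y_tr, Y_te = brain_responses[train_idx], brain_responses[test_idx]
        # least-squares weights mapping features -> responses
        W, *_ = np.linalg.lstsq(X_tr, Y_tr, rcond=None)
        Y_hat = X_te @ W
        # Pearson r per voxel, averaged over voxels
        r = [np.corrcoef(Y_hat[:, v], Y_te[:, v])[0, 1] for v in range(Y_te.shape[1])]
        scores.append(np.nanmean(r))
    return float(np.mean(scores))

# synthetic example: responses are a noisy linear readout of the features
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))          # 200 stimuli x 20 model features
Y = X @ rng.normal(size=(20, 5)) + 0.1 * rng.normal(size=(200, 5))  # 5 "voxels"
print(linear_predictivity(X, Y))
```

Note that the same procedure applies unchanged when predicting one brain's activity from another's, which is what motivates using the same restrictive mapping class for models.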
I'll continue in this thread where @ebrahimfeghhi.bsky.social has been helpful with linking the code. I would like to remind you that there is a human at the other end of the screen and that no information will be lost by keeping this friendly.
Thanks Ebrahim! Would you be interested in submitting this model directly to Brain-Score? Alternatively I can let cursor attempt it again but as you pointed out, it doesn't necessarily get it right
Nima you're very much welcome to update the PR. You are even more welcome to use the Brain-Score platform as we stated previously. I don't know how we can reach common ground if you don't either use the same benchmark implementation, or release your model code.
I am of course happy to be proven wrong, but I find the framing of this preprint a bit frustrating. We gave similar feedback before, yet the manuscript doesn't seem to engage with the counter-evidence. I would appreciate clarification on the results discrepancy -- please feel free to update the PR!
This is significantly lower than the paper's reported number and far below gpt2-xl (which in the paper is outperformed by oasm). So something does not track here, either in the preprint's re-implementation of the benchmark or my reconstruction of the model.
3. I implemented and submitted the authors' model to Brain-Score (see PR#355 github.com/brain-score/...). The implementation follows the paper as I could not find a code release. It obtains a ceiling-normalized score of 0.34 on the criticized Pereira2018 benchmark.
-- this work includes null models such as randomly-assigned stimulus responses. Brain-Score Language includes benchmarks that use this stronger form of generalization, which we flagged about a year ago.
2. Splitting across larger temporal chunks (e.g. stories) is indeed a stronger form of generalization than splitting across smaller chunks (e.g. sentences). @bkhmsi.bsky.social tackled this in his EMNLP'25 paper, where we identified the most stringent evaluation of brain alignment to be linear predictivity with story splits.
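To make the splitting point concrete, here is a rough sketch (with hypothetical story labels) of leave-one-story-out splits: every sentence of the held-out story is excluded from training, so within-story temporal autocorrelation cannot leak across the train/test boundary the way it can with contiguous k-folds inside a story.

```python
import numpy as np

# toy metadata: each sentence belongs to a story (hypothetical labels)
stories = np.array(["story_a"] * 4 + ["story_b"] * 4 + ["story_c"] * 4)

def story_splits(story_labels):
    """Leave-one-story-out splits: hold out all sentences of one story
    at a time, training on the remaining stories only."""
    for story in np.unique(story_labels):
        test = np.where(story_labels == story)[0]
        train = np.where(story_labels != story)[0]
        yield train, test

for train_idx, test_idx in story_splits(stories):
    held_out = stories[test_idx][0]
    print(held_out, "->", len(train_idx), "train /", len(test_idx), "test sentences")
```

The same idea is available off the shelf, e.g. scikit-learn's GroupKFold with stories as groups.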
Perhaps the strongest support for this point is that recent LLMs confirm the original prediction: as their task performance improved, their alignment to the human brain further increased (see e.g. Shen et al. 2025).
Thank you Dan for the ping! As far as I can tell, all of the original claims hold, for the following reasons:
1. The relationship between next-word prediction performance and brain alignment has been replicated in several other studies (eg Caucheteux et al 2022; De Varda et al 2025; Mischler 2024).
Looking forward to presenting at the #AAAI #NeuroAI workshop, including 3 projects that were just accepted to ICLR! arxiv.org/abs/2509.24597, arxiv.org/abs/2510.03684, arxiv.org/abs/2506.13331
Re-Align is back for its 4th edition at ICLR 2026!
We invite submissions on representational alignment, spanning ML, Neuroscience, CogSci, and related fields.
Tracks: Short (≤5p), Long (≤10p), Challenge (blog)
Deadline: Feb 5, 2026 for papers
representational-alignment.github.io/2026/