
Posts by Andrew Lampinen

Thanks to @arslanchaudhry.bsky.social and Sridhar for spearheading this work; I did very little for it!

2 weeks ago 1 0 0 0

However, certain kinds of structure, such as reversals, remain much harder to use with thinking than they are in context — suggesting important challenges to be overcome! 5/

2 weeks ago 1 0 1 0
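To make the reversal contrast concrete, here is a minimal sketch of what such a probe could look like. The fictional entities, prompt formats, and the two conditions are illustrative assumptions on my part, not the paper's actual tasks.

```python
# Illustrative sketch of a reversal probe (hypothetical prompts and entities;
# the paper's actual tasks and formats may differ).

# A fictional fact, always stated in one direction during training:
fact = "Valterra is the capital of Drosni."

# Reversed query: the answer ("Valterra") must be produced from the cue
# that appeared *after* it in the training statement.
reversed_query = "The capital of Drosni is"

# In-context condition: the fact sits in the prompt, so reversal is easy.
in_context_prompt = fact + " " + reversed_query

# In-weights + thinking condition: the fact was only seen in training data;
# the model must surface it via chain-of-thought before answering.
thinking_prompt = "Think step by step, then complete: " + reversed_query

print(in_context_prompt)
print(thinking_prompt)
```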

Indeed, models can learn this capability in a way that generalizes both inside (top) and outside (bottom) the training distribution! (Red bars here; green = ICL, which should be seen as a topline number.) 4/

2 weeks ago 2 0 1 0

However, an obvious test-time alternative is thinking/CoT. Can models learn via RL how to bring relevant latent knowledge from their training data into context to use it at test time? 3/

2 weeks ago 1 0 1 0

In our prior works: bsky.app/profile/lamp...
we highlighted latent learning as a challenge for AI systems, and explored test-time retrieval and train-time augmentation as solutions. 2/

2 weeks ago 3 0 1 0
Preview
Improving Latent Generalization Using Test-time Compute Language Models (LMs) exhibit two distinct mechanisms for knowledge acquisition: in-weights learning (i.e., encoding information within the model weights) and in-context learning (ICL). Although these...

When and how can test-time thinking allow models to use information latent in their training data? What are the benefits and tradeoffs relative to other solutions like synthetic data augmentation? Pleased to share (after a long delay) an exploration of these issues: arxiv.org/abs/2604.01430 thread:

2 weeks ago 24 7 1 0

But even if it doesn't solve every problem, it might be better than current approaches!

And re: talk derailing, I've definitely been there (especially when talking about this set of findings).

2 weeks ago 1 0 1 0

Thanks! The topological metrics question is interesting; it might indeed help in this case. I suspect there are cases where alternative solutions exist that aren't as clearly topologically similar (off the top of my head, e.g., computing something via an FFT vs. in the original domain).

2 weeks ago 1 0 1 0
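To make the FFT example concrete: the two routes compute exactly the same function, but their intermediate quantities are organized very differently, so topological similarity between them is not obvious. A quick numpy check, purely as illustration (not from the discussion above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
h = rng.standard_normal(64)
N = len(x)

# Route 1: circular convolution computed directly in the original domain.
direct = np.zeros(N)
for k in range(N):
    for n in range(N):
        direct[k] += x[n] * h[(k - n) % N]

# Route 2: the same function via the FFT (multiply in the frequency domain).
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

# Identical input-output behavior...
assert np.allclose(direct, via_fft)
# ...but the intermediates (frequency coefficients vs. shifted products)
# live in very different spaces.
```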

in this work is simpler than real neuroscience; with recurrence or environment interaction it becomes more complicated to say what exactly a representation is or what its consequences are.

Sorry we missed your relevant work by the way! 2/2

2 weeks ago 1 0 1 0

Not likely to remain agreeable for long now that you called it out on socials 😅
But glad you like it! I think this definition really owes a lot to the works cited in the previous sentence (particularly Rosa Cao's). I also think that we got to state it more straightforwardly because our setting 1/2

2 weeks ago 1 0 1 0

P.S.: For some older discussion of the preprint version, see this thread:
bsky.app/profile/lamp...

1 month ago 1 1 0 0

Thanks to my wonderful coauthors @scychan.bsky.social, Effie Li, and Katherine Hermann! 6/6

1 month ago 1 2 1 0
Preview
Representation Biases: Variance Is Not Always a Good Proxy for Importance A central approach in neuroscience is to analyze neural representations as a means to understand a system's function, through the use of methods like principal component analysis, regression, and repr...

We also discuss the origins of these biases, present a conceptual case study about homomorphically-encrypted representations, and outline paths to addressing these issues. The paper can be found here if you're interested: www.eneuro.org/content/13/3... 5/

1 month ago 1 1 1 0

In the presence of such biases, many neuroscience analyses can give misleading results (e.g., illustrated here with RSA, but true of many other methods as well). 4/

1 month ago 3 1 1 0
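As a toy illustration of the kind of failure mode meant here (my own sketch, not the paper's simulations): when one feature carries far more representational variance than another, the representation's dissimilarity structure tracks the high-variance feature almost perfectly and barely reflects the other, even though both are equally usable downstream.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stimuli, n_units = 200, 100

# Two features the system uses equally; only their representational scale differs.
easy = rng.standard_normal(n_stimuli)
hard = rng.standard_normal(n_stimuli)

# Embed along orthonormal directions, with 10x the amplitude for "easy".
u = rng.standard_normal(n_units); u /= np.linalg.norm(u)
v = rng.standard_normal(n_units); v -= (v @ u) * u; v /= np.linalg.norm(v)
reps = 10.0 * np.outer(easy, u) + np.outer(hard, v)

def rdm(x):
    # Representational dissimilarity matrix: pairwise distances, upper triangle.
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d[np.triu_indices_from(d, k=1)]

print(np.corrcoef(rdm(reps), rdm(easy[:, None]))[0, 1])  # ~1.0: RSA "sees" easy
print(np.corrcoef(rdm(reps), rdm(hard[:, None]))[0, 1])  # near 0: hard looks absent
```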
From population variance explained, PCA (or clustering), or unit-level analyses, the model seems to strongly represent an easy feature and barely represent a harder one — even though it is linearly outputting both from this representation layer.

We discuss our experiments in machine learning models showing that representations can be strongly biased: some features carry much more variance than others (at both the population and unit level), even when they play equivalent computational roles. 3/

1 month ago 3 1 1 0
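A compact way to see the population-level version of this bias (again my own toy, not the paper's experiments): the top principal component can explain roughly 99% of the variance while a second feature, which a linear readout recovers just as well, looks nearly invisible in variance terms.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 50

easy = rng.standard_normal(n)
hard = rng.standard_normal(n)

# Same kind of construction as before: orthonormal directions, 10x scale gap.
u = rng.standard_normal(d); u /= np.linalg.norm(u)
v = rng.standard_normal(d); v -= (v @ u) * u; v /= np.linalg.norm(v)
reps = 10.0 * np.outer(easy, u) + np.outer(hard, v)

# Variance-based view: the top eigenvalue of the covariance dominates, so the
# hard feature looks nearly absent from the population.
eigvals = np.linalg.eigvalsh(np.cov(reps, rowvar=False))[::-1]
print(eigvals[:2] / eigvals.sum())  # roughly [0.99, 0.01]

# Readout-based view: a linear decoder recovers *both* features essentially
# perfectly, i.e. they are equally usable downstream.
for name, target in [("easy", easy), ("hard", hard)]:
    w, *_ = np.linalg.lstsq(reps, target, rcond=None)
    resid = target - reps @ w
    r2 = 1 - resid.var() / target.var()
    print(name, round(r2, 3))  # ~1.0 for both
```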
What do representations tell us about a system? Image of a mouse with a scope showing a vector of activity patterns, and a neural network with a vector of unit activity patterns
Common analyses of neural representations: Encoding models (relating activity to task features) drawing of an arrow from a trace saying [on_____on____] to a neuron and spike train. Comparing models via neural predictivity: comparing two neural networks by their R^2 to mouse brain activity. RSA: assessing brain-brain or model-brain correspondence using representational dissimilarity matrices

In this work we raise fundamental questions about a key assumption of many analyses in neuroscience: that how much variance a feature explains in the representations of the system is a good proxy for how important it is to the system. 2/

1 month ago 3 1 1 0
Preview
Representation Biases: Variance Is Not Always a Good Proxy for Importance A central approach in neuroscience is to analyze neural representations as a means to understand a system's function, through the use of methods like principal component analysis, regression, and repr...

Pleased to share that our paper "Representation Biases: Variance is Not Always a Good Proxy for Importance" is now out as a Theory/New Concepts paper in eNeuro!
www.eneuro.org/content/13/3... 1/

1 month ago 73 30 1 0

Thanks! Hope things are going well :)

1 month ago 1 0 0 0

I joined Anthropic (alignment team) this week — exciting place to be at an exciting time!

1 month ago 215 3 18 0

Sharing “Neural Thickets”. We find:

In large models, the neighborhood around pretrained weights can become dense with task-improving solutions.

In this regime, post-training can be easy; even random guessing works.

Paper: arxiv.org/abs/2603.12228
Web: thickets.mit.edu

1/

1 month ago 110 23 7 5
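Reading "even random guessing works" as something like a local random search around the pretrained weights, here is a deliberately tiny sketch of that idea. The single weight vector and quadratic loss are stand-ins I made up, not the paper's models, tasks, or method.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100

# Hypothetical stand-in for "pretrained weights" and a nearby good solution.
w_pretrained = rng.standard_normal(d)
w_good = w_pretrained + 0.1 * rng.standard_normal(d)

def task_loss(w):
    # Toy post-training objective: distance to the nearby good solution.
    return np.mean((w - w_good) ** 2)

# "Random guessing" post-training: propose small random perturbations around
# the current weights and keep any that happen to improve the task loss.
w, best = w_pretrained.copy(), task_loss(w_pretrained)
for _ in range(2000):
    candidate = w + 0.01 * rng.standard_normal(d)
    cand_loss = task_loss(candidate)
    if cand_loss < best:
        w, best = candidate, cand_loss

print(f"task loss: {task_loss(w_pretrained):.4f} -> {best:.4f}")
```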
title section of the paper: “Cross-Modal Taxonomic Generalization in (Vision) Language Models” by Tianyang Xu, Marcelo Sandoval-Castañeda, Karen Livescu, Greg Shakhnarovich, Kanishka Misra.

What is the interplay between representations learned from (language) surface forms alone, and those learned from more grounded evidence (e.g., vision)?

Excited to share new work understanding “Cross-modal taxonomic generalization” in (V)LMs

arxiv.org/abs/2603.07474

1/

1 month ago 34 12 1 1
Preview
The no-magic approach to understanding intelligent systems Today I want to write a bit about the philosophy I think underlies much of the work that my collaborators and I (as well as many other researchers that I respect) have done on understanding artificial...

Short post on what I call the "no-magic approach to understanding intelligent systems" — the philosophy I think of as motivating our work on understanding intelligence without resorting to magical thinking about AI or humans!
infinitefaculty.substack.com/p/the-no-mag...

1 month ago 33 5 1 1

Can large language models *introspect*?

In a new paper, @kmahowald.bsky.social and I study the MECHANISM of introspection in big open-source models.

tldr: Models detect internal anomalies through DIRECT ACCESS, but don't know what the anomalies are.

And they love to guess “apple” 🍎

1 month ago 70 15 2 6
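To make "direct access" concrete, here is a self-contained toy of the kind of anomaly-injection probe this finding suggests. It is entirely my own construction (a random "concept vector" added to a stand-in hidden state), not the paper's models or prompts: the system can tell that its activations are off without that signal identifying which concept was injected.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a model's hidden state: baseline activations the "model"
# has direct access to, plus an optional injected concept vector.
d = 64
concepts = {"apple": rng.standard_normal(d), "violin": rng.standard_normal(d)}
baseline = rng.standard_normal((500, d))          # typical activations
mu, sigma = baseline.mean(0), baseline.std(0)

def detect_anomaly(h):
    # "Direct access": compare the current hidden state to its own typical
    # statistics; this flags *that* something is off, not *what* it is.
    z = np.abs((h - mu) / sigma).mean()
    return z > 1.0

h_normal = rng.standard_normal(d)
h_injected = h_normal + 4.0 * concepts["violin"]

print(detect_anomaly(h_normal))    # False: nothing unusual
print(detect_anomaly(h_injected))  # True: notices the perturbation...
# ...but this scalar signal alone cannot say which concept was injected,
# which is consistent with a model falling back on a favorite guess.
```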

Thank you! :)

1 month ago 1 0 1 0

But I'll forever be grateful for the privilege of being a part of DM through such an exciting time, for getting to work on many amazing projects, and for the wonderful collaborators and dear friends I've made along the way.

1 month ago 12 0 1 0

With all these changes, I've started to wonder whether I could do the work I think is most important and exciting more effectively somewhere else. After a short break, I'm excited to try something new (more to come soon, I hope).

1 month ago 17 0 1 0
View of London from a rooftop in Kings Cross

After 5.5 years (or 7 or 9, counting internships), today was my last day at Google/DeepMind. When I was in London recently, I walked through the two floors that were (most of) DeepMind when I first joined, and thought about how much the company and field have changed since then.

1 month ago 68 0 2 0

🚨New preprint! In-context learning underlies LLMs’ real-world utility, but what are its limits? Can LLMs learn completely novel representations in-context and flexibly deploy them to solve tasks? In other words, can LLMs construct an in-context world model? Let’s see! 👀

1 month ago 38 5 1 1

Really cool work — learning over sequential experiences that contain the embodied cue of viewpoint as well as visual inputs can give rise to human-like 3D shape perception!

1 month ago 10 1 0 0
Preview
Dileep George joins Astera to lead its neuro-inspired AGI effort Dileep George is joining Astera as Head of AI, leading our AGI research division. Working alongside our Chief Scientist Doris Tsao, he and the team will explore novel, brain-inspired computational arc...

News! I've joined the Astera Institute to lead its neuroscience-based AGI research. Backed by a $1B+ commitment over the coming decade, my team will explore novel, brain-inspired architectures and algos toward safe, efficient, human-like AGI, working alongside Doris Tsao. 1/

astera.org/dileep-georg...

1 month ago 84 6 13 2