When and how can test-time thinking allow models to use information latent in their training data? What are the benefits and tradeoffs relative to other solutions like synthetic data augmentation? Pleased to share (after a long delay) an exploration of these issues: arxiv.org/abs/2604.01430 thread:
Posts by Taylor Webb
The brain can clearly represent novel combinations of features; we need to be able to explain how it does it!
potentially be very important. Many would argue that the ability to adaptively handle these types of situations is a very important part of human intelligence. You seem to be arguing that these cases are rare and therefore uninteresting or unimportant, but that doesn’t follow.
Is there any data to support the claim that precomputed conjunctions can handle the vast majority of cases? Even accepting that novel conjunctions are rare (which seems unlikely to me), that doesn’t mean they’re unimportant to the survival of an organism. Novel but rare situations can…
To extend your lunch metaphor, sometimes you have to make your own lunch!
same basic computation operating over different timescales. Precomputed conjunctive codes represent stable statistical regularities, which can get you pretty far, but sometimes you need to represent novel conjunctions for which you don’t already have precomputed codes.
associative learning as implemented with Hopfield networks, so one way to think about this is that the models are performing the same kind of associative learning that encodes natural statistics on a longer timescale, but here over a short timescale. I’m not sure why you’re opposed to this…
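(Aside: the attention–Hopfield correspondence mentioned above can be made concrete. The sketch below is illustrative, not from the paper; all names and parameter values are my own. A single update of a modern continuous Hopfield network is exactly single-head softmax attention with keys and values both set to the stored patterns, and it retrieves the stored pattern nearest the query.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Store a few random patterns (rows) in an associative memory.
patterns = rng.standard_normal((5, 16))

def hopfield_retrieve(query, memory, beta=8.0):
    """One update step of a modern (continuous) Hopfield network.

    This is the same computation as single-head softmax attention,
    softmax(beta * q @ K.T) @ V, with K = V = the stored patterns.
    """
    scores = beta * query @ memory.T          # similarity to each stored pattern
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory                   # convex combination of patterns

# Query with a noisy version of pattern 2; retrieval snaps back to it.
noisy = patterns[2] + 0.1 * rng.standard_normal(16)
retrieved = hopfield_retrieve(noisy, patterns)
print(int(np.argmax(patterns @ retrieved)))   # index of the closest stored pattern
```

With a large inverse temperature `beta`, the softmax is nearly one-hot and the network completes the noisy query to the clean stored pattern, which is the associative-learning-on-a-short-timescale reading of in-context attention.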
separate tokens with shared parameters. Without this architectural feature, i.e. if position and feature embeddings for separate objects were additively combined in a single vector space, the assignment (binding!) of features to objects would be lost. Self attention is isomorphic to…
the models handle this by using separate tokens to store particular feature conjunctions. When objects span multiple tokens, position ID embeddings track the assignment of tokens to objects, i.e. they bind those tokens together. This solution fundamentally depends on the presence of…
Yes, how the system handles the ‘exceptions’ is precisely the question. Dynamic reweighting and attention alone don’t accomplish that. You need some way to encode the in-context conjunction of features, i.e. binding! As we show in the paper (and other recent work has shown in other domains)…
The study from Li et al. and other recent studies, e.g. arxiv.org/html/2506.15..., show that vision models develop emergent index-like representations. These results illustrate how these models solve the binding problem, not that binding was unnecessary in the first place.
You claim that precomputed conjunctive codes can support 99% of visual processing. Regardless of whether this is the right number, what about the remaining 1%? People can obviously perceive synthetic scenes with arbitrary feature combinations; how?
A new Department of Cognitive Science is being created at Bocconi University in Milan, Italy.
Here is the call for a cluster hire (for around 10 faculty) in all areas of cognitive science, at both junior and senior levels:
www.unibocconi.it/en/faculty-a...
Deadline: May 4th, 2026
Excited to give a talk at #Cosyne2026 about my PhD work!
We show that RNNs trained on visual search converge on brain-like solutions, producing primate-like behavior and neural representations. Happy to chat if you're at Cosyne!
📅 March 15, 2026
📍 Lisbon, Portugal
www.biorxiv.org/content/10.1...
Can large language models *introspect*?
In a new paper, @kmahowald.bsky.social and I study the MECHANISM of introspection in big open-source models.
tldr: Models detect internal anomalies through DIRECT ACCESS, but don't know what the anomalies are.
And they love to guess “apple” 🍎
I like this general point about levels of explanation (which I think is similar to the point we make here arxiv.org/abs/2508.05776) but how does it relate to the discussion about mistakes made by LLMs? (Possibly explained by the earlier context of the clip)
But this example again appears not to involve the reasoning mode. I agree that ‘thinking’ is confusing nomenclature but it’s notable that most (not all) of the stupidest mistakes come from the feedforward / parallel processing mode.
We also have a paper on this arxiv.org/abs/2411.00238 but it doesn’t seem to be an arbitrary failure. Instead they seem to fail in precisely the ways that human vision fails under time pressure (including with counting), and increasingly the models seem to resolve this via sequential processing.
Re: seeing vs. thinking, 'thinking' is arguably a bad term for this; in the vision setting the models depend on sequential processing to individuate objects, but that wouldn't commonly be referred to as 'thinking' in the colloquial sense.
Huh, seems to be the result of different prompts, although it's arguably confusing to say that no liquid can be poured into it (at all or only in the current configuration?). In general most of the comically stupid mistakes (e.g. how many b's in blueberry) seem to be from the non-thinking models.
Funny, but these demos always seem to be the free / instant model. With thinking turned on it gets this correct.
Very interesting! Consistent with this, we found that induction heads seem to be completely distinct from what we called 'symbolic induction heads', i.e. function vector heads: arxiv.org/abs/2502.20332
How do you knock the induction heads out of an LM while preserving its ability to think? Is it even possible?
@keremsahin22.bsky.social's work is worth reading if you haven't seen it yet.
hapax.baulab.info
What is the relationship between memorization and generalization in AI? Is there a fundamental tradeoff? In infinitefaculty.substack.com/p/memorizati... I’ve reviewed some of the evolving perspectives on memorization & generalization in machine learning, from classic perspectives through LLMs.
Unfortunately the event is in-person only.
Very excited for our second workshop on the computational ingredients of reasoning (Feb 24-27), this one focused on mechanisms of reasoning in both AI and the brain. Check out the program to see our amazing lineup of speakers, and please consider attending! ivado.ca/en/events/me...
Thrilled that my paper is out in @nature.com. We explored how the brain builds complex tasks by compositionally combining simpler sub-task representations. The brain flexibly performs multiple tasks by dynamically reusing neural subspaces for sensory inputs and motor actions.
rdcu.be/eRVUk
Excited to announce a new book telling the story of mathematical approaches to studying the mind, from the origins of cognitive science to modern AI! The Laws of Thought will be published in February and is available for pre-order now.
That is, in order to do the kinds of things that are supposed to require algebraic / rule-based operations, these models actually do something that is algebraic, which both affirms the importance of algebraic operations for human-like reasoning and also shows it doesn't need to be innate.
I generally agree, but the interesting thing is that the LLMs/VLMs sometimes do end up doing something very structured and algebraic, as we show e.g. here arxiv.org/abs/2502.20332 and here arxiv.org/abs/2506.15871 (the paper that @romanfeiman.bsky.social 's meme was commenting on).