Stephanie Chan (@scychan) Bsky

What do representations tell us about a system? Image of a mouse with a scope showing a vector of activity patterns, and a neural network with a vector of unit activity patterns Common analyses of neural representations: Encoding models (relating activity to task features) drawing of an arrow from a trace saying [on_____on____] to a neuron and spike train. Comparing models via neural predictivity: comparing two neural networks by their R^2 to mouse brain activity. RSA: assessing brain-brain or model-brain correspondence using representational dissimilarity matrices

In neuroscience, we often try to understand systems by analyzing their representations — using tools like regression or RSA. But are these analyses biased towards discovering a subset of what a system represents? If you're interested in this question, check out our new commentary! Thread:

8 months ago 171 53 5 1

Great new paper by @jessegeerts.bsky.social, looking at a certain type of generalization in transformers -- transitive inference -- and what conditions induce this type of generalization

10 months ago 1 0 0 0

New paper: Generalization from context often outperforms generalization from finetuning.

And you might get the best of both worlds by spending extra compute and train time to augment finetuning.

11 months ago 4 0 0 0

It was such a pleasure to co-supervise this research, but
@aaditya6284.bsky.social should really take the bulk of the credit :)

And thank you so much to all our wonderful collaborators, who made fundamental contributions as well!
Ted Moskovitz, Sara Dragutinovic, Felix Hill, @saxelab.bsky.social

1 year ago 2 0 0 0

This paper is dedicated to our collaborator
Felix Hill, who passed away recently. This is our last ever paper with him.

It was bittersweet to finish this research, which contains so much of the scientific spark that he shared with us. Rest in peace Felix, and thank you so much for everything.

1 year ago 3 0 1 0

Some general takeaways for interp:

1 year ago 2 0 1 0

4. We provide intuition for these dynamics through a simple mathematical model.

1 year ago 2 0 1 0

3. A lot of previous work (including our own), has emphasized *competition* between in-context and in-weights learning.

But we find that cIWL and ICL actually compete AND cooperate, via shared subcircuits. In fact, ICL cannot emerge if cIWL is blocked from emerging, even though ICL emerges first!

1 year ago 2 0 1 0

2. At the end of training, ICL doesn't give way to in-weights learning (IWL), as we previously thought. Instead, the model prefers a surprising strategy that is a *combination* of the two!

We call this combo "cIWL" (context-constrained in-weights learning).

1 year ago 2 0 1 0

1. We aimed to better understand the transience of in-context-learning (ICL) -- where ICL can emerge but then disappear after long training times.

1 year ago 2 0 1 0

Dropping a few high-level takeaways in this thread.

For more details please see Aaditya's thread,
or the paper itself.
bsky.app/profile/aadi...
arxiv.org/abs/2503.05631

1 year ago 3 0 1 0

New work led by
@aaditya6284.bsky.social

"Strategy coopetition explains the emergence and transience of in-context learning in transformers."

We find some surprising things!! E.g. that circuits can simultaneously compete AND cooperate ("coopetition") 😯 🧵👇

1 year ago 9 4 1 0

For Felix Devastatingly, we have lost a bright light in our field. Felix Hill was not only a deeply insightful thinker -- he was also a generous, thoughtful mentor to many researchers. He majorly changed my lif...

Sadly, we have lost a brilliant researcher and colleague, Felix Hill. Please see this note, where I have tried to compile some of his writings: docs.google.com/document/d/1...

1 year ago 16 1 0 0

The broader spectrum of in-context learning The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning...

What counts as in-context learning (ICL)? Typically, you might think of it as learning a task from a few examples. However, we’ve just written a perspective (arxiv.org/abs/2412.03782) suggesting interpreting a much broader spectrum of behaviors as ICL! Quick summary thread: 1/7

1 year ago 122 32 2 1

Introducing the :milkfoamo: emoji

1 year ago 1 1 0 0

Hahaha. We need a cappuccino emoji?!

1 year ago 0 0 1 0

I'll be not at Neurips this week. Let's grab coffee if you want to fomo-commiserate with me

1 year ago 12 0 0 1

Hello hello. Testing testing 123

1 year ago 9 0 3 0

Posts by Stephanie Chan