
Posts by Sumedh Hindupur

Huge thanks to @ekdeepl.bsky.social, @thomasfel.bsky.social,
and my advisor Demba Ba for all the assistance and contributions to this project!

1 year ago

In vision, SpaDE learns very interesting concepts! On ImageNette, a 10-class subset of ImageNet, for the English Springer class it finds concepts localized to the ears, muzzle, eye region, neck, paws, etc.!
Do check out the paper: arxiv.org/abs/2503.01822 for more results!

1 year ago

💡 Results on real model activations: across vision & language tasks, SpaDE recovers more monosemantic features than ReLU, JumpReLU, or TopK SAEs.

It also tiles concepts beautifully.

1 year ago

SpaDE also captures concept heterogeneity, adaptively allocating sparsity levels to different concepts based on their intrinsic dimension, something TopK, with its fixed sparsity level, struggles with.

1 year ago

SpaDE captures nonlinearly separable features better than ReLU, JumpReLU, or TopK SAEs. It also shows very interesting, local receptive fields!
It tiles concept space more effectively, avoiding cross-concept correlations.

1 year ago

🛠️ Our Solution: SpaDE

We designed SpaDE, a novel SAE that explicitly accounts for nonlinear separability and heterogeneous dimensionality. SpaDE projects distances onto the probability simplex.
It recovers previously hidden concepts that standard SAEs completely miss!
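The simplex projection mentioned above can be sketched with the standard Euclidean projection onto the probability simplex. A minimal NumPy sketch, assuming the classic sort-and-threshold algorithm; the actual SpaDE encoder in the paper may differ in its details:

```python
import numpy as np

def project_to_simplex(v):
    # Euclidean projection of v onto the probability simplex:
    # find the threshold tau such that sum(max(v - tau, 0)) == 1,
    # then clamp everything below tau to exactly zero.
    u = np.sort(v)[::-1]                  # sort entries in descending order
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * ks > css - 1)[0][-1]
    tau = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - tau, 0.0)

# Entries far below the threshold land exactly at zero, so the
# output is a sparse probability vector -- sparsity emerges from
# the geometry rather than from a fixed-k rule.
z = project_to_simplex(np.array([3.0, 0.1, 0.05]))
```

Note the design consequence: how many coordinates survive the projection depends on the input itself, which is one way an encoder can allocate different sparsity levels to different concepts.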

1 year ago

🔬 Testing the Assumptions: We analyzed SAEs across different settings—from toy models to real-world neural activations.
Result? SAEs fail when concepts are nonlinearly separable (ReLU, JumpReLU) or heterogeneous in dimensionality (TopK).

1 year ago

The Big Idea: SAE encoders impose constraints on the solution to dictionary learning, and these constraints translate into assumptions about concepts.
SAE encoders are linear transformations followed by orthogonal projections onto different sets; the choice of set dictates the receptive fields and hence the assumptions.
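This "linear map + orthogonal projection" view can be illustrated in a few lines. A hedged sketch (the weights here are placeholders, not the paper's models): a ReLU encoder projects the linear output onto the nonnegative orthant, while a TopK encoder projects it onto the set of k-sparse vectors.

```python
import numpy as np

def relu_encode(x, W, b):
    # Linear transformation, then orthogonal projection onto the
    # nonnegative orthant: negative coordinates are clamped to zero.
    return np.maximum(W @ x + b, 0.0)

def topk_encode(x, W, b, k):
    # Linear transformation, then orthogonal projection onto the set
    # of k-sparse vectors: keep only the k largest-magnitude entries.
    z = W @ x + b
    out = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-k:]
    out[keep] = z[keep]
    return out
```

The projection set is what encodes the assumption: the orthant assumes concepts are (half-space) linearly separable, while the fixed-k sparse set assumes every input activates the same number of concepts.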

1 year ago

New preprint alert!
Do Sparse Autoencoders (SAEs) reveal all concepts a model relies on? Or do they impose hidden biases that shape what we can even detect?
We uncover a fundamental duality between SAE architectures and the concepts they can recover.
Link: arxiv.org/abs/2503.01822

1 year ago