
Posts by Sumedh Hindupur

Huge thanks to @ekdeepl.bsky.social, @thomasfel.bsky.social,
and my advisor Demba Ba for all the assistance and contributions to this project!

1 year ago

In vision, SpaDE learns very interesting concepts! On ImageNette, a 10-class subset of ImageNet, for the English Springer class it finds concepts localized to the ears, muzzle, eye region, neck, paws, etc.!
Do check out the paper: arxiv.org/abs/2503.01822 for more results!

1 year ago

💡 Results on real model activations: across vision & language tasks, SpaDE recovers more monosemantic features than ReLU, JumpReLU, or TopK SAEs.

It also tiles concepts beautifully.

1 year ago

SpaDE also captures concept heterogeneity, adaptively allocating sparsity levels to different concepts based on their intrinsic dimension, something TopK, with its fixed sparsity level, struggles with.

1 year ago

SpaDE captures nonlinearly separable features better than ReLU, JumpReLU, or TopK SAEs. It also shows very interesting, local receptive fields!
It tiles concept space more effectively, avoiding cross-concept correlations.

1 year ago

🛠️ Our Solution: SpaDE

We designed SpaDE, a novel SAE that explicitly accounts for nonlinear separability and heterogeneous dimensionality. SpaDE projects distances onto the probability simplex.
It recovers previously hidden concepts that standard SAEs completely miss!
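The simplex projection mentioned above can be sketched with the standard Euclidean projection onto the probability simplex. A minimal NumPy sketch, assuming the classic sort-and-threshold algorithm; the actual SpaDE encoder in the paper may differ in its details:

```python
import numpy as np

def project_to_simplex(v):
    # Euclidean projection of v onto the probability simplex:
    # find the threshold tau such that sum(max(v - tau, 0)) == 1,
    # then clamp everything below tau to exactly zero.
    u = np.sort(v)[::-1]                  # sort entries in descending order
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * ks > css - 1)[0][-1]
    tau = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - tau, 0.0)

# Entries far below the threshold land exactly at zero, so the
# output is a sparse probability vector -- sparsity emerges from
# the geometry rather than from a fixed-k rule.
z = project_to_simplex(np.array([3.0, 0.1, 0.05]))
```

Note the design consequence: how many coordinates survive the projection depends on the input itself, which is one way an encoder can allocate different sparsity levels to different concepts.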

1 year ago

🔬 Testing the Assumptions: We analyzed SAEs across different settings—from toy models to real-world neural activations.
Result? SAEs fail when concepts are nonlinearly separable (ReLU, JumpReLU) or heterogeneous in dimensionality (TopK).

1 year ago

The Big Idea: SAE encoders impose constraints on the solution to dictionary learning, and these constraints translate into assumptions about concepts.
SAE encoders are linear transformations followed by orthogonal projections onto different sets; the choice of set dictates the receptive fields and hence the assumptions.
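This "linear map + orthogonal projection" view can be illustrated in a few lines. A hedged sketch (the weights here are placeholders, not the paper's models): a ReLU encoder projects the linear output onto the nonnegative orthant, while a TopK encoder projects it onto the set of k-sparse vectors.

```python
import numpy as np

def relu_encode(x, W, b):
    # Linear transformation, then orthogonal projection onto the
    # nonnegative orthant: negative coordinates are clamped to zero.
    return np.maximum(W @ x + b, 0.0)

def topk_encode(x, W, b, k):
    # Linear transformation, then orthogonal projection onto the set
    # of k-sparse vectors: keep only the k largest-magnitude entries.
    z = W @ x + b
    out = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-k:]
    out[keep] = z[keep]
    return out
```

The projection set is what encodes the assumption: the orthant assumes concepts are (half-space) linearly separable, while the fixed-k sparse set assumes every input activates the same number of concepts.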

1 year ago

New preprint alert!
Do Sparse Autoencoders (SAEs) reveal all concepts a model relies on? Or do they impose hidden biases that shape what we can even detect?
We uncover a fundamental duality between SAE architectures and the concepts they can recover.
Link: arxiv.org/abs/2503.01822

1 year ago