
Posts by Ryota Takatsuki

Great to see this out! Many congratulations @romybeaute.bsky.social 👏🏼👇🏽

2 weeks ago

I’m really excited about Diffusion Steering Lens, an intuitive and elegant new “logit lens” technique for decoding the attention and MLP blocks of vision transformers!

Vision is much more expressive than language, so some new mech interp rules apply:

11 months ago

This work was done as my internship project at Araya. Huge thanks to my supervisors, Ippei Fujisawa & Ryota Kanai, and my external mentor @soniajoseph.bsky.social for making this happen! 🙏

Link to the paper: arxiv.org/abs/2504.13763
(7/7)

11 months ago

We also validated DSL’s reliability through two interventional studies (head-importance correlation and overlay removal). Check out our paper for details!
(6/7)

11 months ago

Below are DSL visualizations for the top-10 heads ranked by similarity to the input; they are consistent with residual-stream visualizations from Diffusion Lens.
(5/7)

11 months ago

To fix this, we propose Diffusion Steering Lens (DSL), a training-free method that steers a specific submodule’s output, patches its subsequent indirect contributions, and then decodes the result with the diffusion model.
(4/7)
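The steer-and-patch recipe above can be sketched on a toy residual stack (NumPy, hypothetical shapes and random weights — an illustrative stand-in for the submodule interventions, not the paper’s code):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
# Toy residual network: block i adds W_i @ x to the residual stream.
Ws = [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)]

def forward(x, steer_idx=None, alpha=1.0, patched=None):
    """Run the stack; optionally steer block `steer_idx` by factor `alpha`
    and patch every later block's output to its cached clean value, so
    only the steered submodule's direct contribution changes."""
    outs = []
    for i, W in enumerate(Ws):
        if steer_idx is not None and i > steer_idx:
            out = patched[i]       # freeze indirect (downstream) contributions
        else:
            out = W @ x
            if i == steer_idx:
                out = alpha * out  # steer the target submodule's output
        outs.append(out)
        x = x + out
    return x, outs

x0 = rng.normal(size=d)
clean_final, clean_outs = forward(x0)

# Steered run: only block 0's direct path to the output is altered.
steered_final, _ = forward(x0, steer_idx=0, alpha=2.0, patched=clean_outs)

# Because later blocks are patched, the change in the final residual
# equals exactly (alpha - 1) times block 0's clean output.
assert np.allclose(steered_final - clean_final, clean_outs[0])
```

In the real method, the final steered residual would then be passed to the diffusion decoder; here the toy check just confirms that patching isolates the direct contribution.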

11 months ago

We first adapted Diffusion Lens (Toker et al., 2024) to decode residual streams in the Kandinsky 2.2 image encoder (CLIP ViT-bigG/14) via the diffusion model.
We can visualize how the predictions evolve through layers, but individual head contributions stay largely hidden.
(3/7)

11 months ago

Classic Logit Lens projects residual streams to the output space. It works surprisingly well on ViTs, but visual representations are far richer than class labels.
www.lesswrong.com/posts/kobJym...
(2/7)
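The classic projection can be sketched on toy arrays (NumPy, hypothetical shapes and random weights — the standard logit-lens recipe in miniature, not any model’s actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_classes, n_layers = 64, 10, 4

# Hypothetical per-layer residual streams for one token/patch.
residuals = [rng.normal(size=d_model) for _ in range(n_layers)]
W_U = rng.normal(size=(d_model, n_classes))  # unembedding / output projection

def logit_lens(h, W_U):
    # Normalize (stand-in for the final LayerNorm), then project the
    # intermediate residual stream directly into the output space.
    h = (h - h.mean()) / (h.std() + 1e-5)
    return h @ W_U

for layer, h in enumerate(residuals):
    logits = logit_lens(h, W_U)
    print(f"layer {layer}: top class = {logits.argmax()}")
```

The limitation the post points out is visible in the shapes: each layer’s rich `d_model`-dimensional state is collapsed to `n_classes` logits, discarding most of the visual information.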

11 months ago

🔍Logit Lens tracks what transformer LMs “believe” at each layer. How can we effectively adapt this approach to Vision Transformers?

Happy to share that our paper “Decoding Vision Transformers: the Diffusion Steering Lens” was accepted at the CVPR 2025 Workshop on Mechanistic Interpretability for Vision!
(1/7)

11 months ago

hello world

11 months ago