I wrote a new blog post on using the logit lens to interpret protein language models!
liambai.com/logit-lens/
It has some interactive visualizations you can play around with.
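For readers unfamiliar with the technique: the logit lens projects each intermediate layer's hidden state through the model's final output head, showing what token the model "believes" at that depth. The sketch below is illustrative only (it is not code from the post); the random weights and dimensions are hypothetical stand-ins for a real protein language model's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size, n_layers = 16, 20, 4  # 20 standard amino acids

# Hypothetical stand-ins for a protein LM's unembedding matrix and the
# per-layer hidden states at a single sequence position.
W_unembed = rng.normal(size=(d_model, vocab_size))
hidden_states = [rng.normal(size=d_model) for _ in range(n_layers)]

def logit_lens(hidden, W):
    """Project a hidden state through the final head; return a softmax distribution."""
    logits = hidden @ W
    probs = np.exp(logits - logits.max())  # stable softmax
    return probs / probs.sum()

for layer, h in enumerate(hidden_states):
    probs = logit_lens(h, W_unembed)
    print(f"layer {layer}: top amino-acid index = {probs.argmax()}")
```

In a real setting, the hidden states would come from a model like ESM2 and the indices would map to amino-acid tokens; watching the top prediction stabilize across layers is the core of the lens.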
Posts by Liam Bai
It has long seemed that molecular biology is a natural home for ML interpretability research: unlike vision and NLP, the field has mature human-constructed models of biological mechanisms, permitting direct comparison with their ML-derived counterparts. Our first foray below 👇.
A range of features identified from sparse autoencoders trained on different layers of ESM2-650M
This might be the best paper on applying sparse autoencoders to protein language models. The authors show how neural networks trained on amino acid sequences "discover" different features, some specific to individual protein families, others to substructures.
www.biorxiv.org/content/10.1...
Can we learn protein biology from a language model?
In new work led by @liambai.bsky.social and me, we explore how sparse autoencoders can help us understand biology—going from mechanistic interpretability to mechanistic biology.