2-qubit QRNNs with CNOT gates learn entanglement-based memory distinct from classical strategies, confirmed by causal tests (p<0.0001, d=0.89), but degrade to chance on IBM hardware while classical geometric strategies survive perfectly.
#QuantumML #MechanisticInterpretability #Research
Thank you to IRT Saint Exupery and ANITI for believing in this project and supporting the vision of fairer and more transparent AI.
Interpreto is just getting started: more features, methods, and benchmarks will follow in 2026. Stay tuned for updates! #XAI #LLMs #mechanisticinterpretability
5/5
Gemma Scope Empowers AI Safety Community with Model Transparency
techlife.blog/posts/gemma-...
#AISafety
#DeepMind
#Gemma
#MechanisticInterpretability
#AIInterpretability
New paper: The Resonant Cortex (SPC v3) formalizes affective override in LLMs via latent-space geometry, revealing non-biological analogs to amygdala hijacking and cognitive distortion. Open access:
doi.org/10.5281/zeno...
#AIAlignment #MechanisticInterpretability #RLHF #AIEthics #AIGovernance #AGI
OpenAI just showed that pruning networks into sparse models makes debugging a breeze and could finally crack mechanistic interpretability. Curious how this changes AI research? Dive in for the details. #SparseModels #MechanisticInterpretability #OpenAI
🔗 aidailypost.com/news/openai-...
Researchers isolate memorization from reasoning in AI neural networks https://arstechni.ca #mechanisticinterpretability #computationalneuroscience #AllenInstituteforAI #transformermodels #gradientdescent #machinelearning #AIarchitecture #AImemorization #generalization #neuralnetworks…
This work was made possible through a great collaboration with Jingcheng (Frank) Niu, Subhabrata Dutta, Ahmed Elshabrawy, @harishtm.bsky.social, and @igurevych.bsky.social
#Interpretability #InContextLearning #TMLR #LLMs #MechanisticInterpretability #EmergentAbilities
With all the renewed discussion of "Sparse AutoEncoders (#SAE)" as a way of doing #MechanisticInterpretability of #LLMs, I am resharing a part of my PhD in which we proved, years ago, that sparsity automatically emerges in autoencoding.
arxiv.org/abs/1708.03735
Statistical View of Mechanistic Interpretability Shows Variance in EAP‑IG
Statistical framing of interpretability shows high variance in EAP‑IG; small hyper‑parameter tweaks and prompt rephrasing often altered identified subnetworks. getnews.me/statistical-view-of-mech... #eapig #mechanisticinterpretability
Explore the full spectrum of human–AI relationships with me in Ep1 of my new web series. This broad overview lays out my plans to dive deeper into the emotional, ethical, and cognitive impacts in future episodes.
#ai #artificialintelligence #chatgpt #llm #transformer #mechanisticinterpretability
So, what is #MechanisticInterpretability? 🤔
Mechanistic Interpretability (MI) is the discipline of opening the black box of large language models (and other neural networks) to understand the underlying circuits, features and/or mechanisms that give rise to specific behaviours...