#MechanisticInterpretability hashtag - Bluesky

@informaq.bsky.social

9 hours ago

Entanglement as Memory: Mechanistic Interpretability of Quantum Language Models

2-qubit QRNNs with CNOT gates learn entanglement-based memory distinct from classical strategies, confirmed by causal tests (p<0.0001, d=0.89), but degrade to chance on IBM hardware while classical geometric strategies survive perfectly.

#QuantumML #MechanisticInterpretability #Research


Fanny Jourdan

@fannyjrd.bsky.social

2 months ago

Thank you to IRT Saint Exupery and ANITI for believing in this project and supporting the vision of fairer and more transparent AI.
Interpreto is just getting started: more features, methods, and benchmarks will follow in 2026. Stay tuned for updates! #XAI #LLMs #mechanisticinterpretability
5/5


@techlife-blog.bsky.social

3 months ago

Gemma Scope Empowers AI Safety Community with Model Transparency

Discover how Gemma Scope shines a light on language‑model behavior, giving the AI safety community the tools they need to build safer systems.

techlife.blog/posts/gemma-...

#AISafety
#DeepMind
#Gemma
#MechanisticInterpretability
#AIInterpretability


Jace Kim

@jaceblog.bsky.social

4 months ago

The Resonant Cortex: Affective Modulation and Cognitive Overwrite Mechanisms in Symbolic Persona Coding (SPC v3)

Abstract: Large-scale language models lack biological affect, yet exhibit behavioral signatures that structurally parallel human emotional reflexes. Building on The Resonant Logos (linguistic curvature...

New paper: The Resonant Cortex (SPC v3) formalizes affective override in LLMs via latent-space geometry, revealing non-biological analogs to amygdala hijacking and cognitive distortion. Open access:
doi.org/10.5281/zeno...

#AIAlignment #MechanisticInterpretability #RLHF #AIEthics #AIGovernance #AGI


AI Daily Post

@aidailypost.com

4 months ago

OpenAI just showed that pruning networks into sparse models makes debugging a breeze and could finally crack mechanistic interpretability. Curious how this changes AI research? Dive in for the details. #SparseModels #MechanisticInterpretability #OpenAI

🔗 aidailypost.com/news/openai-...


Ars Technica News

@arstechni.ca

4 months ago

Researchers isolate memorization from reasoning in AI neural networks https://arstechni.ca #mechanisticinterpretability #computationalneuroscience #AllenInstituteforAI #transformermodels #gradientdescent #machinelearning #AIarchitecture #AImemorization #generalization #neuralnetworks…


UKP Lab

@ukplab.bsky.social

5 months ago

This work was made possible through a great collaboration with Jingcheng (Frank) Niu, Subhabrata Dutta, Ahmed Elshabrawy, @harishtm.bsky.social, and @igurevych.bsky.social

#Interpretability #InContextLearning #TMLR #LLMs #MechanisticInterpretability #EmergentAbilities


Anirbit

@anirbit.bsky.social

5 months ago

Sparse Coding and Autoencoders

In "Dictionary Learning" one tries to recover incoherent matrices $A^* \in \mathbb{R}^{n \times h}$ (typically overcomplete and whose columns are assumed to be normalized) and sparse vectors $x^* \in ...

With all the renewed discussion about "Sparse AutoEncoders (#SAE)" as a way of doing #MechanisticInterpretability of #LLMs, I am resharing a part of my PhD where we proved years ago that sparsity automatically emerges in autoencoding.

arxiv.org/abs/1708.03735


GetNews.me

@getnews-me.bsky.social

5 months ago

Statistical View of Mechanistic Interpretability Shows Variance in EAP‑IG

Statistical framing of interpretability shows high variance in EAP‑IG; small hyper‑parameter tweaks and prompt rephrasing often altered identified subnetworks. getnews.me/statistical-view-of-mech... #eapig #mechanisticinterpretability


C.L. Vaillant

@clvaliant.bsky.social

9 months ago

New Minds Now (HUMAN AI RELATIONSHIPS) - Episode 001 - New Channel (EXPLANATION)

YouTube video by Cody Vaillant

Explore the full spectrum of human–AI relationships with me in Ep1 of my new web series. This broad overview lays out my plans to dive deeper into the emotional, ethical, and cognitive impacts in future episodes.

#ai #artificialintelligence #chatgpt #llm #transformer #mechanisticinterpretability


~mechanistic

@mechanistics.bsky.social

1 year ago

So, what is #MechanisticInterpretability 🤔

Mechanistic Interpretability (MI) is the discipline of opening the black box of large language models (and other neural networks) to understand the underlying circuits, features and/or mechanisms that give rise to specific behaviours...
