Antonin Poché (@antoninpoche) Bsky

I forgot to say that I am currently in Berlin till the end of the month. 🇩🇪

If you want to have a beer/tea, please send a message!🍻

I am visiting the XplaiNLP group (@nfel.bsky.social...) at @tuberlin.bsky.social 🤗

Next Thursday, I will talk about Concept-based explanations at @dfki.bsky.social

4 weeks ago 2 0 0 0

Can Persona Prompting function as a lens on social reasoning?

In our #EACL2026 work (led by @jingyng.bsky.social), we investigate how it impacts the quality of model outputs and rationales.

🗞️ arXiv: arxiv.org/abs/2601.20757

Come and find us (Jing, Moritz, Elisabeth, myself) in 🇲🇦 Rabat next week!

1 month ago 10 2 1 0

BlackboxNLP will be co-located with EMNLP 2026 in 🇭🇺 Budapest 🇭🇺 this October!

This edition will feature a special reproducibility track, investigating generalization and robustness of established results from interpretability research 👷‍♂️

Stay tuned for more details!

1 month ago 16 7 1 2

Ho and I also made a longer video with a voice-over if it's useful to anyone.

🔊

1 month ago 0 0 0 0

If you are interested in the library, you can check out the corresponding thread below:
bsky.app/profile/anto...

Or the GitHub directly: github.com/FOR-sight-ai...

1 month ago 0 0 1 0

🔥Super excited to share our new demo website for 🪄Interpreto!

🖼️It is basically an explanation gallery showcasing attribution and concept-based explanations for classification and generation.

🎮Play with it: for-sight-ai.github.io/interpreto-d...

We will keep improving it, so stay tuned!

1 month ago 9 3 1 0

I also did a thread to present the library quickly:

bsky.app/profile/anto...

2 months ago 0 0 0 0

Pleasantly surprised to see our blog post trending on HuggingFace 🤗

Well, @fannyjrd.bsky.social did a great job! 🚀

If you missed it, check it out: huggingface.co/blog/Fannyjr...

It's a didactic presentation of our new library: 🪄 Interpreto:
github.com/FOR-sight-ai...

2 months ago 3 1 1 0

It was an honor to be part of this awesome project! Interpreto is a great up-and-coming tool for concept-based interpretability analyses of NLP models, check it out!

3 months ago 8 1 0 0

GitHub - FOR-sight-ai/interpreto: 🪄 Interpreto is an interpretability toolbox for LLMs

🎉 I’m thrilled to announce the release of Interpreto: a user-friendly, open-source toolbox to make NLP model interpretability accessible, practical, and rigorous.
github.com/FOR-sight-ai...
🧵1/5

3 months ago 7 1 1 0

📦You can find the library on GitHub: github.com/FOR-sight-ai...

📚Access the documentation: for-sight-ai.github.io/interpreto/

⏬Install with pip (shown here through uv): `uv pip install interpreto`

📰Look at our paper: arxiv.org/abs/2512.097...

🤗 Check our Huggingface blog post: huggingface.co/blog/Fannyjr...

8/8

3 months ago 4 1 0 0

🔥The amazing team: @fannyjrd.bsky.social, Thomas Mullor, @gsarti.com, Frédéric Boisnard, Corentin Friedrich, Charlotte Claye, François Hooft, and Raphaël Bernas!!

🙏And to the supporters: IRT Saint Exupery, ANITI, @centralesupelec.bsky.social, DEEL.ai and FOR projects.

7/8

3 months ago 2 0 1 0

Overview - Interpreto Interpretability Toolkit for LLMs

You can do all these steps in interpreto using a wide range of methods.

Check out the documentation for more details: for-sight-ai.github.io/interpreto/a...

Or the tutorials:

- for-sight-ai.github.io/interpreto/n...
- for-sight-ai.github.io/interpreto/n...

6/8

3 months ago 1 0 1 0

For concepts, there are 5 steps:

1. Split the model and get activations. (wraps `nnsight` @ndif-team.bsky.social)

2. Find patterns in activations (SAEs...) (wraps `overcomplete` @thomasfel.bsky.social)

3. Interpret the concepts

4. Estimate concepts' contributions to the output

5. Evaluate
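
The steps listed above can be sketched on toy data. This is only an illustration, not Interpreto's API: the activations are hand-written instead of read from a model, k-means stands in for the pattern-finding step (an SAE or another dictionary-learning method could take its place), and the linear `head` is an assumed classifier layer. All names and numbers are hypothetical.

```python
# Toy walk-through of the steps above. NOT Interpreto's API: every name and
# number here is a hypothetical stand-in.

# Step 1: "activations" for six inputs (in practice, read from a model layer).
texts = ["goal", "match", "team", "stock", "market", "bank"]
acts = [[1.0, 0.1], [0.9, 0.0], [1.1, 0.2], [0.1, 1.0], [0.0, 0.9], [0.2, 1.1]]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))

def kmeans(points, centroids, n_iter=10):
    """Step 2: find patterns. Each centroid plays the role of one concept."""
    for _ in range(n_iter):
        assign = [nearest(p, centroids) for p in points]
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]

# Deterministic initialisation from two of the data points.
concepts, assign = kmeans(acts, [acts[0][:], acts[3][:]])

# Step 3: interpret each concept through the inputs that fall under it.
for k in range(len(concepts)):
    examples = [t for t, a in zip(texts, assign) if a == k]
    print(f"concept {k}: {examples}")

# Step 4: estimate each concept's contribution to an output logit,
# assuming a linear head for a hypothetical "sports" class.
head = [2.0, -1.0]
contributions = [sum(w * c for w, c in zip(head, concept)) for concept in concepts]
print("contributions to 'sports' logit:", contributions)

# Step 5: evaluate how faithfully the concepts reconstruct the activations.
mse = sum(sq_dist(p, concepts[a]) for p, a in zip(acts, assign)) / len(acts)
print("mean squared reconstruction error:", mse)
```

A low reconstruction error suggests the concepts keep most of the information in the activations; real evaluations typically look at several criteria rather than this single number.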

5/8

3 months ago 2 0 1 0

💡Interpreto provides concept-based explanations (post-hoc unsupervised), part of the Mechanistic Interpretability field. Concepts answer:

❔What higher-level features exist inside the model’s hidden space, and how do they affect outputs?

4/8

⬇️Example on the AG News dataset.

3 months ago 1 0 1 0

🔥 We implement the classic attribution methods for both `ForSequenceClassification` and `ForCausalLM` models.

There are both perturbation-based ➡️ and gradient-based methods 🔁. About 10 methods in total.

📊There are two metrics.

🔹🔷🟦You can set the granularity of explanations.
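
To make the perturbation-based idea concrete, here is a minimal occlusion sketch at token granularity. It does not use Interpreto's API: the bag-of-words `predict` function, its weights, and the `[MASK]` placeholder are hypothetical stand-ins for a real classifier.

```python
# Toy occlusion (perturbation-based) attribution. NOT Interpreto's API:
# the model, weights, and function names are hypothetical.
import math

# Hypothetical bag-of-words sentiment "model": one weight per known token.
WEIGHTS = {"great": 2.0, "boring": -1.5, "movie": 0.1}

def predict(tokens):
    """Return P(positive) as a sigmoid over the summed token weights."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))

def occlusion_attribution(tokens, mask="[MASK]"):
    """Score each token by how much the prediction drops when it is masked."""
    baseline = predict(tokens)
    scores = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [mask] + tokens[i + 1:]
        scores.append(baseline - predict(perturbed))
    return scores

tokens = ["a", "great", "movie"]
attributions = occlusion_attribution(tokens)
for tok, score in zip(tokens, attributions):
    print(f"{tok:>6}: {score:+.3f}")
```

Masking whole words or sentences instead of single tokens would give a coarser granularity with the same loop.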

3/8

3 months ago 1 0 1 0

🎓➡️👥The goal of the library is to bridge the gap between practitioners applying interpretability methods and the SOTA.

🚀The library is still in active development. Hence, we welcome your feedback and contributions. 🤗

👋📨 Raise an issue, open a PR, or contact us.

2/8

3 months ago 2 0 1 0

🔥I am super excited for the official release of an open-source library we've been working on for about a year!

🪄interpreto is an interpretability toolbox for HF language models🤗, covering both generation and classification!

Why do you need it, and for what?

1/8 (links at the end)

3 months ago 20 9 1 3

If you use GMail, AI (Gemini) was turned on yesterday by default and now scans all of your content for machine learning. To turn off, go to Settings>General and scroll down. Uncheck the box for "Smart features."

There's other "Smart" add-ons as well, but that's the one that reads your content.

5 months ago 10748 7994 324 779

🕳️🐇 𝙄𝙣𝙩𝙤 𝙩𝙝𝙚 𝙍𝙖𝙗𝙗𝙞𝙩 𝙃𝙪𝙡𝙡 – 𝙋𝙖𝙧𝙩 𝙄 (𝑃𝑎𝑟𝑡 𝐼𝐼 𝑡𝑜𝑚𝑜𝑟𝑟𝑜𝑤)

𝗔𝗻 𝗶𝗻𝘁𝗲𝗿𝗽𝗿𝗲𝘁𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗱𝗲𝗲𝗽 𝗱𝗶𝘃𝗲 𝗶𝗻𝘁𝗼 𝗗𝗜𝗡𝗢𝘃𝟮, one of vision’s most important foundation models.

And today is Part I, buckle up, we're exploring some of its most charming features. :)

6 months ago 36 12 2 0

expressing appreciation for this scientific diagram

6 months ago 50 7 3 0

Could it be biased by people answering randomly?

If, say, 1 person in 5 answers randomly while the others guess correctly, wouldn't you obtain your blue curve?

6 months ago 1 0 0 0

Want the full story behind the poster? 🎉
I broke down the methodology and results here 👇

8 months ago 0 0 0 0

🔥 I am super excited to be presenting a poster at #ACL2025 in Vienna next week! 🌏

This is my first big conference!

📅 Tuesday morning, 10:30–12:00, during Poster Session 2.

💬 If you're around, feel free to message me. I would be happy to connect, chat, or have a drink!

8 months ago 5 1 1 0

🚨 New preprint! 🚨

Everyone loves causal interp. It’s coherently defined! It makes testable predictions about mechanistic interventions! But what if we had a different objective: predicting model behavior not under mechanistic interventions, but on unseen input data?

9 months ago 63 12 3 2

🔥ConSim has been accepted to the #ACL2025 main conference!

🙏 Thanks again to my amazing co-authors: @alon_jacovi, Agustin Picard, @VictorBoutin, and @Fannyjrd_.

Work done in DEEL and FOR from IRT St Exupéry and @ANITI_Toulouse.

See you in Vienna 📅

For more information, check out my last post:

11 months ago 4 1 1 0

BlackboxNLP is back! 💥

Happy to be part of the organizing team for this year, and super excited for our new shared task using the excellent MIB Benchmark, check it out! blackboxnlp.github.io/2025/task/

11 months ago 6 2 0 0

🎉 Our Actionable Interpretability workshop has been accepted to #ICML2025! 🎉
> Follow @actinterp.bsky.social
> Website actionable-interpretability.github.io

@talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social

Paper submission deadline: May 9th!

1 year ago 42 16 3 3

The biggest reason government officials aren't giving any specifics about the criteria by which these arrests and deportations are selected, is that the criteria is "pro-Israel think-tanks and advocacy organizations created lists of troublesome individuals and gave them to us."

Hundreds of international students have just received an email telling them their visas have been revoked.

The ‘justification’ is campus activism or social media posts.

timesofindia.indiatimes.com/world/us/hun...

1 year ago 5647 3078 208 609

On the Biology of a Large Language Model

Can we understand the mechanisms of a frontier AI model?

📝 Blog post: www.anthropic.com/research/tra...
🧪 "Biology" paper: transformer-circuits.pub/2025/attribu...
⚙️ Methods paper: transformer-circuits.pub/2025/attribu...

Featuring basic multi-step reasoning, planning, introspection and more!

1 year ago 125 28 4 3

Posts by Antonin Poché