
Posts by Antonin Poché

I forgot to say that I am currently in Berlin till the end of the month. 🇩🇪

If you want to have a beer/tea, please send a message! 🍻

I am visiting the XplaiNLP group (@nfel.bsky.social...) at @tuberlin.bsky.social 🤗

Next Thursday, I will talk about Concept-based explanations at @dfki.bsky.social

4 weeks ago 2 0 0 0

Can Persona Prompting function as a lens on social reasoning?

In our #EACL2026 work (led by @jingyng.bsky.social), we investigate how it impacts the quality of model outputs and rationales.

🗞️ arXiv: arxiv.org/abs/2601.20757

Come and find us (Jing, Moritz, Elisabeth, myself) in 🇲🇦 Rabat next week!

1 month ago 10 2 1 0

BlackboxNLP will be co-located with EMNLP 2026 in 🇭🇺 Budapest 🇭🇺 this October!

This edition will feature a special reproducibility track, investigating generalization and robustness of established results from interpretability research 👷‍♂️

Stay tuned for more details!

1 month ago 16 7 1 2

Ho and I also made a longer video with a voice-over if it's useful to anyone.

🔊

1 month ago 0 0 0 0

If you are interested in the library, you can check out the corresponding thread below:
bsky.app/profile/anto...

Or the GitHub directly: github.com/FOR-sight-ai...

1 month ago 0 0 1 0

🔥 Super excited to share our new demo website for 🪄 Interpreto!

🖼️ It is basically an explanation gallery showcasing attribution and concept-based explanations for classification and generation.

🎮 Play with it: for-sight-ai.github.io/interpreto-d...

We will keep improving it, so stay tuned!

1 month ago 9 3 1 0

I also did a thread to present the library quickly:

bsky.app/profile/anto...

2 months ago 0 0 0 0

Pleasantly surprised to see our blog post trending on Hugging Face 🤗

Well, @fannyjrd.bsky.social did a great job! 🚀

If you missed it, check it out: huggingface.co/blog/Fannyjr...

It's a didactic presentation of our new library, 🪄 Interpreto:
github.com/FOR-sight-ai...

2 months ago 3 1 1 0

It was an honor to be part of this awesome project! Interpreto is a great up-and-coming tool for concept-based interpretability analyses of NLP models, check it out!

3 months ago 8 1 0 0
GitHub - FOR-sight-ai/interpreto: 🪄 Interpreto is an interpretability toolbox for LLMs

🎉 I’m thrilled to announce the release of Interpreto: a user-friendly, open-source toolbox to make NLP model interpretability accessible, practical, and rigorous.
github.com/FOR-sight-ai...
🧵1/5

3 months ago 7 1 1 0

📦 You can find the library on GitHub: github.com/FOR-sight-ai...

📚 Access the documentation: for-sight-ai.github.io/interpreto/

⏬ Install with pip: `uv pip install interpreto`

📰 Look at our paper: arxiv.org/abs/2512.097...

🤗 Check our Hugging Face blog post: huggingface.co/blog/Fannyjr...

8/8

3 months ago 4 1 0 0

🔥 The amazing team: @fannyjrd.bsky.social, Thomas Mullor, @gsarti.com, Frédéric Boisnard, Corentin Friedrich, Charlotte Claye, François Hooft, and Raphaël Bernas!!

🙏 And to the supporters: IRT Saint Exupery, ANITI, @centralesupelec.bsky.social, DEEL.ai and FOR projects.

7/8

3 months ago 2 0 1 0
Overview - Interpreto Interpretability Toolkit for LLMs

You can do all these steps in interpreto using a wide range of methods.

Check out the documentation for more details: for-sight-ai.github.io/interpreto/a...

Or the tutorials:

- for-sight-ai.github.io/interpreto/n...
- for-sight-ai.github.io/interpreto/n...

6/8

3 months ago 1 0 1 0

For concepts, there are 5 steps:

1. Split the model and get activations. (wraps `nnsight` @ndif-team.bsky.social)

2. Find patterns in activations (SAEs...) (wraps `overcomplete` @thomasfel.bsky.social)

3. Interpret the concepts

4. Estimate concepts' contributions to the output

5. Evaluate
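
To make the steps above concrete, here is a toy walk-through in pure Python. It is only a sketch under loud assumptions: the texts, vocabulary, linear head, and every function name are hypothetical, word counts stand in for real hidden activations, and plain k-means stands in for SAEs or other dictionary-learning methods. None of this is Interpreto's API.

```python
# Toy walk-through of the concept pipeline. All names and data are hypothetical.

TEXTS = [
    "late goal wins the match",
    "the match ended after a second goal",
    "stock market falls again",
    "market rally lifts every stock",
]
VOCAB = ["goal", "match", "stock", "market"]
HEAD = [1.0, 1.0, -1.0, -1.0]  # toy linear head: > 0 leans "sports", < 0 leans "business"


def get_activations(texts):
    """Step 1 (stand-in): one small vector per text, here plain word counts."""
    return [[float(text.split().count(word)) for word in VOCAB] for text in texts]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_indices, n_iter=10):
    """Step 2: learn concept directions (centroids) and assign each point to one."""
    centroids = [list(points[i]) for i in init_indices]
    assignment = []
    for _ in range(n_iter):
        assignment = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def interpret(centroid, top_n=2):
    """Step 3: describe a concept by its highest-weighted vocabulary words."""
    ranked = sorted(zip(VOCAB, centroid), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:top_n]]


def contribution(centroid):
    """Step 4: score the toy head gives to the concept direction."""
    return sum(h * c for h, c in zip(HEAD, centroid))


def reconstruction_error(points, centroids, assignment):
    """Step 5: mean squared distance between each point and its concept."""
    total = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))
    return total / len(points)


activations = get_activations(TEXTS)
centroids, assignment = kmeans(activations, init_indices=[0, 3])
for idx, centroid in enumerate(centroids):
    print(idx, interpret(centroid), contribution(centroid))
print("reconstruction error:", reconstruction_error(activations, centroids, assignment))
```

In a real setting, step 1 would record a hidden layer of a split transformer, and step 2 would use a much richer decomposition than two centroids.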

5/8

3 months ago 2 0 1 0

💡 Interpreto provides concept-based explanations (post-hoc unsupervised), part of the Mechanistic Interpretability field. Concepts answer:

❔ What higher-level features exist inside the model’s hidden space, and how do they affect outputs?

4/8

⬇️ Example on the AG News dataset.

3 months ago 1 0 1 0

🔥 We implement the classic attribution methods, for both `ForSequenceClassification` and `ForCausalLM` models.

There are both perturbation-based ➡️ and gradient-based 🔁 methods, about 10 in total.

📊 There are two metrics.

🔹🔷🟦 You can set the granularity of explanations.
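
As a rough illustration of what a perturbation-based attribution does, here is a minimal occlusion sketch in pure Python. It is not Interpreto's API: `toy_score`, its word weights, the `[MASK]` token, and the span-summing helper are all hypothetical stand-ins, and summing token scores is just one simple way to move to a coarser granularity.

```python
# Minimal occlusion attribution on a toy scorer. All names are hypothetical.


def toy_score(tokens):
    """Stand-in for a classifier's 'positive' logit: a sum of fixed word weights."""
    weights = {"great": 2.0, "good": 1.0, "boring": -1.5}
    return sum(weights.get(token, 0.0) for token in tokens)


def occlusion_attribution(tokens, score_fn, mask_token="[MASK]"):
    """Attribution of a token = drop in score when that token is masked out."""
    base = score_fn(tokens)
    attributions = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [mask_token] + tokens[i + 1:]
        attributions.append(base - score_fn(perturbed))
    return attributions


def aggregate(tokens, attributions, spans):
    """Coarser granularity: sum token attributions over (start, end) spans."""
    return [
        (" ".join(tokens[start:end]), sum(attributions[start:end]))
        for start, end in spans
    ]


tokens = "a great film but boring ending".split()
token_attr = occlusion_attribution(tokens, toy_score)
span_attr = aggregate(tokens, token_attr, spans=[(0, 3), (3, 6)])

for token, value in zip(tokens, token_attr):
    print(f"{token:>8} {value:+.1f}")
print(span_attr)
```

A gradient-based method would instead differentiate the score with respect to the input embeddings, which needs a real differentiable model rather than this toy scorer.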

3/8

3 months ago 1 0 1 0

🎓➡️👥 The goal of the library is to bridge the gap between practitioners applying interpretability methods and the SOTA.

🚀 The library is still in active development. Hence, we welcome your feedback and contributions. 🤗

👋📨 Raise an issue, open a PR, or contact us.

2/8

3 months ago 2 0 1 0

🔥 I am super excited for the official release of an open-source library we've been working on for about a year!

🪄 interpreto is an interpretability toolbox for HF language models 🤗, covering both generation and classification!

Why do you need it, and for what?

1/8 (links at the end)

3 months ago 20 9 1 3

If you use GMail, AI (Gemini) was turned on yesterday by default and now scans all of your content for machine learning. To turn off, go to Settings>General and scroll down. Uncheck the box for "Smart features."

There's other "Smart" add-ons as well, but that's the one that reads your content.

5 months ago 10748 7994 324 779

🕳️🐇 Into the Rabbit Hull – Part I (Part II tomorrow)

An interpretability deep dive into DINOv2, one of vision’s most important foundation models.

And today is Part I, buckle up, we're exploring some of its most charming features. :)

6 months ago 36 12 2 0

expressing appreciation for this scientific diagram

6 months ago 50 7 3 0

Can it be biased by people answering randomly?

If about 1 person in 5 answers randomly and the others guess correctly, wouldn't you obtain your blue curve?

6 months ago 1 0 0 0

Want the full story behind the poster? 🎉
I broke down the methodology and results here 👇

8 months ago 0 0 0 0

🔥 I am super excited to be presenting a poster at #ACL2025 in Vienna next week! 🌍

This is my first big conference!

📅 Tuesday morning, 10:30–12:00, during Poster Session 2.

💬 If you're around, feel free to message me. I would be happy to connect, chat, or have a drink!

8 months ago 5 1 1 0

🚨 New preprint! 🚨

Everyone loves causal interp. It’s coherently defined! It makes testable predictions about mechanistic interventions! But what if we had a different objective: predicting model behavior not under mechanistic interventions, but on unseen input data?

9 months ago 63 12 3 2

🔥 ConSim has been accepted to the #ACL2025 main conference!

🙏 Thanks again to my amazing co-authors: @alon_jacovi, Agustin Picard, @VictorBoutin, and @Fannyjrd_.

Work done in DEEL and FOR from IRT St Exupéry and @ANITI_Toulouse.

See you in Vienna 📅

For more information, check out my last post:

11 months ago 4 1 1 0

BlackboxNLP is back! 💥

Happy to be part of the organizing team for this year, and super excited for our new shared task using the excellent MIB Benchmark, check it out! blackboxnlp.github.io/2025/task/

11 months ago 6 2 0 0

🎉 Our Actionable Interpretability workshop has been accepted to #ICML2025! 🎉
> Follow @actinterp.bsky.social
> Website actionable-interpretability.github.io

@talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social

Paper submission deadline: May 9th!

1 year ago 42 16 3 3
The biggest reason government officials aren't giving any specifics about the criteria by which these arrests and deportations are selected, is that the criteria is "pro-Israel think-tanks and advocacy organizations created lists of troublesome individuals and gave them to us."

Hundreds of international students have just received an email telling them their visas have been revoked.

The ‘justification’ is campus activism or social media posts.

timesofindia.indiatimes.com/world/us/hun...

1 year ago 5647 3078 208 609
On the Biology of a Large Language Model

Can we understand the mechanisms of a frontier AI model?

📝 Blog post: www.anthropic.com/research/tra...
🧪 "Biology" paper: transformer-circuits.pub/2025/attribu...
⚙️ Methods paper: transformer-circuits.pub/2025/attribu...

Featuring basic multi-step reasoning, planning, introspection and more!

1 year ago 125 28 4 3