Posts by Nils Feldhus

MedRiskEval: Medical Risk Evaluation Benchmark of Language Models, On the Importance of User Perspectives in Healthcare Settings Jean-Philippe Corbeil, Minseon Kim, Maxime Griot, Sheela Agarwal, Alessandro Sordoni, Francois Beaulieu, Paul Vozila. Proceedings of the 19th Conference of the European Chapter of the Association for ...

Was also wondering about that. Really hard to find such studies! These might come close:

* MedRiskEval (Corbeil et al.) – aclanthology.org/2026.eacl-in...
* Hamna et al. – arxiv.org/abs/2509.24506
* Draelos et al. – arxiv.org/abs/2507.18905
* AMIE (Brodeur et al.) – arxiv.org/abs/2603.08448

3 weeks ago

Aaron Eidt and I are currently presenting our #EACL2026 System Demo paper ELIA on simplifying mechanistic interpretability outcomes using vision-language models.
Come see us in the poster hall!

Paper: aclanthology.org/2026.eacl-de...

3 weeks ago

Johann Frei and I are now presenting our Infherno #EACL2026 System Demonstration paper. Come to the poster hall to learn about our agentic approach to medical information extraction!

Paper here: aclanthology.org/2026.eacl-de...

3 weeks ago

We find that persona prompting can help LLMs perform better on hate speech detection, but responses do not reflect real human demographics.

Models are surprisingly resistant to surface-level steering and frequently overflag content as harmful.

The oral will be in Session 2 on Mar 25, 12:00 PM @ Pavilion De Rabat.

1 month ago

Can Persona Prompting function as a lens on social reasoning?

In our #EACL2026 work (led by @jingyng.bsky.social), we investigate how it impacts the quality of model outputs and rationales.

🗞️ arXiv: arxiv.org/abs/2601.20757

Come and find us (Jing, Moritz, Elisabeth, myself) in 🇲🇦 Rabat next week!

1 month ago

BlackboxNLP will be co-located with EMNLP 2026 in 🇭🇺 Budapest 🇭🇺 this October!

This edition will feature a special reproducibility track, investigating generalization and robustness of established results from interpretability research 👷‍♂️

Stay tuned for more details!

1 month ago

🔥Super excited to share our new demo website for 🪄Interpreto!

🖼️It is basically an explanation gallery showcasing attribution and concept-based explanations for classification and generation.

🎮Play with it: for-sight-ai.github.io/interpreto-d...

We will keep improving it, so stay tuned!

1 month ago

One to go! Thanks to everyone who agreed to review so far! 🫶
If you have the capacity for one emergency review on explainability of NLP models, please reach out via DMs/chat or by replying here. #ACL2026NLP
bsky.app/profile/nfel...

2 months ago

Hello #NLProc #ACL2026NLP community, I'm looking for an emergency reviewer for an ARR submission on LLM interpretability.

If you're available to complete a review before Feb 15, please reply or DM 🙏

2 months ago

Hello #NLProc #ACL2026NLP people. I am looking for **two emergency reviewers** in the Safety and Alignment in LLMs track for ACL/ARR.

Reviews are due Feb 15th. Please DM if interested and available.

Happy to offer drinks/food if you live in/pass by Lisbon ☀️

2 months ago

I'm looking for two emergency reviewers 🧑‍🚒👩‍🚒 for the ARR January Generalizability and Transfer track.

Please reach out if you have time & qualify for review or RT for visibility🙏🙏

2 months ago

Seems to be a common situation for ACs this round, but I'm also looking for two emergency reviewers for the January #ARR Evaluation and Resources track. I'd appreciate any help (reposts, encouragement, black magic...)

2 months ago

I am looking for 2 emergency reviewers for the ARR Ethics, Bias & Fairness track. Please DM me if you are available 🙏

2 months ago

Thanks a lot! Sent you a DM!

2 months ago

Looking for emergency reviewers for ARR Special Track "Explainability of NLP Models". Topics: Faithfulness, mechanistic interpretability, surveys and position papers. Deadline Feb 14 AoE. #ACL2026NLP

2 months ago

It was a real pleasure to visit the Health NLP Lab in Tübingen and present my research at BIFOLD and TU Berlin in collaboration with Charité and the University of Augsburg, among others. We had some exciting discussions. Thanks for having me!

3 months ago

Last week, Dr. Nils Feldhus @nfel.bsky.social, postdoctoral researcher at @tuberlin.bsky.social and @bifold.berlin, visited our lab and presented his research during our weekly lab meeting.

3 months ago

Sharing my favorite papers I read in 2025 from human-centric XAI, mechanistic interpretability, NLG evaluation, and related fields, covering conferences I've attended (ACL in Austria, EMNLP in China), but also journals, ML and HCI conferences:

nfelnlp.github.io/recommended/...

3 months ago

I’m at #NeurIPS in San Diego this week! Come see our poster on feature interpretability. Find @eberleoliver.bsky.social and me at:

🪧Poster Session 1 @ Exhibit Hall C,D,E #1015
Wed 3 Dec, 11 am - 2 pm
🪧Poster @ Mech Interp Workshop
Upper Level Room 30A-E
Sun 7 Dec, 8 am - 5 pm

4 months ago

*Urgently* looking for emergency reviewers for the ARR October Interpretability track 🙏🙏

ReSkies much appreciated

5 months ago

Heading to the EMNLP BlackboxNLP Workshop this Sunday? Don't miss the poster by @nfel.bsky.social and @lkopf.bsky.social on "Interpreting Language Models Through Concept Descriptions: A Survey"
aclanthology.org/2025.blackbo...

#EMNLP #BlackboxNLP #XAI #Interpretability

5 months ago
Human and LLM-based Assessment of Teaching Acts in Expert-led Explanatory Dialogues Aliki Anagnostopoulou, Nils Feldhus, Yi-Sheng Hsu, Milad Alshomary, Henning Wachsmuth, Daniel Sonntag. Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Le...

Nov 9, CODI-CRAC Workshop, 14:20-15:30 @ Hall C – Human and LLM-based Assessment of Teaching Acts in Expert-led Explanatory Dialogues (Anagnostopoulou et al.)

🗞️ aclanthology.org/2025.codi-1....

5 months ago

Nov 9, @blackboxnlp.bsky.social, 11:00-12:00 @ Hall C – Interpreting Language Models Through Concept Descriptions: A Survey (Feldhus & Kopf) @lkopf.bsky.social

🗞️ aclanthology.org/2025.blackbo...

bsky.app/profile/nfel...

5 months ago

I'm at #EMNLP2025 in Suzhou🇨🇳 to present these papers in the coming days:

Nov 7, Session 14, 12:30-13:30 @ Hall C – Multilingual Datasets for Custom Input Extraction and Explanation Requests Parsing in Conversational XAI Systems (Wang et al.) @qiaw99.bsky.social

🗞️ aclanthology.org/2025.finding...

5 months ago

🙏 Many thanks to the institutions that supported this research:
@tuberlin.bsky.social
@bifold.berlin

Looking forward to presenting this in 🇨🇳 Suzhou early November!

6 months ago
Concept description evaluation techniques categorized by metric, study, and the underlying quality being measured. Metrics are grouped into conceptual families: predictive simulation, input-based evaluation, output-based evaluation, semantic similarity, and human judgment.

Our synthesis reveals a growing demand for more rigorous, causal evaluation. By outlining the state of the art and identifying key challenges, this survey provides a roadmap for future research toward making models more transparent.

This survey has been accepted at @blackboxnlp.bsky.social at EMNLP

6 months ago

We consider concept descriptions in open-vocabulary settings, the evolving landscape of automated and human metrics for evaluating them, and the datasets that underpin this research.

This is a companion paper to our PRISM paper that was accepted at NeurIPS last week: bsky.app/profile/lkop...

6 months ago
Overview of descriptions for model components (neurons, attention heads) and model abstractions (SAE features, circuits).

🔍 Are you curious about uncovering the underlying mechanisms and identifying the roles of model components (neurons, …) and abstractions (SAEs, …)?

We provide the first survey of concept description generation and evaluation methods.

Joint effort w/ @lkopf.bsky.social

📄 arxiv.org/abs/2510.01048

6 months ago

Happy to share that our PRISM paper has been accepted at #NeurIPS2025 🎉

In this work, we introduce a multi-concept feature description framework that can identify and score polysemantic features.

📄 Paper: arxiv.org/abs/2506.15538

#NeurIPS #MechInterp #XAI

7 months ago

The submission deadline of the inaugural Young Researchers workshop at INLG 2025 has been extended by 5 days.
We're excited to receive your 2-page position papers showcasing your NLG-related research by August 31, 2025! @siggen.bsky.social

ynlg-workshop.github.io

bsky.app/profile/nfel...

7 months ago