Our ICML 2025 workshop on Actionable Interpretability drew massive interest. But the same questions kept coming up: What does "actionable" mean? Is it achievable? How?
We're ready to answer.
🧵
Posts by Sarah Wiegreffe
🥹🥰
Come join TRAILS as a postdoc at UMD (and work w/ folks at GW, MSU & Cornell) to conduct research and scholarship on approaches to AI that advance trust and trustworthiness, with a great group of colleagues!
🌐 go.umd.edu/trails-postd...
🗓️ Summer/Fall 2026 start
If you're at #ICML2025, chat with me, @sarah-nlp.bsky.social, Atticus, and others at our poster 11am - 1:30pm at East #1205! We're establishing a 𝗠echanistic 𝗜nterpretability 𝗕enchmark.
We're planning to keep this a living benchmark; come by and share your ideas/hot takes!
I am also recruiting PhD students @univofmaryland.bsky.social for fall 2026 with interests in (causal/mechanistic) LM interpretability and its practical applications (steering, efficient adaptation, model editing, textual explanations for users, etc.).
I am at #ICML2025! 🇨🇦🏞️
Catch me:
1️⃣ Presenting this paper👇 tomorrow 11am-1:30pm at East #1205
2️⃣ At the Actionable Interpretability @actinterp.bsky.social workshop on Saturday in East Ballroom A (I’m an organizer!)
This week is #ICML in Vancouver, and a number of our researchers are participating. Here's the full list of Ai2's conference engagements—we look forward to connecting with fellow attendees. 👋
Thank you! Look forward to being colleagues.
Thank you!
Thank you!
Thanks :))
Thanks so much for all your support ☺️🥰
Thank you!
Thank you 😄
☺️ come visit!
A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at UMD CS @univofmaryland.bsky.social this August.
I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)
Congrats Kristina! 😍
[Image: the Vancouver skyline with the words "sign up to review"; at the top, the logos of the Actionable Interpretability workshop (a magnifying glass) and the ICML conference (a brain).]
🚨 We're looking for more reviewers for the workshop!
📆 Review period: May 24-June 7
If you're passionate about making interpretability useful and want to help shape the conversation, we'd love your input.
💡🔍 Self-nominate here:
docs.google.com/forms/d/e/1F...
🤖: "Great review, but it could be improved by doing [exact thing I wrote in subsequent sentences]"
Where is version control and shared editing for keynote files?! 🤦♀️
We are quite excited about the leaderboard and release, and are open to feedback to help this remain a living benchmark.
Check out our new preprint/project, which has been over a year in the making! This has been a very fun collaboration (and one of the biggest I've personally participated in).
@amuuueller.bsky.social @boknilev.bsky.social and other co-authors are around #ICLR2025 if you want to find out more. 😊
See Yanai's thread for more info:
bsky.app/profile/yana...
2) On the connection between linear relational embeddings in LMs and frequency of relations in pretraining data
- Led by @jackmerullo.bsky.social w/ @nlpnoah.bsky.social @yanai.bsky.social
- arxiv.org/abs/2504.12459
- Yanai is presenting the poster tomorrow 04/26 10am-12:30pm (Hall 3+Hall 2B #236)!
I'm not at #ICLR2025, but I have 2 works being presented:
1) Understanding how LMs answer multiple-choice questions
- arxiv.org/abs/2407.15018
- @boknilev.bsky.social is presenting the poster *now* until 12:30 (Hall 3+Hall 2B #207)
- & w/ @oyvind-t.bsky.social @hanna-nlp.bsky.social Ashish Sabharwal
I'm in Singapore for ICLR to present this paper:
Tomorrow, April 26th, 10-12:30 in Hall 3+2B #236
Come check it out!
arxiv.org/abs/2504.12459
💡 New ICLR paper! 💡
"On Linear Representations and Pretraining Data Frequency in Language Models":
We provide an explanation for when & why linear representations form in large (or small) language models.
Led by @jackmerullo.bsky.social, w/ @nlpnoah.bsky.social & @sarah-nlp.bsky.social
Have work on the actionable impact of interpretability findings? Consider submitting to our Actionable Interpretability workshop at ICML! See below for more info.
Website: actionable-interpretability.github.io
Deadline: May 9
📢 Open PhD Position in Interpretable Natural Language Processing at the Department of Computer Science, UCPH!
🗓 Application deadline is 15 January 2025.
Find more information about the position and apply here 👉 di.ku.dk/english/abou...
@apepa.bsky.social @iaugenstein.bsky.social