While steering methods effectively control target behavior, they substantially increase LLMs’ vulnerability to jailbreaks, revealing a failure of robust specificity. If you’re at EACL, stop by my poster at 9AM today to hear more.
Here's a link to the full paper: aclanthology.org/2026.eacl-lo...
In this work, we argue that evaluating efficacy alone isn’t enough. Steering has two sides — efficacy and specificity — yet current evaluations predominantly focus on the former. We introduce a three-part framework for specificity (general, control, robustness) and show that...
Thanks WiAIR (@wiair.bsky.social) for featuring my work on your YouTube channel. Watch the video to hear about our work on inference-time steering — and why these LLM interventions may not be as “precise” as they look.
This call is still open. I am looking to recruit, as are many other faculty at Cornell. We review folders as they come in, and will send offers until all positions are filled.
Please share with your network 🙏
I have heard it’s ~5 onsite interviews per open position and ~10 phone interviews per onsite invite.
What can cognitive science learn from AI? In infinitefaculty.substack.com/p/what-cogni... I outline how AI research has shown that scale and richness of learning experiences fundamentally change learning & generalization — and how I believe we should rethink cognitive experiments & theories in response.
arxiv.org/abs/2403.01015 comes to mind. Generally, Nihar's lab has a lot of amazing work in this space that should be relevant to your search
Woah, this is so cool! How was I not aware of this? I just set mine up to prepare for NeurIPS and I am loving it already... it made thousands of accepted papers so much more tractable to navigate.
AIM's 2nd round of TTK hiring - building up to 30 positions - is up!
📅 Deadline: 12/22/25
🔬 Accessibility & Learning, plus Sustainability & Social Justice
🧑🏫 Associate/Full Prof*
🔗 umd.wd1.myworkdayjobs.com/en-US/UMCP/j...
*Assistant-level candidates: apply to departments, mentioning AIM in a cover letter
My lab at BU is recruiting PhD students and possibly a postdoc this year!
We study humans & machines, centered around topics like meaning, generalization, evaluation methods and design, and the nature of computation and representation that underlie language and cognition.
🫴🫴
Interested in interpretability, data attribution, evaluation, and similar topics?
Interested in doing a postdoc with me?
Apply to the prestigious Azrieli program!
Link below 👇
DMs are open (email is good too!)
The Medium Is Not the Message: Deconfounding Document Embeddings via Linear Concept Erasure — Tuesday at 11:00, Poster
Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification — Tuesday at 14:30, Demo
Measuring Scalar Constructs in Social Science with LLMs — Friday at 10:30, Oral at CSS
How Persuasive is Your Context? — Friday at 14:00, Poster
Happy to be at #EMNLP2025! Please say hello and come see our lovely work
I am recruiting PhD students to start in 2026! If you are interested in robustness, training dynamics, interpretability for scientific understanding, or the science of LLM analysis you should apply. BU is building a huge LLM analysis/interp group and you’ll be joining at the ground floor.
This is a great use case of linear erasure! It's always exciting to see interesting applications of these techniques :)
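For anyone curious how linear erasure works under the hood, here is a minimal sketch: estimate a direction encoding the confounding concept, then project it out of every embedding. The mean-difference estimator and all names here are illustrative; the paper's method may be more refined.

```python
import numpy as np

# Minimal sketch of linear concept erasure: find a direction that
# separates a binary confound (e.g., document source/medium) in the
# embedding space, then project every embedding onto its orthogonal
# complement so the concept is no longer linearly recoverable along it.
def erase_concept(X: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """X: (n, d) document embeddings; labels: (n,) binary concept labels."""
    # Mean-difference direction is a simple stand-in for a learned
    # concept subspace.
    v = X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)
    v = v / np.linalg.norm(v)
    # Remove the component of each embedding along v.
    return X - np.outer(X @ v, v)

# Toy usage: 100 docs in a 768-d embedding space, labeled by medium.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 768))
y = rng.integers(0, 2, size=100)
X_clean = erase_concept(X, y)  # embeddings with the medium direction removed
```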
Congrats! 🎉 Very excited to follow your lab's work
Congratulations and welcome to Maryland!! 🎉
I'll be presenting this work with @rachelrudinger at #NAACL2025 tomorrow (Wednesday 4/30) in Albuquerque during Session C (Oral/Poster 2) at 2pm! 🔬
Decomposing hypotheses in traditional NLI and defeasible NLI helps us measure various forms of consistency of LLMs. Come join us!
What does it mean for #LLM output to be novel?
In work w/ johnchen6.bsky.social, Jane Pan, Valerie Chen and He He, we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
This option is available on the menu (three dots) next to the comment/repost/like section. I only see this when I am in the Discover feed, though, not on my regular feed.
🚨 New Paper 🚨
1/ We often assume that well-written text is easier to translate ✏️
But can #LLMs automatically rewrite inputs to improve machine translation? 🌍
Here’s what we found 🧵
🔈 NEW PAPER 🔈
Excited to share my paper that analyzes the effect of cross-lingual alignment on multilingual performance
Paper: arxiv.org/abs/2504.09378 🧵
Have work on the actionable impact of interpretability findings? Consider submitting to our Actionable Interpretability workshop at ICML! See below for more info.
Website: actionable-interpretability.github.io
Deadline: May 9
Thinking about paying $20k/month for a "PhD-level AI agent"? You might want to wait until their web browsing skills are on par with those of human PhD students 😛 Check out our new BEARCUBS benchmark, which shows web agents struggle to perform simple multimodal browsing tasks!
🚨 Our team at UMD is looking for participants to study how #LLM agent plans can help you answer complex questions
💰 $1 per question
🏆 Top-3 fastest + most accurate win $50
⏳ Questions take ~3 min => $20/hr+
Click here to sign up (please join, reposts appreciated 🙏): preferences.umiacs.umd.edu
This is called going above and beyond for the job assigned to you.
Our paper studies over-reliance in claim verification with LLM assistance: arxiv.org/abs/2310.12558
Re mitigation: we find that showing users contrastive explanations—reasoning both why a claim may be true and why it may be false—helps counter over-reliance to some extent.
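If it helps make the setup concrete, here is a hedged sketch of how such contrastive explanations could be produced. The generate() wrapper and the prompts are hypothetical stand-ins, not the paper's code.

```python
from typing import Callable

# Sketch of the contrastive-explanation idea: before the user judges a
# claim, show an argument for it AND an argument against it, so neither
# side of the model's reasoning is taken on faith.
def contrastive_explanation(claim: str, evidence: str,
                            generate: Callable[[str], str]) -> str:
    pro = generate(f"Evidence: {evidence}\n"
                   f"Explain why this claim may be TRUE: {claim}")
    con = generate(f"Evidence: {evidence}\n"
                   f"Explain why this claim may be FALSE: {claim}")
    return f"Why it may be true:\n{pro}\n\nWhy it may be false:\n{con}"
```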
🚨 New Position Paper 🚨
Multiple choice evals for LLMs are simple and popular, but we know they are awful 😬
We complain they're full of errors, saturated, and test nothing meaningful, so why do we still use them? 🫠
Here's why MCQA evals are broken, and how to fix them 🧵
How can we generate synthetic data for a task that requires global reasoning over a long context (e.g., verifying claims about a book)? LLMs aren't good at *solving* such tasks, let alone generating data for them. Check out our paper for a compression-based solution!
This paper is really cool. They decompose NLI (and defeasible NLI) hypotheses into atoms, and then use these atoms to measure the logical consistency of LLMs.
E.g. for an entailment NLI example, each hypothesis atom should also be entailed by the premise.
Very nice idea 👏👏
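A hedged sketch of that consistency check, where decompose() and nli() are hypothetical stand-ins for an LLM-based atom decomposer and an NLI classifier:

```python
from typing import Callable, List

# Atom-level consistency: if the model labels the full premise/hypothesis
# pair "entailment", each atomic sub-claim of the hypothesis should be
# entailed by the premise as well.
def atom_consistency(premise: str, hypothesis: str,
                     decompose: Callable[[str], List[str]],
                     nli: Callable[[str, str], str]) -> float:
    if nli(premise, hypothesis) != "entailment":
        return float("nan")  # check only applies to entailed pairs
    atoms = decompose(hypothesis)
    hits = sum(nli(premise, atom) == "entailment" for atom in atoms)
    return hits / len(atoms) if atoms else 1.0
```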
Please join us for:
AI at Work: Building and Evaluating Trust
Presented by our Trustworthy AI in Law & Society (TRAILS) institute.
Feb 3-4
Washington DC
Open to all!
Details and registration at: trails.gwu.edu/trailscon-2025
Sponsorship details at: trails.gwu.edu/media/556