
Posts by NDIF Team


NNsight is part of a growing open-source ecosystem. We're building the infrastructure so you can focus on the science.

Upgrade to NNsight 0.6 today: pip install nnsight --upgrade

nnsight.net
github.com/ndif-team/nnsight
discuss.ndif.us

1 month ago 3 0 0 0
Introducing NNsight 0.6 - nnsight Documentation for the nnsight Python library

Read our blog post to learn more about the design of this release and its features: nnsight.net/blog/2026/02/26/introducing-nnsight-06/

1 month ago 2 0 1 0
GitHub - ndif-team/skills: Teach LLMs to use NNSight with Skills Teach LLMs to use NNSight with Skills. Contribute to ndif-team/skills development by creating an account on GitHub.

We also ship first-class support for AI coding agents, including skills for Claude Code and Codex, a Context7 MCP server for live docs, and comprehensive guides in the repo.

github.com/ndif-team/skills

1 month ago 4 0 2 0

Other additions include:
- Clean error tracebacks that point to YOUR code, not NNsight internals
- Check NDIF before submitting jobs with ndif.status()
- Standard "for step in tracer.iter[:]" generation loops (faster than with blocks!)

1 month ago 3 0 1 0

0.6 also comes with 2.4–3.9x performance improvements.

- Empty trace: 1196μs → 308μs
- 12 .save() calls: 1697μs → 716μs

The big wins: always-on trace caching, persistent pymount, and batched variable sync. Setup cost dropped from ~1,100μs to ~210μs.
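As a quick sanity check on the figures above, the speedup for each measurement is just the old time divided by the new time. The sketch below uses only the numbers quoted in this post; the dictionary keys and the `speedup` helper are illustrative names, not part of NNsight.

```python
# Timings quoted in the post, in microseconds: (before, after).
# The labels are illustrative; the setup-cost figures are approximate.
timings_us = {
    "empty trace": (1196, 308),
    "12 .save() calls": (1697, 716),
    "setup cost (approx.)": (1100, 210),
}


def speedup(before_us, after_us):
    """Return how many times faster the new timing is than the old one."""
    return before_us / after_us


speedups = {name: speedup(b, a) for name, (b, a) in timings_us.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x faster")
```

The two trace measurements round to 3.9x and 2.4x, which is where the quoted 2.4–3.9x range comes from; the setup cost alone improves by a larger factor.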

1 month ago 3 0 1 0

NNsight 0.6 also introduces first-class support for VisionLanguageModel (e.g., LLaVA, Qwen-VL) and DiffusionModel (e.g., Stable Diffusion, Flux)! Available remotely on NDIF soon 👀

1 month ago 3 0 1 0

vLLM integration got a major upgrade, now supporting single-GPU, multi-GPU tensor parallelism, Ray distributed execution, and even multi-node experiments, all using the same tracing API. NNsight handles the tensor gather/scatter, allowing you to intervene on unsharded tensors.
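To make the gather/scatter idea concrete, here is a toy sketch in plain Python. It is not NNsight's or vLLM's implementation and uses no real GPUs: lists stand in for tensors, and every function name (`shard`, `gather`, `intervene_unsharded`, `zero_index_5`) is hypothetical. It only shows the general pattern of gathering shards, editing the full tensor, and scattering the result back.

```python
def shard(values, num_shards):
    """Split a flat list into equal contiguous shards, one per simulated GPU."""
    if len(values) % num_shards != 0:
        raise ValueError("length must divide evenly across shards")
    size = len(values) // num_shards
    return [values[i * size:(i + 1) * size] for i in range(num_shards)]


def gather(shards):
    """Concatenate the shards back into the full (unsharded) list."""
    return [x for s in shards for x in s]


def intervene_unsharded(shards, intervention):
    """Gather, apply the user's edit to the full list, then re-shard."""
    full = gather(shards)
    edited = intervention(full)
    return shard(edited, len(shards))


def zero_index_5(hidden):
    """Example intervention written against the full, unsharded layout."""
    out = list(hidden)
    out[5] = 0.0
    return out


hidden = [float(i) for i in range(8)]   # stand-in for a hidden state
shards = shard(hidden, 2)               # two simulated tensor-parallel ranks
new_shards = intervene_unsharded(shards, zero_index_5)
print(new_shards)
```

The point of the pattern is that the intervention refers to index 5 of the full tensor and never needs to know that the element lives on the second shard.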

1 month ago 3 0 1 0

Also new is async mode with real-time token streaming. Build interactive apps like chat interfaces or live visualizations, with interventions running on every forward pass.

1 month ago 3 0 1 0

This is huge for the ecosystem. Libraries like NNterp can ship new features without waiting for NDIF to update. You always run whatever version you have locally.

1 month ago 3 0 1 0

Our #1 request: "I want to run my own analysis code on NDIF, not just inline interventions." Now, with NNsight source code serialization, you can! Your local packages work, even if NDIF doesn't have them installed.

1 month ago 3 0 1 0

NNsight 0.6 is out now! We directly address your feedback in our biggest release yet. Pain points included cryptic errors, slow traces, no remote execution of custom code, and limited vLLM support. We tackle all of these and more in this new release.

🧵 Here's what changed:

1 month ago 7 2 1 1
Auditing language models for hidden objectives We study the feasibility of conducting alignment audits: investigations into whether models have undesired objectives. As a testbed, we train a language model with a hidden objective. Our training...

For more details, read the paper: arxiv.org/abs/2503.10965

2 months ago 3 0 0 0

Red teams trained a model with a secret objective by exploiting RLHF reward models. Blue teams then audited the model, using techniques such as interpretability with sparse autoencoders, behavioral attacks, and training data analysis to successfully uncover the hidden objective.

2 months ago 2 0 1 0
Auditing Language Models for Hidden Objectives with Sam Marks Sam Marks leads Anthropic's Cognitive Oversight team, a subteam of Alignment Science. Sam's research focuses on settings where understanding something about ...

Watch Sam Marks present his work "Auditing Language Models for Hidden Objectives" in our new YouTube video! Sam's team ran a blind auditing game to assess the efficacy of black-box and white-box techniques for LLM alignment auditing.

🔗 youtu.be/jZiOJTHqB6M

2 months ago 3 0 1 0

Big thanks to the whole organizing team, especially @neelnanda.bsky.social and Andy Arditi, for hosting such a great workshop and inviting us to speak!

2 months ago 2 0 0 0
NSF National Deep Inference Fabric NDIF is a research computing project that enables researchers and students to crack open the mysteries inside large-scale AI systems.

Adam Belfki discusses NDIF and Workbench (youtu.be/zmHyaHiw8XU)

- workbench.ndif.us/
- ndif.us/
- nnsight.net/

2 months ago 2 0 1 0
The Dual-Route Model of Induction Do LLMs copy meaningful text by rote or by understanding meaning? Webpage for The Dual-Route Model of Induction (Feucht et al., 2025).

@sfeucht.bsky.social presents their work on concept induction heads (youtu.be/Jc-sTXW31W0)

- dualroute.baulab.info/

2 months ago 2 0 1 0
"In Defense of Curiosity" with David Bau (NeurIPS 2025 Mech Interp Workshop) David Bau presents his thoughts on "Pragmatic Interpretability" by recounting the history of Venetian glassmakers at the NeurIPS 2025 Mechanistic Interpretab...

@davidbau.bsky.social shares his thoughts on pragmatic interpretability (youtu.be/iMIsg32mVHM)

- davidbau.com/archives/20...
- In response to: www.alignmentforum.org/posts/StENz...

2 months ago 2 0 1 0
NeurIPS 2025 Mechanistic Interpretability Workshop Talks from NDIF and Bau Lab at the NeurIPS 2025 Mech Interp Workshop

Check out the NDIF & Bau Lab lightning talks at the NeurIPS 2025 Mechanistic Interpretability Workshop (mechinterpworkshop.com/): youtube.com/playlist?li...

2 months ago 4 1 1 0
Annus Mirabilis: A Year of Explosive Progress in LLMs with Benjamin Feuer YouTube video by NDIF Team

New YouTube video posted! @benjaminfeuer.bsky.social discusses LLMs' annus mirabilis, presenting his work on open questions surrounding LLM judges, benchmark trustworthiness, and maximizing the potential of synthetic data.

Watch here: www.youtube.com/watch?v=pehc...

2 months ago 2 0 0 0

🔥I am super excited for the official release of an open-source library we've been working on for about a year!

🪄interpreto is an interpretability toolbox for HF language models🤗. In both generation and classification!

Why do you need it, and for what?

1/8 (links at the end)

3 months ago 20 9 1 3
Deepti Ghadiyaram I am an Assistant Professor at Boston University in the Department of Computer Science. I am also an Affiliated Faculty with the Department of Electrical and Computer Engineering and Faculty of Computing & Data Sciences and an academic collaborator with Runway. My research interests...

📄 Paper: arxiv.org/abs/2411.16725

💻 Code & Visualizations: github.com/revelio-dif......

🌐 Deepti's Website: deeptigp.github.io/

3 months ago 2 0 0 0
Interpreting and Leveraging Diffusion Representations with Deepti Ghadiyaram Deepti Ghadiyaram is an Assistant Professor at Boston University in the Department of Computer Science, with affiliated appointments in Electrical and Comput...

New year, new YouTube videos! We are resuming our regular interpretability seminar posts, with a fantastic talk by Deepti Ghadiyaram on interpreting diffusion models.

Watch the video: youtu.be/4eqvABPX5rA

3 months ago 5 3 1 0

So excited to have you on the team, @gsarti.com!

3 months ago 4 0 0 0
GitHub - ndif-team/nnterp: Unified access to Large Language Model modules using NNsight Unified access to Large Language Model modules using NNsight - ndif-team/nnterp

Report issues or contribute to the open-source project: github.com/ndif-team/n...

3 months ago 1 0 0 0

Add support for new models (or custom ones): ndif-team.github.io/nnterp/addi...

3 months ago 0 0 1 0

Try out built-in interventions like logit lens and patchscope: ndif-team.github.io/nnterp/inte...

3 months ago 0 0 1 0

nnterp by @butanium.bsky.social is now part of the NDIF ecosystem! nnterp standardizes transformer naming conventions, includes built-in best practices for common interventions, and is perfectly compatible with original HF model implementations.

Learn more: ndif-team.github.io/nnterp/

3 months ago 5 2 1 1
NNsight 0.5.13 Release: vLLM integration and performance improvements Excited to announce our new NNsight version, nnsight v0.5.13! This release re-integrates vLLM support into NNsight, along with introducing performance improvements. To learn more, check out the release notes below and the vLLM tutorial. Please use this thread to provide feedback on vLLM integration and any other issues concerning this release. Release Notes: 1. nnsight support for vLLM inference has been completely refactored and works with the latest version of vLLM, including tensor paral...

Submit feedback: discuss.ndif.us/t/nnsight-0...

4 months ago 0 0 0 0
Release v0.5.13 · ndif-team/nnsight Release Notes: v0.5.13 1. nnsight support for vLLM inference has been completely refactored and works with the latest version of vLLM, including tensor parallelism. Enabling fast inference on multi...

Release notes: github.com/ndif-team/n...

4 months ago 0 0 1 0