Norman Mu (@normanmu.com) Bsky

A Closer Look at System Prompt Robustness System prompts have emerged as a critical control surface for specifying the behavior of LLMs in chat and agent settings. Developers depend on system prompts to specify important context, output forma...

Lots more in our paper (arxiv.org/abs/2502.12197) and code (github.com/normster/Rea...)

1 year ago 0 0 0 0

Reasoning (o3-mini and R1) seems highly effective for more retrieval-bottlenecked prompts (i.e. forgetting relevant guardrails), less so for adversarial inputs/prompt injections. Definitely an exciting direction to explore further.

1 year ago 0 0 1 0

Standard training techniques like good data curation, SFT -> DPO, work reasonably well, and the pass/fail nature of guardrail adherence enables the use of tricks like classifier-free guidance/contrastive decoding to further improve performance

1 year ago 0 0 1 0

RealGuardrails is our new dataset to 1) evaluate system prompt robustness on realistic prompts scraped from the ChatGPT store, and 2) evaluate methods for improving open weight models like Llama 3

1 year ago 1 0 1 0

System prompts are a critical control plane for LLMs/AI agents, but models vary widely in their robustness. We found a "complexity wall" at ~10 guardrails where prompt adherence declines rapidly even on totally benign inputs

1 year ago 1 0 1 0

I learned about how legal journals work earlier this year and have wondered if it could work for ML/AI: reviewing becomes a way for students to distinguish themselves rather than a chore

1 year ago 0 0 0 0

mindboggling that bytedance is 1) suing the author for damages and sabotage and 2) keeping their name on the paper/award without retracting it

1 year ago 0 0 0 0

Ethical Challenges Related to the NeurIPS 2024 Best Paper Award

absolutely wild allegations being shared about the author of the NeurIPS 2024 best paper winner: var-integrity-report.github.io.

really feels like the bottom is dropping out from academic ML research with the amount of brazen dishonesty

1 year ago 1 0 1 0

Going off FLOP/s and power, looks like these are very roughly 3/4 of an H100?

1 year ago 0 0 0 0

Amazon’s AI Self Sufficiency | Trainium2 Architecture & Networking Amazon is currently conducting one of the largest build-out of AI clusters globally, deploying a considerable number of Hopper and Blackwell GPUs. In addition to a massive Capex invested into Nvidi…

"AWS is currently deploying a cluster with 200k+ Trainium2 chips for Anthropic called “Project Rainer”" semianalysis.com/2024/12/03/a...

1 year ago 2 0 1 0

top loss curves do look sketchy but idk how bad this is for an RL task. bottom curves don't obviously asymptote but they also shouldn't with learning rate decay (which the open source release seems to use? github.com/google-resea...)

1 year ago 0 0 0 0

my takeaway from Vighnesh's writeup: google's main argument about under-training/lack of pre-training isn't super convincing. convergence of train loss is not always desirable. early stopping can help, and their own paper shows mixed results on importance of pre-training

1 year ago 0 0 1 0

Reinforcement Learning for Chip Floorplanning: The Saga

i've been confusedly semi-following the RL chip design/AlphaChip story for a few years trying to understand what the core disagreement was, and finally found a summary that (seems) to collate and explain all the relevant artifacts

1 year ago 3 0 1 0

I’d like to join!

2 years ago 1 0 0 0

Posts by Norman Mu