Posts by Aengus Lynch

This work was produced in collaboration with @jplhughes @sprice354_ @rylanschaeffer.bsky.social @FazlBarez @sanmikoyejo @sleight_henry @erikjones313 @EthanJPerez @MrinankSharma

1 year ago
Best-of-N Jailbreaking
We introduce Best-of-N (BoN) Jailbreaking, a simple black-box algorithm that jailbreaks frontier AI systems across modalities. BoN Jailbreaking works by repeatedly sampling variations of a prompt with...


Paper: arxiv.org/abs/2412.03556
Code: github.com/jplhughes/bo...
Example jailbreaks and more: jplhughes.github.io/bon-jailbrea...

1 year ago

NEW PAPER: Best-of-N Jailbreaking.

We modify LLM inputs with simple, randomly generated augmentations and jailbreak frontier models across text, vision, and audio modalities.

The algorithm is simple, scalable and highly effective.

1 year ago