This work was produced in collaboration with @jplhughes @sprice354_ @rylanschaeffer.bsky.social @FazlBarez @sanmikoyejo @sleight_henry @erikjones313 @EthanJPerez @MrinankSharma
Paper: arxiv.org/abs/2412.03556
Code: github.com/jplhughes/bo...
Example jailbreaks and more: jplhughes.github.io/bon-jailbrea...
NEW PAPER: Best-of-N Jailbreaking.
We modify LLM inputs with simple, randomly generated augmentations and jailbreak frontier models across text, vision, and audio modalities.
The algorithm is simple, scalable and highly effective.