Active Attacks: Adaptive Red‑Team RL for LLM Safety
Active Attacks, an adaptive RL framework for LLM safety testing, raised cross‑attack success from 0.07% to 31.28% (a gain of more than 400×) while adding ~6% compute. The study was posted on Sep 26, 2025. getnews.me/active-attacks-adaptive-... #llmsafety #activetests