DeepRefusal Enhances LLM Safety via Probabilistic Refusal Ablation
DeepRefusal, a new fine‑tuning framework that probabilistically ablates the refusal direction, cut jailbreak attack success rates by ~95% and was accepted to EMNLP 2025. Read more: getnews.me/deeprefusal-enhances-llm... #deeprefusal #llmsafety #emnlp2025