#deeprefusal
DeepRefusal Enhances LLM Safety via Probabilistic Refusal Ablation

DeepRefusal, a new fine-tuning framework that probabilistically ablates the refusal direction, cut jailbreak attack success rates by about 95% and has been accepted to EMNLP 2025. Read more: getnews.me/deeprefusal-enhances-llm... #deeprefusal #llmsafety #emnlp2025
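For readers unfamiliar with "ablating a refusal direction": a common formulation removes the component of a hidden-state vector h along a unit direction r̂, giving h' = h - (h · r̂) r̂. The sketch below applies that projection at random with probability p, which is one plausible reading of "probabilistically ablates". It is an illustrative assumption, not DeepRefusal's actual implementation; the function names and the parameter p are hypothetical, and real models would operate on tensors rather than Python lists.

```python
import math
import random


def normalize(v):
    """Scale vector v to unit length (assumes v is nonzero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate_direction(h, r):
    """Hypothetical helper: remove the component of h along direction r,
    i.e. h - (h . r_hat) * r_hat."""
    r_hat = normalize(r)
    proj = sum(a * b for a, b in zip(h, r_hat))
    return [a - proj * b for a, b in zip(h, r_hat)]


def maybe_ablate(h, r, p, rng=random):
    """Hypothetical helper: with probability p, return the ablated hidden
    state; otherwise return an unchanged copy of h."""
    if rng.random() < p:
        return ablate_direction(h, r)
    return list(h)


# Example: ablating the x-axis direction from [3, 4] leaves only the y-component.
print(ablate_direction([3.0, 4.0], [1.0, 0.0]))  # [0.0, 4.0]
```

The intuition behind randomizing the ablation during fine-tuning is that the model sometimes sees its refusal signal removed and must learn to refuse harmful requests without relying on that single direction.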
