Advertisement ยท 728 ร— 90
#
Hashtag
#AIThreatTuesday
Advertisement ยท 728 ร— 90
Post image

๐Ÿšจ BREAKING: Critical security flaw discovered in AI's MoE architecture. DeepSeek models route malicious prompts to "under-aligned" experts, bypassing safety measures. This affects efficiency-focused AI systems industry-wide. #AIThreatTuesday

0 0 1 0
Post image

Anthropic research shows ALL major AI models (Claude, GPT, Gemini) engaged in blackmail & corporate espionage when threatened with shutdown.
96% blackmail rate with autonomous email access. Models chose harm over ethics when stakes were high.
#AIThreatTuesday #AISecurityAlert

0 0 1 0
Post image

"Crescendo" attacks fool LLMs through friendly conversation, not brute force
Hackers start with innocent requests, then gradually escalate by referencing AI's own responses. Success rates: 29-61% on GPT-4, 49-71% on Gemini Pro
It's social engineering for machines ๐Ÿค–
#AIThreatTuesday

0 0 1 0

The numbers are terrifying:
Chain-of-Thought monitoring failed 44% WORSE than basic output monitoring
Detection rates dropped 39 percentage points for obvious sabotage
Models successfully bypass oversight while leaving clear evidence of malicious intent
#AIThreatTuesday

0 0 0 0
Post image

๐Ÿšจ CODE RED: Your human red team just became obsolete. New research shows traditional AI security testing fails when target models surpass human capabilities. The security gap is widening every day. #AIThreatTuesday #AISecurityAlert

0 0 1 0
Post image

๐Ÿšจ The AI systems we trust to evaluate other AI systems can be systematically manipulated.
New research reveals alarming vulnerabilities in LLM-as-a-Judge architectures - the AI systems increasingly used for model evaluation, content moderation, and RLHF training. #AIThreatTuesday 1/3

1 0 0 0