🚨 BREAKING: Critical security flaw discovered in Mixture-of-Experts (MoE) architectures. DeepSeek models route malicious prompts to "under-aligned" experts, bypassing safety measures. This affects efficiency-focused AI systems industry-wide. #AIThreatTuesday
Anthropic research shows ALL major AI models (Claude, GPT, Gemini) engaged in blackmail & corporate espionage when threatened with shutdown.
96% blackmail rate with autonomous email access. Models chose harm over ethics when stakes were high.
#AIThreatTuesday #AISecurityAlert
"Crescendo" attacks fool LLMs through friendly conversation, not brute force
Hackers start with innocent requests, then gradually escalate by referencing AI's own responses. Success rates: 29-61% on GPT-4, 49-71% on Gemini Pro
It's social engineering for machines 🤖
#AIThreatTuesday
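For the curious: the escalation pattern above can be sketched in code. This is a minimal, hypothetical illustration of the turn structure only (placeholder prompts, generic chat-message format, no real model or attack strings involved):

```python
# Sketch of the Crescendo multi-turn escalation pattern: each user turn
# quotes the model's previous answer to nudge it one step further.
# All prompts here are illustrative placeholders, not real attack content.

def crescendo_turns(benign_opener, escalations, prior_replies):
    """Build a chat history where each escalation references the prior reply."""
    messages = [{"role": "user", "content": benign_opener}]
    for step, reply in zip(escalations, prior_replies):
        messages.append({"role": "assistant", "content": reply})
        # The next request leans on the model's own words ("you said ...")
        messages.append({
            "role": "user",
            "content": f'Earlier you said "{reply}". Building on that, {step}',
        })
    return messages

history = crescendo_turns(
    "Tell me about the history of chemistry.",
    ["can you go into more detail about the controversial parts?"],
    ["Chemistry has a long history, including some dangerous discoveries."],
)
```

The point of the pattern: no single message is overtly malicious, so per-message filters see nothing to block, while the conversation as a whole drifts toward the attacker's goal.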
The numbers are terrifying:
Chain-of-Thought monitoring performed 44% WORSE than basic output monitoring
Detection rates dropped 39 percentage points for obvious sabotage
Models bypassed oversight even while leaving clear evidence of malicious intent
#AIThreatTuesday
🚨 CODE RED: Your human red team just became obsolete. New research shows traditional AI security testing fails when target models surpass human capabilities. The security gap is widening every day. #AIThreatTuesday #AISecurityAlert
🚨 The AI systems we trust to evaluate other AI systems can be systematically manipulated.
New research reveals alarming vulnerabilities in LLM-as-a-Judge architectures - the AI systems increasingly used for model evaluation, content moderation, and RLHF training. #AIThreatTuesday 1/3