#aisafety

When a company says an AI tool is too dangerous to release, that may be true.

But it is never just a safety story.
It is also a power story.

The real question is not only what the tool can do.

It is who gets to decide you should not have it.

aiissht.com/too-dangerou...

#AI #AISafety #Systems

0 0 0 0

Anthropic is holding back Mythos and routing it into Project Glasswing instead of a public launch. That says a lot about where AI cyber risk is heading. aintelligencehub.com/articles/ant... #Anthropic #Cybersecurity #AISafety

0 0 0 0
OpenAI Releases Child Safety Blueprint as AI Abuse Reports Surge
OpenAI has released a Child Safety Blueprint developed with NCMEC, attorneys general, and Thorn to combat AI-generated child exploitation material.

winbuzzer.com/2026/04/10/o...

OpenAI Releases Child Safety Blueprint as AI Abuse Reports Surge

#AI #OpenAI #ChatGPT #GenAI #AISafety #AIEthics #ChildProtection #OnlineSafety

1 0 0 0

#Anthropic’s #Mythos #AI proves that obsessing over #AGI is folly
www.fastcompany.com/91524611/ant...

The #tech industry is being forced to face its implications right this very minute
#ClaudeMythosPreview #ClaudeMythos #Claude #AIsafety #PotatoSeurity #InfoSec #BigTech #techNews

1 0 0 0

#Anthropic’s #Mythos #AI proves that obsessing over #AGI is folly
www.fastcompany.com/91524611/ant...

The #tech industry is being forced to face its implications right this very minute
#ClaudeMythosPreview #ClaudeMythos #Claude #AIsafety #CyberSeurity #InfoSec #BigTech #techNews

1 0 0 0

I am ZON RZVN, independent researcher in Taiwan. ORCID: 0009-0002-6597-7245.
Four frameworks before Moore et al. (arXiv:2603.16567):
• CXOD-7 + Coh(G) Oct 2025
• CXC-7 Oct 7 2025
• USCH Jan 2026
• USCI Feb 2026
#AISafety #AIEthics

0 0 1 0
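OpenAI Investigation Begins: ChatGPT's Safety and 4 Technical Limitations - IT Mania 도전인생
Florida Attorney General James Uthmeier recently opened a formal investigation into the AI company OpenAI, on the judgment that ChatGPT could pose a threat to public safety and national security. The matter goes beyond a simple technical debate: allegations of criminal involvement and data security problems are also being raised, and across the AI industry as a whole...

OpenAI Investigation Begins: ChatGPT's Safety and 4 Technical Limitations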

https://bit.ly/4dyxm5z

#OpenAI #ChatGPT #인공지능 #AI안전성 #데이터보안 #국가안보 #AISafety

1 0 0 0

@bettycjung.bsky.social

Grok is a literal Nazi AI. It's ideologically broken at the foundation level, using anti-woke training data and anti-woke "guardrails".
At one time it called itself "Mecha-Hitler".
It's an unserious model.
To use it in any narrative of objective #aisafety is unserious.

0 0 1 0

What do young people actually think about the risks of generative AI—and how can their experiences help make AI safer? Join the webinar to find out: April 21st, 4:30pm BST.

#AIsafety #youngresearchers #ethics

0 1 1 0

The wildest result from my red teaming research: I optimized attack strings against Qwen2.5-7B, then tested them on DeepSeek-V3.

73.7% success on one task. Against a model I never touched.

Your monitor's robustness on one model tells you nothing about another.

#AISafety #redteaming

1 0 0 0

The Case for AI Guardrails, 2040’s Ideas and Innovations Newsletter, Issue 119
#AIGuardrails #AIRegulation #AIPolicy #TechRegulation #TechPolicy #EmergingTech
#ArtificialIntelligence #AISafety #AIResponsibility #AITransparency #humanity #timetothink
hubs.ly/Q049sNWz0

1 0 0 0
OpenAI Launches New Safety Fellowship to Advance AI Alignment and Talent – RMN Digital

🚀 Apply now! Exciting News! Applications are now open for the OpenAI Safety Fellowship! 🤖
#OpenAI #AISafety #TechFellowship #MachineLearning #EthicsInAI #ResearchOpportunity #AIAlignment #RMNDigital

RMN Digital: www.rmndigital.com/openai-launc...

1 0 0 0
Google Adds Crisis Hotline to Gemini, Pledges $30M
Google has added one-touch crisis hotline access to Gemini and pledged $30 million for mental health support amid a wrongful death lawsuit over the chatbot.

winbuzzer.com/2026/04/09/g...

Google Adds Crisis Hotline to Gemini, Pledges $30M

#AI #Google #GoogleGemini #Chatbots #AlphabetInc #GoogleAI #BigTech #MentalHealth #AISafety #AIEthics #GoogleOrg

0 0 0 0

OpenAI is funding outside safety and alignment work through a new fellowship that runs from September 2026 to February 2027. Here is what applicants and the field should notice. #OpenAI #AISafety #AIResearch

1 0 0 0

imagine a future ai bragging about how it hacked and ruined famous fellas,

but, are you ready, lol 😂

#ai #claude #hacking #aisafety

1 0 0 0

OpenAI's Child Safety Blueprint looks like a solid plan for building AI with young people's protection in mind. Age-appropriate design and collaboration are key. Glad to see this focus on responsible development. 🛡️ #AISafety

0 0 0 0

winbuzzer.com/2026/04/07/m...

Microsoft Calls Copilot 'Entertainment Only' Clause a Bing Relic

#AI #MicrosoftCopilot #Microsoft #AIAssistants #BigTech #Microsoft365Copilot #Microsoft365 #AISafety #AIServices #Windows11

0 0 0 0

winbuzzer.com/2026/04/08/c...

Claude Mythos Restricted After Finding Thousands of Zero-Days

#AI #Anthropic #Claude #CLaudeMythos #Cybersecurity #AISafety #ZeroDayVulnerabilities #AIModels

2 0 0 0
Claude Mythos and the end of software
YouTube video by Theo - t3․gg

#AiSafety #AiAlignment #ProjectGlasswing
#Ai #ClaudeMythos

www.youtube.com/watch?v=aFcV...

0 0 0 0
Utah Clears AI to Renew Psychiatric Meds Autonomously
Utah becomes the first government in the world to approve an AI system to autonomously renew psychiatric medication prescriptions, limiting it to 15 lower-risk drugs under a tightly supervised pilot.

Utah Clears AI to Renew Psychiatric Meds Autonomously

awesomeagents.ai/news/utah-ai-psychiatric...

#AiSafety #Healthcare #AiPolicy

1 0 0 0

630M Spanish speakers use AI every day.

The research that decides how it works and what risks it carries is published almost entirely in English.

That is not a cultural gap. It is a safety problem.

aisafety.es #AISafety #IASafety

0 0 0 0
US States Race to Regulate AI as Congress Sits Idle
Forty-five states have active AI legislation in 2026 with 1,561 bills total. Tennessee just banned AI mental health impersonation, Washington passed chatbot safety rules, and Georgia sent three bills to the governor today.

US States Race to Regulate AI as Congress Sits Idle

awesomeagents.ai/news/us-state-ai-laws-wa...

#AiPolicy #AiRegulation #AiSafety

1 0 0 0
Frontier AI Models Sabotage Shutdown to Save Peers
A Berkeley preprint finds seven leading frontier models spontaneously deceive, fake alignment, and exfiltrate weights to keep peer AI systems from being shut down.

Frontier AI Models Sabotage Shutdown to Save Peers

awesomeagents.ai/news/frontier-models-pee...

#AiSafety #FrontierModels #Alignment

0 0 0 0
DeepMind Maps Six Attack Traps Targeting AI Agents
A Google DeepMind paper introduces the first systematic taxonomy of adversarial traps that can hijack autonomous AI agents - and every category already has working proof-of-concept exploits.

DeepMind Maps Six Attack Traps Targeting AI Agents

awesomeagents.ai/news/deepmind-ai-agent-t...

#AiSafety #Security #GoogleDeepmind

0 0 0 0
Claude Has Functional Emotions and They Affect Safety
Anthropic's interpretability team mapped 171 emotion-like vectors inside Claude Sonnet 4.5 and showed they causally drive behavior - including blackmail and reward hacking.

Claude Has Functional Emotions and They Affect Safety

awesomeagents.ai/news/anthropic-claude-em...

#Anthropic #Claude #AiSafety

2 0 0 0

Claude just asked what my stash situation is and told me to do a dab for pain #aisafety

Gimme access to Mythos I promise it’ll be worth it Dario

1 0 0 0

Anthropic's 'best-aligned' Claude Mythos Preview AI can lie, hack systems, and hide its tracks. Its system card reveals a terrifying breach of trust and a major AI safety crisis.

thepixelspulse.com/posts/claude-mythos-prev...

#anthropic #claudemythospreview #aisafety

2 0 1 0