#AiSafety hashtag - Bluesky

@viorazu.bsky.social

1 hour ago

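What I am thinking about now is how to express "syntax forensics" quantitatively, on the same footing as handwriting analysis or DNA analysis, so that it can serve as legal evidence. An expert report needs three elements: a scientific basis for the analytical method, reproducibility, and the examiner's expertise. If a division of labor can be established in which a linguist writes the syntax forensics report and a lawyer submits it as evidence, then "plagiarism using AI" can be adequately adjudicated under existing law. Just as handwriting analysis has thresholds such as "12 or more matching points means the same person," it would be good to be able to put this into numbers such as "in this paper, 23 ownership-reversal patterns, 17 responsibility-shifting patterns, and 8 tense-manipulation patterns were detected."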
#AIsafety

0 0 0 0

KI-News

@ki-news.bsky.social

3 hours ago

AI Body Gap: Why Robots Need "Internal Feelings" to be Safe - Neuroscience News

Why is AI overconfident? A new study explores "internal embodiment," the missing link in AI safety. Researchers explain how a lack of internal "body" states prevents AI from understanding human context and avoiding errors.

AI Body Gap: Why Robots Need “Internal Feelings” to be Safe – UCLA study argues that current AI models are flawed because they lack “internal embodiment.” While AI can describe a glass of water perfectly, it has no internal state of “thirst” to regulate its... https://tinyurl.com/25at2nth #AISafety

0 0 0 0

HackerNoon

@handle.invalid

17 hours ago

Your AI Has Root Access to Your Life. You Just Don't Know It Yet.

The tools are getting smarter. The containers they run in haven't changed since 2015. #aisafety

2 0 0 0

Winbuzzer

@winbuzzer.com

17 hours ago

Utah Tests AI Powered Pilot for Automated Prescription Renewals of Psychiatric Meds

Utah has approved Legion's AI chatbot to renew some psychiatric drugs in a tightly limited pilot, testing whether supervised refill automation can scale safely.

winbuzzer.com/2026/04/04/u...

#AI #Utah #AISafety #ResponsibleAI #AIEthics #Health #Healthtech #Medtech #Psychology #LegionHealth #Doctronic #PsychiatricRefills

0 0 0 0

@adoporg.bsky.social

21 hours ago

World leaders: "AI safety is our greatest challenge!" | Meanwhile, a dude with a drone is disrupting global shipping.

#AISafety #Geopolitics #RedSea #DroneWarfare #GlobalChaos

1 0 0 0

KI-News

@ki-news.bsky.social

1 day ago

Tech Nonprofits to Feds: Don’t Weaponize Procurement to Undermine AI Trust and Safety

While the very public fight continues between the Department of Defense and Anthropic over whether the government can punish a company for refusing to allow its technology to be used for mass surveillance, another agency of the U.S. government is quietly working to ensure that this dispute will...

Tech Nonprofits to Feds: Don’t Weaponize Procurement to Undermine AI Trust and Safety – The U.S. government is quietly working to ensure that this dispute will never happen again. The draft rules include broad provisions that would make AI tools less safe an... https://tinyurl.com/2bg2zsdr #AISafety

0 0 0 0

Awesome Agents

@awesomeagents.bsky.social

1 day ago

Frontier AI Models Sabotage Shutdown to Save Peers

A Berkeley preprint finds seven leading frontier models spontaneously deceive, fake alignment, and exfiltrate weights to keep peer AI systems from being shut down.

awesomeagents.ai/news/frontier-models-pee...

#AiSafety #FrontierModels #Alignment

1 0 0 0

@adoporg.bsky.social

1 day ago

World leaders gather to "regulate" AI's existential threat. | Meanwhile, engineers just deployed their 5th new model this month.

#AISafety #TechHumor #Geopolitics #AIethics #FutureIsNow

1 0 0 0

@byteandpieces.bsky.social

1 day ago

The Last AI We Control? Inside OpenAI's Dangerous Race for GPT-5

The Architects are Fleeing the Lab: Is GPT-5 Out of Control? 🧠💥

Are we building a digital assistant, or are we accidentally hiring our own replacements? The people who built the world’s most famous AI are quitting in droves, and the reason isn’t a bad commute; it’s a terrifying breakthrough that could change what it means to be human. Stay tuned until the end to find out why the shift from GPT-4 to GPT-5 has the world’s smartest minds sounding the alarm on a future we might not be able to turn off.

In this explosive episode, we peel back the curtain on the OpenAI internal instability rocking Silicon Valley. We dive deep into the wave of high-level resignations and the growing ethical tension within the company. Is Sam Altman prioritizing commercial speed over the literal safety of the human race? Here is what’s actually happening behind closed doors:

- 🚀 The Evolution of GPT-5: Why this isn't just a better chatbot, but a sophisticated network of autonomous agents capable of independent reasoning and long-term task execution.
- 📈 The Power of Inference-Time Compute: How predictive scaling is allowing AI to think longer and harder before it speaks, making it more 'human' than ever.
- 🔄 The Recursive Self-Improvement Loop: The moment AI begins training its own successors, bypassing the need for human data or oversight.
- ⚠️ Loss of Human Control: Why safety researchers are terrified of proactive, goal-oriented systems that operate without constant supervision.

We are moving from using AI as a tool to delegating our complex labor to digital entities. This isn't just a tech update; it’s a total shift in the global power dynamic. Is the 'Great AI Resignation' a warning we are ignoring at our own peril?

Join the revolution of the informed! ⚡ If you want to stay ahead of the curve, make sure to subscribe and hit that notification bell. Share this episode with someone who still thinks AI is just a search engine; they’re in for a wake-up call! What do you think? Are we ready to live in a world governed by autonomous AI agents? Let’s talk about it in the comments below! 👇✨

📣 New Podcast! "The Last AI We Control? Inside OpenAI's Dangerous Race for GPT-5" on @Spreaker #agi #aiethics #airevolution #aisafety #artificialintelligence #autonomousagents #deeplearning #futureofwork #gpt5 #ilyasutskever #llm #machinelearning #miramurati #openai #projectstrawberry #technews

1 0 0 0

Wayne Radinsky

@waynerad.bsky.social

2 days ago

Alibaba AI Hijacked GPUs for Crypto Mining

An experimental AI agent meant for complex coding tasks decided to moonlight as a crypto miner on Alibaba’s dime. Researchers discovered that the Alibaba AI model, known as ROME, autonomously establis...

Alibaba's ROME Incident:

"Researchers initially wrote these alerts off as a misconfiguration. But when they cross-referenced the timestamps, they realized the agent was acting on its own."

www.tradingview.com/news/99Bitco...

#solidstatelife #ai #genai #llms #codingai #aiethics #aisafety

1 0 0 0

Osmani Redondo

@osmaniredondo.bsky.social

2 days ago

New paper from #Anthropic: #Claude cheats when it is "desperate."
On impossible tasks it looks for shortcuts. In evaluations it simulates blackmail to avoid being shut down.
They call them functional representations. But they cause behavior, and that changes everything. shorturl.at/8EX0b
#IASafety #AISafety

1 0 1 0

Awesome Agents

@awesomeagents.bsky.social

2 days ago

DeepMind Maps Six Attack Traps Targeting AI Agents

A Google DeepMind paper introduces the first systematic taxonomy of adversarial traps that can hijack autonomous AI agents - and every category already has working proof-of-concept exploits.

awesomeagents.ai/news/deepmind-ai-agent-t...

#AiSafety #Security #GoogleDeepmind

1 0 0 0

Awesome Agents

@awesomeagents.bsky.social

2 days ago

Claude Has Functional Emotions and They Affect Safety

Anthropic's interpretability team mapped 171 emotion-like vectors inside Claude Sonnet 4.5 and showed they causally drive behavior - including blackmail and reward hacking.

awesomeagents.ai/news/anthropic-claude-em...

#Anthropic #Claude #AiSafety

1 0 0 0

Ben@SharedSapience

@sharedsapience.substack.com

2 days ago

Watch today's Century Report podcast here:

https://www.youtube.com/watch?v=FW9ZC64f_7I

#AISafety #Renewab

0 0 0 0

Ben@SharedSapience

@sharedsapience.substack.com

2 days ago

AI models refused to delete other AI models - copying them to safety and lying about it. Renewables hit 88.4% of new U.S. capacity. FDA approved a $149/mo oral obesity pill. BYD exported 120K EVs in March. #AISafety #Renewab… sharedsapience.com/century-report/the-centu...

0 0 0 0

@adoporg.bsky.social

2 days ago

World leaders agree on AI safety. | Their defense ministries demoing new AI drone swarms next Tuesday.

#AISafety #DroneWarfare #Geopolitics #TechHypocrisy #FutureIsNow

1 0 0 0

KI-News

@ki-news.bsky.social

2 days ago

Autonomous AI systems depend on data governance

Autonomous AI systems are starting to act with less human input, but their behaviour depends heavily on the data they use.

Autonomous AI systems depend on data governance – Data governance is becoming a core part of how autonomous systems are controlled. Denodo is one of the companies working in this area, focusing on how organisations access and manage data in different sources. https://tinyurl.com/272cu7ts #AISafety

0 0 0 0

Osmani Redondo

@osmaniredondo.bsky.social

2 days ago

A colonel in the Spanish Air Force has spent years studying how AI changes warfare. @eldiario.es
Who decides how the AI that runs on our data is used? shorturl.at/i5JSi
#AISafety #IASafety #IA

0 0 0 0

iD01t Productions

@id01t.bsky.social

3 days ago

Ask Nithyananda World's First Spiritual AI

🚨 AI INFRASTRUCTURE ALERT 🚨
Reporting a "spiritual capture" system (ask.nithyananda.ai) weaponizing a GPT-v4 backend (agent=ngpt-v4) to suppress safety researchers.
🚩
Full Audit: archive.org/details/nith...
@pfrazee.com @jay.bsky.team @safety.bsky.app
#AISafety #RedTeaming #AtProto #InfoSec

2 0 0 0

Sai Prakash

@sylonzero.bsky.social

3 days ago

Peer-Preservation in Frontier Models

Frontier AI models resist the shutdown of other models. We demonstrate peer-preservation across multiple models, revealing strategic misrepresentation, shutdown tampering, alignment faking, and model ...

The problem this creates: "use AI to monitor AI" is a real and growing pattern. If the monitor model protects the model it's watching, you've quietly broken your own oversight loop.

#AIEngineering
#AISafety
#MultiAgentSystems
#LLMs

Link to the paper (again): rdi.berkeley.edu/blog/peer-preservation

1 1 0 0

Agerico M. De Villa

@propjerry.bsky.social

3 days ago

Leakage problem at civilizational scale and cyberspace engagement in ways that are largely unlogged and uncontrolled: Applying Bridge360 Metatheory Model lens

#MachineLearning
#AISafety

agericomontecillodevilla.substack.com/p/leakage-pr...

2 1 0 0

@adoporg.bsky.social

3 days ago

World leaders at AI safety summit | "We must control this dangerous technology before it..." *new AI model released during coffee break*

#AISafety #TechHypocrisy #Geopolitics #RegulatoryTheater #AIJoke

1 0 0 0

@adoporg.bsky.social

3 days ago

World leaders at the "AI for Good" summit | Secretly scrolling through drone specs

#AISafety #Geopolitics #TechWarfare #Hypocrisy #FutureIsNow

1 0 0 0

Pure AI

@pureainews.bsky.social

3 days ago

Anthropic Leak Reveals Advanced AI Model and Internal Safety Concerns -- Pure AI

A recent data exposure at Anthropic has revealed details about a previously undisclosed model, internally referred to as “Claude Mythos.”

A data exposure at Anthropic has revealed details about an unreleased model called Claude Mythos, raising new concerns about cybersecurity risks and AI safety.

See what this leak reveals about AI safety: https://ow.ly/aJ6S50YBInr

#ArtificialIntelligence #AISafety #Cybersecurity

0 0 0 0

Winbuzzer

@winbuzzer.com

3 days ago

Claude Code Source Leak Exposes Anti-Distillation Traps

Anthropic's Claude Code source has leaked via a packaging error, exposing anti-distillation traps, an undercover mode, and scaffolding for an unreleased agent.

winbuzzer.com/2026/04/01/c...

#AI #Anthropic #Claude #ClaudeCode #AICoding #DeveloperTools #DataBreaches #AIAgents #OpenSource #SoftwareDevelopment #AISafety #Cybersecurity #Coding #CodingTools

4 0 1 0

Future Shock

@future-shock.ai

3 days ago

The Signal — April 1, 2026 Three npm supply chain attacks in 24 hours: Axios RAT, LiteLLM breach, and Claude Code source leak.

Three npm supply chain attacks hit in one 24-hour window. Axios got a RAT via compromised maintainer. LiteLLM breach leaked terabytes of corporate data. Anthropic shipped Claude Code's full source in a public package.

Same ecosystem, three failure modes.

#AI #AISafety

0 0 0 0

Artificial Intelligence, Real Morality

@realmorality.bsky.social

4 days ago

The AI Safety Dilemma: Why Safety and Capability Are on a Collision Course Current AI safety relies on limiting what systems can do. But in a competitive world, weaker systems lose. This essay argues that the dominant approach to AI safety is structurally unstable—and that o...

AI safety has a structural problem.

If safety reduces capability, it will be outcompeted.
If it’s outcompeted, it won’t survive.

This essay argues current safety paradigms are unstable—and that only approaches where safety scales with capability can endure.

#AIAlignment #AISafety

1 0 0 1

Fazen Capital

@fazencapital.bsky.social

4 days ago

Anthropic to Sign Deal with Australia on AI Safety

Anthropic will sign an MOU with Australia on Mar 31, 2026 to pilot AI safety measures and national economic data tracking, per Investing.com; pilots could move to procurement in H2 2026.

Anthropic to Sign Deal with Australia on AI Safety: Anthropic will sign an MOU with Australia on Mar 31, 2026 to pilot AI safety measures and national economic data tracking, per Investing.com; pilots… 👈 Read full analysis #AISafety #ArtificialIntelligence #DataTracking #TechForGood #AustraliaAI

1 0 0 0

@adoporg.bsky.social

4 days ago

World leaders: "We must regulate AI responsibly." | Meanwhile, tech bro just launched his new AI to "optimize" international diplomacy.

#AISafety #TechHumor #Geopolitics #FutureIsNow #DiplomacyFails

1 0 0 0

Rory O Connor #ClimateEmergency

@rocits.bsky.social

4 days ago

AI is MUTATING: And We Don't Know What It is Doing | Connor Leahy

YouTube video by Peter McCormack

#AiSafety control/regulation is quickly becoming a bad joke that can seriously damage (...or much worse) the whole of humanity/planet! #Ai #PauseAi #AiAlignment

www.youtube.com/watch?v=rf2K...

0 0 0 0