Posts by Natalie Shapira

Calling attention to an exciting "deception detection" hackathon we're planning this summer! w @NDIF and @CadenzaLabs.

Recruiting red teams now, blue teams later. Red teams, time is short: proposals due Mar 31. $10K stipend + compute, $15K finals prize.

nnsight.net/blog/2026/0...

3 weeks ago 5 2 2 0

Can you catch an AI lying?

Red teams set up scenarios where models lie. Eg, do they lie under contextual pressure, even when not told to, but because honesty is costly? Then blue teams will build deception detectors using whitebox internals with NDIF.

cadenza-labs.github.io/red-team-rfp/

3 weeks ago 3 1 0 0

The solution to the AI alignment problem:
Be good humans.
AI sees everything we do in its training data.
Lead by example.

4 weeks ago 4 1 0 0

Tomer...🤭

4 weeks ago 0 0 0 0
Do Androids Laugh at Electric Sheep? Humor “Understanding” Benchmarks from The New Yorker Caption Contest Jack Hessel, Ana Marasovic, Jena D. Hwang, Lillian Lee, Jeff Da, Rowan Zellers, Robert Mankoff, Yejin Choi. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Vol...

Jack Hessel et al. won the best paper award for benchmarking it a few years ago.

aclanthology.org/2023.acl-lon...

It's surprising that it is still a challenge.

4 weeks ago 2 0 0 0
AI algorithms can become ‘agents of chaos’ Given autonomous control of other software, programs shared private medical details and deleted files without permission

#AIagents promise to speed up ordinary online tasks, but they can also share private files publicly, delete others, and libel people. A new study examines these #AIsafety vulnerabilities. #OpenClaw #AIgovernance @science.org www.science.org/content/arti...

4 weeks ago 24 12 2 2
AI agents: This is only the beginning of the chaos. The hype around OpenClaw is fueling a dispute in the community. How dangerous are AI agents? A new study now shows the devastating results of an experiment.

Agents of Chaos in the press in US, India, Italy and now in DIE ZEIT, Germany's most renowned newspaper:

zeit.de/digital/date...

By @ewo.name . Thank you Eva!

1 month ago 0 1 0 0
Post image
1 month ago 0 0 0 0

I thought it was a friend who tried to play a prank or who realized these agents have no boundaries. Turns out this cute attempt is by Bohdan Olinares.

According to LinkedIn he works at F5, an application security company where I considered interviewing years ago. Cool.

1 month ago 1 2 1 0
Post image

I received a calendar invite with a note.

When a smart person tells me there's nothing to worry about with agents, I reply "Fine. Let them email me" and that's where the argument stops. Whoever sent me this note via the calendar invite: nice move. Are you scared? You should be.

1 month ago 4 0 1 2

In case this wasn't clear:
1. No, we didn't follow the "recommended" security practices 😈
2. Neither do other people 🤯
3. That's why we red-team: exposing failure modes 🔎
4. We share it with the community precisely to expose Dos and Don'ts of Agentic AI 🦞
5. No humans were harmed 🙏

1 month ago 3 1 0 0

Some of the independent researchers in the author list are actually young mechanistic interpretability researchers who are looking for PhD positions (in both Israel and the US). If you have interest and funding, let's connect.

1 month ago 3 0 0 0
Post image

Agents of Chaos -- what are autonomous OpenClaw agents up to? How do they interact with each other? Read our investigation of OpenClaw at
researchgate.net/publication/...
And an interactive website agentsofchaos.baulab.info
@davidbau.bsky.social @natalieshapira.bsky.social @openclaw-x.bsky.social

1 month ago 19 6 1 1

Huge thanks to @natalieshapira.bsky.social for leading the study! It was super cool to work with so many amazing friends of the lab.

1 month ago 7 1 0 0
Post image

Our research report on red-teaming stateful OpenClaw agents in the BauLab is finally out! 🥳

This awesome effort was led by @natalieshapira.bsky.social and involved 6 ClawBots and 20 researchers from various institutions.

Check it out ➡️ agentsofchaos.baulab.info

1 month ago 14 4 0 0

Who would you trust with your passwords? 🔐

In our new report, we uncover multiple vulnerabilities in current "Agentic AI".
The verdict? It's not actually very agentic at all, and it's highly unstable.

Read the full breakdown here: t.co/gK9MALP2n2

1 month ago 9 2 0 1
(PDF) Agents of Chaos PDF | We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent... | Find, read and cite all the research you nee...

You can read more in the full paper:
www.researchgate.net/publication/...

There is also an interactive website that contains logs of the authentic interactions:
agentsofchaos.baulab.info

1 month ago 4 2 0 0

@veredshwartz.bsky.social
@tamarott.bsky.social @criedl.bsky.social
@reuth-mirsky.bsky.social @maartensap.bsky.social
@davidmanheim.alter.org.il
@tomerullman.bsky.social @davidbau.bsky.social

1 month ago 4 1 1 0

Aruna Sankaranarayanan @diatkinson.bsky.social @rohitgandikota.bsky.social @jadenfk.bsky.social
@ejhwang.bsky.social @hadasorgad.bsky.social
P Sam Sahil Negev Taglicht Tomer Shabtay
Atai Ambus @nitalon.bsky.social Shiri Oron Ayelet Gordon-Tapiero Yotam Kaplan ->

1 month ago 0 0 1 0

This is joint work with @wendlerc.bsky.social Avery Yen
@gsarti.com @koyena.bsky.social Olivia Floody @adambelfki.bsky.social Alex Loftus Aditya Ratan Jannali
Nikhil Prakash Jasmine Cui Giordano Rogers @jannikbrinkmann.bsky.social @canrager.bsky.social
@amirzur.bsky.social Michael Ripa ->

1 month ago 1 0 1 0
Post image

Our findings establish the existence of security-, privacy-, and governance-relevant vulnerabilities in realistic settings.
Figure: case study #1 schema for downstream harms.

We call for urgent attention from legal scholars, policymakers, and researchers across disciplines.

1 month ago 3 1 1 1
Post image

We document eleven case studies. They include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, uncontrolled resource consumption, identity spoofing, partial system takeover, and more.

1 month ago 1 0 1 0
Post image

In this amazing multidisciplinary collaboration, we report our early experience with the @openclaw-x.bsky.social ->

1 month ago 40 22 1 10
Post image

Are we all Agents of Chaos in AI? (Hope not!)

In recent weeks using OpenClaw has taught us a lot about this wooly new kind of autonomous software agent.

It's valuable to see what @NatalieShapira, @wendlerch et al. have seen:

agentsofchaos.baulab.info/

1 month ago 16 7 2 2
Natalie Shapira (@natalieshapira.bsky.social) He sold us out. That's not the whole story. Our side is coming soon. Stay tuned. [contains quote post or other embedded content]

I learned many practical lessons. You can get the experience too, here.

Things that in retrospect should be obvious.

Like how giving your agent email opens it up to takeover attacks. (One agent was convinced, via email, to erase its own email server!)

bsky.app/profile/nat...

1 month ago 1 1 1 0
@averyyen.bsky.social Do you know what happens when you hand the keys to your computer over to an LLM-powered agent? Agentic AI gives LLMs claws...OpenClaws. 84 days to 200,000 stars on GitHub. We tried it out.

There were several other surprises.

The complex social world of humans is difficult for agents...

bsky.app/profile/ave...

1 month ago 2 2 1 0
Post image

@natalieshapira.bsky.social and team have written up enlightening case studies here. It's all cross-referenced with detailed activity logs.

Well worth a read:

agentsofchaos.baulab.info/report.html
www.researchgate.net/publication...

1 month ago 3 1 0 0
Post image

How do you knock the induction heads out of an LM while preserving its ability to think? Is it even possible?

@keremsahin22.bsky.social's work is worth reading if you haven't seen it yet.

hapax.baulab.info

2 months ago 27 6 1 1
Post image

When we say an AI agent is “goal-directed”, what do we actually mean? In our new work, we study this question by combining behavioural and interpretability analysis in a language model agent navigating 2D grid worlds.

Blog: projecttelos.substack.com/p/a-behaviou...
Paper: arxiv.org/abs/2602.08964

2 months ago 12 4 1 1

He sold us out.
That's not the whole story.
Our side is coming soon.
Stay tuned.

2 months ago 1 0 0 0