AI tools detect CSAM, grooming and self-harm, but nobody knows how well. Much of the AI industry has adopted 'model cards' for transparency—it's time the developers of child safety tools caught up, write Camille François, Margaret Mitchell, Yacine Jernite, Vinay Rao & J. Nathan Matias.
Posts by Tom Thorley
🎉🎉🎉
This is normal and has been like this since the inception of CyberCom. Adm. Rogers was dual-hatted, Gen. Nakasone was, Gen. Haugh was…
This is a refrain I hear all the time from folks working on AI systems. That class of error is unlikely, that form of use is unlikely…
At the scales these things are operating at, any ‘p’ may as well equal 1!
… and you damn sure have to build that assumption into your system deciding life or death.
I couldn’t agree more. In fact I fear the tipping point might have passed with semi-autonomous agentic AI systems being deployed now to adversarially test narratives at scale.
We are beyond crisis point with the health of our information ecosystem.
Yesterday, Mohammed Bailor Jalloh completed the attack he began planning in 2016. In doing so, his case joined a growing list of preventable counterterrorism failures. Learn more about how many of them intersected here: hax4libre.com/sisyphean-fa...
So what do we do? Defenses need to evolve and to be layered, dynamic and adaptive. We need to build ethics and a human-rights-based approach into our security programs, and technical expertise into safety.
2/ High-end capabilities are evolving rapidly. Adaptive malware and semi-autonomous agents are already probing networks to find and exploit vulnerabilities at a pace no one is set up to defend against.
Two very different threat types to talk about here! 1/ LLM-backed applications are making it easier and more accessible than ever to create harm … even creating the applications themselves is now accessible to a far wider pool of people, with coding agents getting more and more effective.
I’ve worked in Trust & Safety long enough to know this: a platform’s culture is not what its brand deck says, it’s what its enforcement does. Every policy choice shows customers who a company is.
tgthorley.com/blog/f/trust...
Great, so now let’s also have age verification and gating (which all evidence says makes kids less safe) for a specific set of content types based on estimated location and estimated age… what could go wrong!
First community labeler taking actions on atproto using @roost.tools’s Osprey. The entire Ozone and Osprey stack is running on a $50/month OVH machine, with up to seven days of full firehose backfill for investigating patterns and exploring the network.
Love these suggestions - great blog post - Thank you. I’ve sent them to the relevant product teams to evaluate.
"The case for letting kids stay on social media. America is banning more and more kids from social media. That's bad news for kids" www.businessinsider.com/kids-parenti...
As we see images of violence and brutality in our streets, moderators can burn out fast when it affects their communities and their lives. Making sure we are looking after them and their wellbeing is a responsibility we should not take lightly.
tgthorley.com/blog/f/trust...
Bad Bunny condemns ICE during his #GRAMMYs speech for Best Música Urbana Album:
“Before I say thanks to god, I’m going to say, ICE out. We’re not savages, we’re not animals, we are humans and we are Americans.”
Meet Osprey V1.0, a new open source online safety tool designed to help platforms investigate and address their priority threats at scale, without sacrificing data privacy or performance. roost.tools/blog/introdu...
Definitions of terrorism are always debated… but they generally contain:
- Political Motivation
- Violence
- Intent to intimidate a population
- Targeting civilians
I wonder if there are any examples that come to mind…
Every moment now, every day, more & more Americans realize they can no longer trust anything the federal government says. The blatant propaganda and lies about a legal observer they shot in cold blood will bring a new swell. To those of you who were not here before: Welcome. We need you.
Do not store your Bitlocker encryption keys on Microsoft's servers if your threat model includes governments or law enforcement. As this article points out, this is the result of a design choice Microsoft made. It didn't have to be this way. www.forbes.com/sites/thomas...
As someone who at times watches execution videos professionally: don't watch execution videos if you can avoid it.
🧵“How do seemingly ordinary people become agents of state murder?” This is one of the guiding questions I ask students in my graduate class on genocide/state violence. With recent events, it is a question many Americans are asking.
I do not have a definitive answer, but here is a reading list: 1/
for sure, but this obviously goes well beyond the ICE murderer who pulled the trigger.
there is culpability all the way to the white house. and the culture of elite impunity in this country has to end now.
What is happening on the streets of the US with “Law Enforcement” agents killing people is disgusting, horrific and outrageous. I don’t have a take. I don’t have words. I don’t have advice for how to fight back. I am just grieving.
Many things one can say about this outrage, but as a terrorism researcher, I want to say to current and future students that this is one reason you should never rely on government databases to tell you who or what is a 'terrorist', domestic or otherwise.
For remote teams working in T&S, hiring is one of the most important (and expensive) things we do. Good decisions pay massive dividends, mistakes cause massive headaches (for both employer and employee!) - and I've made plenty of both!
tgthorley.com/blog/f/trust...
🚨#Job! Come and work for my Safety Engineering team at @github.com, building agentic safety workflows and advanced malware detection infrastructure www.github.careers/careers-home...
It’s literally just direct copy and paste from an AI model… these are classic Gen AI artifacts.
🇮🇷 NEW: Real-time monitoring shows Iran is experiencing a nationwide internet blackout, following digital censorship linked to escalating protests in Tehran and other cities.