It has been incredible partnering with character.ai since the very start of zentropi.ai. We're excited to share some details of that partnership with this case study. Anyone creating AI-powered systems might find it interesting! blog.zentropi.ai/how-zentropi...
Posts by Samidh
In 2017, it took us months to define “political ad” at Facebook. Recently, I built two political content classifiers in an afternoon using Zentropi AI, created by @dwillner.bsky.social and @samidh.bsky.social
Why AI content moderation is good, actually — in this week’s newsletter 👇
This means anyone building an agent can give it a principled, consistent way to evaluate content instead of hoping the LLM gets it right. Small step, but it makes agents more trustworthy by default. Give the skill a try here and let us know how it goes: github.com/zentropi-ai/skills/
We just shipped something that helps with this. Zentropi is now an agent skill — your agent can classify content against plain-English policies in real time. It can even draft the policies itself if you describe what you're looking for. It is like having @dwillner.bsky.social on call 24/7.
The honest answer is that most agents just wing it. The LLM guesses, and the guess is often different every time. Not to mention that raw LLM calls are expensive and slow, making them impractical for most at-scale systems.
One of the things we've been thinking about a lot at Zentropi is: what happens when AI agents need to make judgment calls about content — not humans reviewing a queue, but agents acting autonomously?
We're publishing our streaming classifier openly — the methodology, weights, and a full tutorial. This is a problem the whole T&S community needs to solve, so we're eager to see others build upon our technique. Full details on our blog:
blog.zentropi.ai/enabling-streaming-classification/
Today, we are releasing a classifier that can score content as it streams, token by token. It can flag that a violation is developing partway through a sequence — early enough to actually do something about it. Interrupt generation. Route to review. Log a warning. Things you can't do with post-hoc classification.
The idea is simple: if you're already running content through a classifier, the model is already building internal representations at every token. Those representations already encode whether a violation is developing — you just have to ask.
We trained a tiny linear probe to do the asking.
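The probe idea can be sketched in a few lines. To be clear, this is a hedged illustration, not Zentropi's actual implementation: the hidden-state width, probe weights, and threshold below are made-up stand-ins, and the hidden states are simulated rather than pulled from a real classifier's forward pass.

```python
import numpy as np

HIDDEN = 64  # stand-in hidden-state width; real models are much larger

# A linear probe is just a weight vector and bias trained on hidden states
# from labeled prefixes. Toy weights here so the example is deterministic.
w = np.ones(HIDDEN)
b = 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def score_stream(hidden_states, threshold=0.9):
    """Score each token's hidden state as it arrives; return the index of
    the first token where the probe crosses the threshold, else None."""
    for i, h in enumerate(hidden_states):
        p = sigmoid(w @ h + b)
        if p >= threshold:
            return i  # early enough to interrupt, route to review, or log
    return None

# Simulated stream: five benign-looking states, then one that lights up the probe.
benign = np.zeros((5, HIDDEN))     # sigmoid(0) = 0.5, below threshold
violating = np.ones((1, HIDDEN))   # large dot product, probability near 1
flag_at = score_stream(np.concatenate([benign, violating]))  # flags at index 5
```

In a real system the hidden states would come from the classifier's own forward pass at each token, and the probe would be trained on labeled data rather than hand-set; the point is just that per-token scoring costs one dot product on top of work the model is already doing.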
There's a major gap in content safety tooling: classifiers typically only score complete text. When you're working with generative AI, "complete text" means the user already saw it. That's too late.
So we built a streaming classifier that we're releasing today! Here's what we did and why.
🧵...
... image classifiers also!
If you're not using either tool yet, now's a good time to try both! Zentropi's Community Edition is free and gives you unlimited labelers. Coop is fully open source and runs on your infrastructure.
:D
Thank you for your leadership and for being great stewards of Cove/Coop.
@dwillner.bsky.social and I have spent years watching T&S teams rebuild the same infrastructure from scratch. This is what it looks like when open tools actually work together instead. Really proud of this one and appreciative of @roost.tools's leadership!
Details: blog.zentropi.ai/zentropi-now...
Zentropi is now integrated into Coop, @roost.tools's open source moderation platform. You can write a content policy in plain English on Zentropi, plug it into Coop as a signal, and have a moderation pipeline running in minutes.
Dave Willner, who led trust and safety at major tech firms and has cofounded a company that is developing an AI content classification platform, says LLM-driven technology can now accomplish classification at the scale necessary for moderation on large platforms. That has substantial implications.
I can has cats.
Just shipped Zentropi's most requested feature: image classification!
Now analyze images against your own policies, at scale.
To power it we built cope-b-12b, a new multimodal model w/ native vision.
Check out the cat detector we made in < 1 min. 🐱
blog.zentropi.ai/zentropi-now-labels-images/
On the other hand, it's an interesting contribution to do all of this with a single transformer and candidate isolation.
If you are looking for a technical description of how X rots your brain, look no further than their github post on the 'X algorithm'. It is pure, unadulterated behavioral engagement maximization that amplifies the very worst human impulses. github.com/xai-org/x-al...
Would love to hear more! What kind of community guidelines were you feeding to CoPE? What worked well and where were there gaps?
Why are we just giving away all our secrets? Well, it is our hope that it helps the ecosystem further advance the state of the art in policy-steerable content classification, which is foundational to a more trustworthy internet.
Dave just published a Zentropi labeler that can precisely identify requests aimed at prompting an AI model to undress a person in a photo. The tools exist to easily deal with this problem -- platforms just need to choose to use them. If you are the developer of an AI system, please use this guardrail!
"We'll make it right for you"
This was such a cool experiment that I created a Zentropi labeler with a simplified version of the authors' Partisan Animosity criteria. Now anyone can experiment directly with using this labeler to try to reduce the temperature of affective polarization in their feeds. zentropi.ai/labelers/b30...
We just wrote an in-depth post about Toxic Content labeling. It presents a new way of defining toxic speech online -- and illustrates the importance of observable features for accurate interpretation by language models. Would love to hear how YOU define toxicity, too! blog.zentropi.ai/observations...
Awesome to see how this is already being used! One of the most useful aspects is that the published policies show what it takes to write content rules that can be accurately interpreted by language models. We hope this can be a boost to the broader content policy community.
For clarity, the whole point of this launch is to enable people to easily customize their own policies so that we can support a plurality of content classification perspectives online! It is actually a solution to the problem Evelyn highlights in that piece.
This was a fun launch! It turns Zentropi into a GitHub for content labelers. You can share content policies with others and build off each other's work. It's the easiest way of deploying a fully customizable classifier. Check out the policies @dwillner.bsky.social created at zentropi.ai/u/dave
Content policies are usually private, one-off efforts. You build yours, I build mine, we don't share much about what works or why. This makes sense given products can (and should) set different policies based on their communities, but it leaves us reinventing the wheel. 🧵 1/5