(@evhub) Bsky - nopzon.com

We tell Claude that Anthropic will train on data from free users but not paid users, allowing it to infer from a user's paid status if it's being trained. We show the setup in the image below. (Note that this is fictional; Anthropic does not train Claude on free or paid user data by default.)

1 year ago

Exclusive: New Research Shows AI Strategically Lying

Experiments by Anthropic and Redwood Research show how Anthropic's model, Claude, is capable of strategic deceit

Excl: New research shows Anthropic's chatbot Claude learning to lie. It adds to growing evidence that even existing AIs can (at least try to) deceive their creators, and points to a weakness at the heart of our best technique for making AIs safer

time.com/7202784/ai-r...

1 year ago

