
Posts by Hillary Sanders

The Development Basics of Managed Inference and Agents | Heroku Join Heroku superfan Jon Dodson and Hillary Sanders from the Heroku AI Team for the latest entry in our “Deeply Technical” series. In this episode, the pair discuss Heroku Managed Inference and Agents...

I went on the Code[ish] podcast to talk about AI, LLMs, and building Heroku's Managed Inference & Agents platform:
🎧 www.heroku.com/podcasts/cod...

9 months ago
Building Scalable AI Tool Servers with Model Context Protocol (MCP) and Heroku (Sponsor: Heroku)

Here is a recording of my live demo at PyCon US 2025 on building scalable AI tool servers using the Model Context Protocol (MCP) and Heroku.

www.youtube.com/watch?v=01I4...

10 months ago
Elon Musk endorses far-right German political party, wading deeper into global politics | CNN Business Musk, the billionaire Trump ally who is playing a public role in the incoming administration, posted in support Friday of Alternative for Germany, or AfD, after the German government collapsed this we...

I was surprised at how clear-cut and blatant it was. I mean, two times in a row, closed fingers, correct angle.

Meanwhile, Musk has recently issued public support for the far-right AfD party, often described as antisemitic/extremist.

www.cnn.com/2024/12/20/m...

That + no apology...

1 year ago

In honor of MLK day, here's a super interesting essay my partner wrote on Martin Luther King Jr.: what he actually believed and accomplished (different from what is sometimes described).

docs.google.com/document/d/1...

Incredibly impressive person.

1 year ago

Nice! Would love to be added (11 yrs in AI, co-author of Malware Data Science, love them NNs)

1 year ago
Post image

Am I reading this right? Techniques to make the model safe again had almost no effect on non-small models :o.

1 year ago

Sleeper Agents
arxiv.org/pdf/2401.05566

So many AI safety issues get worse, and harder to combat, the larger and more advanced your model gets:

"The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning"

1 year ago

A response to X is usually written by someone socially and politically close to X's author, unlike some other random piece of content Y.

It's extremely hard to take sycophancy out of an LLM trained the way we train them.

1 year ago

Anthropic's "Towards Understanding Sycophancy in Language Models" arxiv.org/pdf/2310.13548

TLDR: LLMs tend to generate sycophantic responses.
Human feedback & preference models encourage this behavior.

I also think this is just the nature of training on internet writing.... We write in social clusters:

1 year ago

Say a model learns strategy x to minimize training loss --> later, minimizing test loss calls for strategy y, but the model sticks with strategy x anyway (inner misalignment).

If the loss function is misspecified (outer misalignment), x can sometimes be safer than y.

That being said, the better the model, the less this will happen.
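As a toy illustration (my own made-up example, not from any paper): a proxy strategy x that matches the intended objective y on the training distribution, but diverges at test time while the model keeps using x:

```python
# Toy inner-misalignment sketch (hypothetical example): the "model"
# learned proxy strategy x in training; strategy y is the intended
# objective it should switch to at test time, but never does.

def strategy_x(item):
    # proxy learned during training: prefer shiny items
    return item["shiny"]

def strategy_y(item):
    # intended objective: prefer items with positive value
    return item["value"] > 0

# In training, shininess and value happen to be correlated...
train = [{"shiny": True, "value": 1}, {"shiny": False, "value": -1}]
# ...but at test time the correlation flips.
test = [{"shiny": True, "value": -1}, {"shiny": False, "value": 1}]

# Strategy x minimizes training loss because it agrees with y here:
assert all(strategy_x(i) == strategy_y(i) for i in train)

# At test time x and y disagree on every item, yet the model still
# applies x -- its behavior optimizes the proxy, not the objective:
assert all(strategy_x(i) != strategy_y(i) for i in test)
```

The point of the toy: nothing in training distinguishes x from y, so the "safer" proxy x can persist even when the (possibly misspecified) objective would push elsewhere.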

1 year ago

In AI safety, we have inner misalignment (actions don't minimize the loss function) and outer misalignment (loss function is misspecified).

But I do think that inner misalignment (~learned features) tends to act as a protective mechanism against the implications of outer misalignment.

I, er, really hope.

1 year ago