
Posts by Benjamin Hilton

The Alignment Project by AISI — The AI Security Institute The Alignment Project funds groundbreaking AI alignment research to address one of AI’s most urgent challenges: ensuring advanced systems act predictably, safely, and for society’s benefit.

I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists!

Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.

alignmentproject.aisi.gov.uk

8 months ago 9 2 1 3

As always, we'd be very excited to collaborate on further research. If you're interested in collaborating with UK AISI, you can express interest at forms.office.com/e/BFbeUeWYQ9. If you're a non-profit or academic, you can also apply for grants up to £200,000 directly at aisi.gov.uk/grants.

11 months ago 0 0 0 0
Dodging systematic human errors in scalable oversight — AI Alignment Forum How one might strengthen a debate protocol to mitigate failures arising from systematic human errors.

Link to the post: www.alignmentforum.org/posts/EgRJtw...

11 months ago 0 0 1 0

Humans are often very wrong.

This is a big problem if you want to use human judgment to oversee super-smart AI systems.

In our new post, @girving.bsky.social argues that we might be able to deal with this issue – not by fixing the humans, but by redesigning oversight protocols.

11 months ago 2 0 1 0

Huge thanks to Marie Buhl, Jacob Pfau and @girving.bsky.social for all their work on this. Excited to get stuck into future work!

11 months ago 2 0 0 0
An alignment safety case sketch based on debate — AI Alignment Forum This post presents a mildly edited form of a new paper by UK AISI's alignment team (the abstract, introduction and related work section are replaced…

Link to the alignment forum post:
www.alignmentforum.org/posts/iELyAq...

11 months ago 2 0 1 0
An alignment safety case sketch based on debate If AI systems match or exceed human capabilities on a wide range of tasks, it may become difficult for humans to efficiently judge their actions -- making it hard to use human feedback to steer them t...

Link to the paper: arxiv.org/abs/2505.03989

11 months ago 3 0 1 0

There are still loads of open problems.

We need to get each part of the above right – exploration guarantees and human input particularly stand out to me (I'm optimistic about obfuscated arguments; stand by for future publications...)

11 months ago 2 0 1 0

Two things that stand out for me from this paper:
– Debate gets you correctness/honesty. That's not sufficient for harmlessness, but is a great first step.
– Low-stakes alignment (where single errors are tolerable, but errors on average are not) seems (imo) totally doable

11 months ago 2 0 1 0

Want to build an aligned ASI? Our new paper explains how to do that, using debate.

Tl;dr:

Debate + exploration guarantees + no obfuscated arguments + good human input = outer alignment

Outer alignment + online training = inner alignment*

* sufficient for low-stakes contexts
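The composition above can be sketched as a toy simulation (all names here are hypothetical illustrations, not the paper's implementation): two debaters argue for rival answers and a fallible judge – modelling imperfect human input – picks the more convincing one.

```python
# Toy sketch of one round of AI-safety-via-debate (hypothetical names;
# not the paper's protocol). Two debaters alternate arguments for rival
# answers, then a noisy judge -- standing in for imperfect human input --
# picks a winner.
import random

def debate_round(question, answers, argue, judge, num_turns=4):
    """Run one debate: debaters alternate arguments, judge picks a winner."""
    transcript = []
    for turn in range(num_turns):
        debater = turn % 2  # debaters 0 and 1 alternate turns
        transcript.append(argue(debater, question, answers[debater], transcript))
    return judge(question, answers, transcript)  # index of the winning answer

# Toy instantiation: arguments are bare claims, and the judge is right
# 90% of the time -- a crude model of systematic human error.
def toy_argue(debater, question, answer, transcript):
    return f"Debater {debater}: the answer is {answer!r}"

def make_noisy_judge(true_answer, accuracy=0.9):
    def judge(question, answers, transcript):
        correct = answers.index(true_answer)
        return correct if random.random() < accuracy else 1 - correct
    return judge

random.seed(0)
judge = make_noisy_judge(true_answer="4")
winner = debate_round("What is 2 + 2?", ["4", "5"], toy_argue, judge)
print(winner)
```

In this framing, exploration guarantees would correspond to the debaters actually surfacing the strongest counterarguments, and the judge's `accuracy` is where good human input enters the argument.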

11 months ago 4 0 1 1

On top of the AISI-wide research agenda yesterday, we have more on the research agenda for the AISI Alignment Team specifically. See Benjamin's thread and full post for details; here I'll focus on why we should not give up on directly solving alignment, even though it is hard. 🧵

11 months ago 4 2 1 0

This is just the start. We'll be following this up shortly with:
– A safety case sketch for debate, giving a whole host more detail on the open problems.
– A series of posts (something like 1 a week) diving into various problems we'd like to see solved.

5/5

11 months ago 5 0 0 0

We've included a long list of open problems we'd like people to solve – and a reminder that you can express interest in collaborating, and apply to our challenge fund for grant funding!

bsky.app/profile/benj...

4/5

11 months ago 3 0 1 0

The post sets out:

Why we're excited about safety cases
Why we focus (initially) on honesty
What we mean when we talk about 'asymptotic guarantees'

3/5

11 months ago 2 0 1 0
UK AISI’s Alignment Team: Research Agenda — AI Alignment Forum The UK’s AI Security Institute published its research agenda yesterday. This post gives more details about how the Alignment Team is thinking about o…

Link to the detailed alignment team agenda: alignmentforum.org/posts/tbnw7L...

Link to AISI's research agenda: aisi.gov.uk/research-age...

2/5

11 months ago 2 0 1 0

The Alignment Team at UK AISI now has a research agenda.

Our goal: solve the alignment problem.
How: develop concrete, parallelisable open problems.

Our initial focus is on asymptotic honesty guarantees (more details in the post).

1/5

11 months ago 7 0 1 1
Grants | The AI Security Institute (AISI) View AISI grants. The AI Security Institute is a directorate of the Department of Science, Innovation, and Technology that facilitates rigorous research to enable advanced AI governance.

You can also apply directly for funding via the AISI Challenge Fund:
www.aisi.gov.uk/grants

1 year ago 4 0 0 0

Identifying people to work with is the biggest bottleneck for the UK AISI alignment team right now. Help out by filling in or sharing the form below:
forms.office.com/e/BFbeUeWYQ9

1 year ago 4 0 1 0

We’re particularly excited to hear from:
– ML researchers
– Complexity theorists
– Game theorists
– Cognitive scientists
– People who could build datasets
– People who could run human studies
– Anyone else who thinks they might be doing, or could be doing, relevant work

1 year ago 3 0 1 0

We’re trying to massively scale up the total global effort going into security-relevant alignment research, to prevent superhuman AI from posing critical risk.

We do this by:
1. Identifying key alignment subproblems
2. Identifying people who can solve them
3. Funding research

1 year ago 5 0 1 0
QR code link to the form

Interested in getting UK AISI support to do alignment research?

Fill in our short (under 5 minutes) form, and we'll get back to you on proposals within 1 week.

(Caveat: While we may reach out about AISI funding your project, filling out this form is not an application for funding.)

1 year ago 4 1 1 2