
Posts by Anton Leicht

Problems in the AI Eval Political Economy — Anton Leicht Evaluations of new AI models’ capabilities and risks are an important cornerstone of safety-focused AI policy. Currently, their future as part of the policy platform faces peril for four reasons: Entanglements with a broad AI safety ecosystem, structural incentives favouring less helpful evals, susc

Find the full piece here: www.antonleicht.me/writing/evals (12/12)


So what to do? From the political POV, I think:
• More separation between eval technical work and policy advocacy
• Focus on propensity over capability
• Clearer articulation of implications for deployment
• Greater focus on frameworks, fewer evals for evals' sake

(11/N)


And if current evals had implications for deployment (I don't think so), this should be made clear, and they should not be used for developer PR and frog-boiling.

As long as concerning evals merely lead to getting used to inaction, additional evals are not as politically helpful as they could be. (10/N)


4️⃣Getting used to inaction: Developers keep releasing models with 'concerning' evals. This wears down the future impact of eval orgs expressing concern.

If current evals don't warrant deployment delays, then it should be made clear that only the trajectory is concerning. (9/N)


As one exhibit, see Apollo CEO Marius Hobbhahn defending results against both ducks and rabbits simultaneously: (8/N)


3️⃣ The 'rabbit-duck problem': Exactly the same eval results simultaneously allow for catastrophization and dismissal.

Is this evidence for impending doom, or evidence that someone asked a robot to pretend to be evil? Turns out, both. (7/N)


@sebk.bsky.social puts the source of the resulting confusion very succinctly: (6/N)


2️⃣ Incentive structures favor dramatic capability findings over propensity assessments. Eval orgs, media, and developers are all incentivized to amplify 'scary' capability results. It makes for good, but confusing, headlines. (5/N)


1️⃣ Entanglements. Many eval orgs are funded by the same sources that fund policy orgs pushing for evals. This is easily exploitable by opponents.

This already started to show in the SB-1047 discussion: (4/N)


Down the road, evals need two kinds of policy support: Integration as policy instruments to operationalize restrictions, and ecosystem support. Right now, they're getting some of the latter, but headwinds are building. (3/N)


Find the full post here: (2/N)
antonleicht.me/writing/evals


New Post: The AI Eval Political Economy is in trouble.

Evals are crucial for safety-focused AI policy - but four structural problems threaten their future effectiveness as policy instruments. 🧵


This strategy is well-precedented: in the past, threatened delays have often been rolled back fairly quickly, as in numerous conflicts between the EU and Meta or Apple.

It's a little disheartening to see usually cooler heads elsewhere take the bait. (10/N)


Second, it creates precedent.

Lots of jurisdictions are considering AI regulation at the moment. Delays foster the impression that regulating AI leads to frustration and constraints on business.

That's fair - but it means OpenAI likely isn't only focused on making Sora available in Europe ASAP. (9/N)


Delaying Sora is good lobbying strategy:

First, it focuses latent frustration on uncomfortable policy, e.g. on AIA implementation through the Code of Practice and national laws.

Through delays, OpenAI is leveraging public & business pressure to weaken this implementation. (8/N)


'Sora's delay is bad because it means some models won't be available in Europe at all'

I just don't think there is good evidence for that takeaway. If major models actually skip Europe, I'll be very concerned.

But currently, statements like the one below have better alternative explanations. (7/N)


I think that's because o1 competes in a tight language-model market, and OpenAI can't afford to surrender the first-mover advantage in advanced reasoning in a big market.

As competition around new model classes heats up, that effect will carry over - and delays will become rarer. (6/N)


'Sora's delay is bad because it means more future delays'

Compounding costs from further delays are very concerning. I think they're a little less likely than reported: OpenAI has delayed Sora, but not o1, a similarly regulation-relevant model (see the scorecard). (5.5/N)


It's a delay that might be costly to some runaway early adoption use-cases, but runaway early adoption is usually not the European path to profitability anyways. (5/N)


'Sora's delay is bad because the EU economy needs Sora now'

This seems unlikely to matter at the scale that the doom-saying implies. The first couple months of OpenAI releases often see growing pains anyways, and economic integration of video generation will take some time. (4.5/N)


But assuming the regulatory landscape will stay as it is, how could the delay of Sora mean that 'Europe is so fucked' - why is it bad for the EU economy?

Three options: the Sora delay itself is costly, future delays will be costly, or future skipped releases will be costly. (4/N)


Delays follow from an asymmetry: the UK and EU are regulated, while many other markets aren't.

But that could change: Many other markets (e.g. US states - map below from June 2024) might also regulate soon. Once they reach critical mass, local delays wouldn't be realistic anymore. (3/N)


To some, this might not be bad news: the delay means that OpenAI is compelled by EU consumer market power to make Sora more compliant (≈ safer?).

But Europe falling on its sword to make US AI safer wouldn't be sound political strategy. I instead focus on the economic side: (2/N)


Why the delay?

No official info, but it's unlikely that this is (only) about the EU AI Act: Sora is also delayed in the UK, which has no AIA. More likely, it's about the roughly similar EU Digital Markets Act and UK Online Safety Act.

The Guardian has a similar interpretation: (1/N)


A ray of hope: Institutions like the AI Safety Institutes are genuinely effective and can be politically valuable down the road.

But as they get associated with unpopular policy, they're at risk, too - like with Sen. Cruz's attacks on NIST. (5/N)


The political environment ahead is treacherous:
🇺🇸 securitisation and political backlash,
🇪🇺 competitiveness concerns,
🌎 anti-bureaucracy sentiment
mean the current policy window is closing. (4/N)


Looking at the output, there are some genuine wins, some high-profile failures and some precarious achievements. (3/N)


For @techpolicypress.bsky.social, I wrote a post-mortem on the first two years of frontier AI policy. (2/N)

The End Of The Beginning For AI Policy | TechPolicy.Press Anton Leicht suggests that a more defensive approach to the first episode of frontier AI policy might have been more prudent.

New for @techpolicypress.bsky.social:

The first episode of frontier AI policy ends and makes way for a different kind of debate. I look at where we are and find precarious achievements, high-profile failures, and a difficult political environment ahead.

www.techpolicy.press/the-end-of-t...


Assessing today’s global landscape of #regulations aimed at mitigating #frontier #AI risk reveals outright failures and precarious achievements that face increasing public and political opposition, writes @antonleicht.bsky.social.
