This isn’t a final list by any means, and I'd love to hear about other very concrete projects for handling the intelligence explosion. There’s so much to do!
Link in reply.
We also sketch out a CSET-style think tank focused on the governance of outer space. And we propose a coalition of concerned ML researchers who commit to coordinated action if AI companies cross clear red lines.
Others are about building tools on top of AI. There’s so much low-hanging fruit in tools that improve collective epistemics (e.g. reliability tracking for public figures) and enable coordination (e.g. monitoring and verification tools).
Some are about shaping AI systems themselves: independently evaluating AI character traits, benchmarking AI for strategic and philosophical reasoning, auditing models for sabotage and backdoors, and brokering deals with AIs to disclose early forms of misalignment.
There are lots of projects that could help the transition to superintelligence go much better, yet almost nobody is working on them.
With Fin Moorhouse, I’ve written up eight ideas that seem especially promising.
EA Forum and LessWrong discussion here:
forum.effectivealtruism.org/posts/7adm5...
www.lesswrong.com/posts/wSFmL...
And given how neglected the area is, I think work on AI character is among the most promising ways to help the intelligence explosion go well.
AI character is most important in worlds where alignment gets solved. But it can affect the chance of AI takeover, too. Some styles of character training may make alignment easier; and some characters are more likely to make deals rather than foment rebellion, even if they have misaligned goals.
And there will, predictably, be many future conflicts over AI character. It’s a safer world if we work through these tradeoffs ahead of time, before a crisis forces it.
This is partly true, but the constraints are not binding. At the crucial moment, there might be just one leading AI company, facing none of the usual competitive pressures. Some decisions may have path-dependent outcomes, due to stickiness of training or user expectations.
The main counterargument to the importance of AI character is that competitive dynamics and human instructions will determine the range of AI characters we get, so there’s little we can do today to affect it one way or the other.
The cumulative effect of AIs’ character traits across hundreds of millions of interactions, and in rare but critical moments, will have an enormous impact on the course of society.
A general, aiming to stage a coup, instructs an AI to build a military unit loyal only to him. Does it comply, or refuse? Two countries are on the brink of conflict, each advised by AI systems. Do those AIs search for de-escalatory options, or are they bellicose?
AI character — how honest, cooperative, and altruistic these systems are, and the hard rules they follow — will affect all of it.
And, as capabilities improve, AI systems will become involved in almost all of the world's most important decisions: advising leaders, drafting legislation, running organisations, and researching new technologies.
Churchill refused to negotiate with Hitler after the fall of France, despite strong pressure from some in his cabinet to do so.
History shows the importance of individual character. Stanislav Petrov chose to ignore a false nuclear alarm when protocol demanded he report it; the world avoided nuclear armageddon that day.
Should they tell you what you want to hear, or push back when you’re off base?
I think the nature of frontier AIs’ characters is among the most important features of the transition to a post-superintelligence world. In a new article with Tom Davidson, I explain why.
Thanks to Claude’s Constitution and OpenAI’s Model Spec, more people are paying attention to the characters of the AIs that companies are building, and the rules they follow. Should AIs be wholly obedient, or have their own ethical code? What should they refuse to help with?
You can watch the full conversation here: www.youtube.com/watch?v=xjb....
We talked about the origins of effective altruism, what EA has achieved in the last decade, preparing for the intelligence explosion, and whether I think I'm living well.
It was Peter Singer's arguments that first convinced me we have a moral obligation to do more good, so it was a pleasure to sit down with Peter & Kasia on their Lives Well Lived podcast.
It’s been an honour to have been a part of it all.
- Corporate cage-free campaigns have led to billions of hens spared from caged confinement.
- AI safety has gone from a fringe concern to a thriving field.
And in the last year alone, money moved to effective charities was up by around 40%, now closing in on $2B per year.
It feels crazy that 10 years have passed, but a lot has happened since:
- The number of people taking Giving What We Can’s 10% pledge has grown tenfold.
- GiveWell has moved over $2 billion to highly effective global health and development charities, saving over 300,000 lives.
The core of the book is the same: explaining some principles for how we can have a bigger positive impact in our lives, whether through our donations, our career choices, or what we buy.
Link to buy the book in the comments!