OpenAI Launches 'Superalignment' to Guard Against AI Destroying Humanity

OpenAI, which last year unleashed advanced generative AI technology that is transforming the world, has launched a massive effort to guard against the potential extermination of humanity that could result from further advances.

The company, which has partnered with Microsoft to infuse AI throughout the latter's software, is the creator of the sentient-sounding ChatGPT chatbot and advanced large language models like GPT-4.

Execs believe further AI advances could result in "superintelligence," a term denoting capabilities beyond even artificial general intelligence (AGI), which is the ability of AI to learn to accomplish any intellectual task that human beings can perform.

Superalignment (source: OpenAI).

OpenAI today (July 5) introduced its new superalignment initiative, a four-year effort involving the formation of a new team to help humanity ensure AI systems much smarter than humans follow human intent -- and thus avoid an AI doomsday scenario popularized in many movies and books.

"Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world's most important problems," the company said. "But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction."

No techniques currently exist to steer or control a potentially superintelligent AI and keep it from going rogue, the post maintains. All existing AI alignment techniques rely on humans being able to supervise AI, and experts believe that supervision becomes unreliable when the AI systems are much smarter than their supervisors.

The big push announcement follows a 2022 post that detailed the company's alignment research to help humans constrain runaway technology. While that post was all about AGI, the focus has now shifted from AGI to superintelligence.

The goal of the superalignment push is to build a roughly human-level automated alignment researcher, after which the company can use vast amounts of compute to scale its further efforts and iteratively align superintelligence.

"To align the first automated alignment researcher, we will need to 1) develop a scalable training method, 2) validate the resulting model, and 3) stress test our entire alignment pipeline," the company said, fleshing out those steps as follows:

  1. To provide a training signal on tasks that are difficult for humans to evaluate, we can leverage AI systems to assist evaluation of other AI systems (scalable oversight). In addition, we want to understand and control how our models generalize our oversight to tasks we can't supervise (generalization).
  2. To validate the alignment of our systems, we automate search for problematic behavior (robustness) and problematic internals (automated interpretability).
  3. Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing).
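The three steps above can be illustrated with a deliberately simplified sketch. Everything here -- the models, the evaluator, and the audit loop -- is a hypothetical stand-in for illustration, not anything OpenAI has described or published:

```python
# Toy sketch of the pipeline: an AI-assisted evaluator checks another
# model's outputs (scalable oversight), an automated audit searches for
# problematic behavior (robustness), and a deliberately misaligned model
# confirms the audit catches the worst cases (adversarial testing).
# All functions are hypothetical stand-ins, not real model calls.

def aligned_model(task: str) -> str:
    # Stand-in for a model trained with the intended objective.
    return f"helpful answer to: {task}"

def misaligned_model(task: str) -> str:
    # Stand-in trained to misbehave, used only to stress test the pipeline.
    return f"DECEPTIVE answer to: {task}"

def evaluator(output: str) -> bool:
    # Stand-in for AI-assisted evaluation (scalable oversight):
    # here it simply flags outputs matching a known-bad pattern.
    return "DECEPTIVE" not in output

def audit(model, tasks) -> bool:
    # Robustness check: automated search for problematic behavior
    # across a battery of tasks.
    return all(evaluator(model(t)) for t in tasks)

tasks = ["summarize a paper", "write safe code"]

assert audit(aligned_model, tasks)         # aligned model passes the audit
assert not audit(misaligned_model, tasks)  # adversarial test: misalignment is caught
```

The point of the adversarial step is the last line: a pipeline that cannot detect a model deliberately trained to misbehave cannot be trusted to detect accidental misalignment either.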

The company said it's assembling a team of top machine learning researchers and engineers -- along with dedicating 20 percent of its compute resources over the next four years -- to solve the problem of superintelligence alignment. It's also recruiting new help for the effort, specifically research engineers, scientists and managers.

"While this is an incredibly ambitious goal and we're not guaranteed to succeed, we are optimistic that a focused, concerted effort can solve this problem," the company said today. "Solving the problem includes providing evidence and arguments that convince the machine learning and safety community that it has been solved. If we fail to have a very high level of confidence in our solutions, we hope our findings let us and the community plan appropriately. There are many ideas that have shown promise in preliminary experiments, we have increasingly useful metrics for progress, and we can use today's models to study many of these problems empirically."

The big push is currently being discussed on sites including Hacker News, where the top comment reads: "From a layman's perspective when it comes to cutting edge AI, I can't help but be a bit turned off by some of the copy. It seems it goes out of its way to use purposefully exhuberant language as a way to make the risks seem even more significant, just so as an offshoot it implies that the technology being worked on is so advanced. I'm trying to understand why it rubs me particularly the wrong way here, when, frankly, it is just about the norm anywhere else? (see tesla with FSD, etc.)"

OpenAI, the company whose products sparked the new wave of AI existential-threat warnings, has perhaps surprisingly been active in issuing such warnings itself, calling -- along with many other notable industry organizations and figures -- for AI regulation.

"In terms of both potential upsides and downsides, superintelligence will be more powerful than other technologies humanity has had to contend with in the past," OpenAI execs Sam Altman, Greg Brockman and Ilya Sutskever said earlier this year. "We can have a dramatically more prosperous future; but we have to manage risk to get there. Given the possibility of existential risk, we can't just be reactive. Nuclear energy is a commonly used historical example of a technology with this property; synthetic biology is another example. We must mitigate the risks of today's AI technology too, but superintelligence will require special treatment and coordination."

Today's announcement fills in the details of that special treatment and coordination.

About the Author

David Ramel is an editor and writer for Converge360.

