Dangerously fast in the wrong direction
Creating superintelligent AI will be humanity's greatest achievement. But will it be the best or the worst thing we have ever created?
AI-generated illustration from Midjourney
Key points
Those who create artificial intelligence (AI) dream of building systems that are far smarter than we are. And according to Berkeley professor Stuart Russell, we are sitting on a high-speed train heading in a dangerous direction. Russell is one of the world's foremost AI researchers and the author of the field's leading textbook. In Oslo this week, he challenged Prime Minister Støre to take the threats from AI seriously and to rethink how AI must be regulated if we are to have any hope of a bright future.
Russell became known to the general public when he signed a statement in 2023 in which he and a number of other leading scientists argued that "mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."
Russell's logic is simple. The smartest groups tend to dominate. Just as we have outmaneuvered and subjugated other species, a superintelligent AI could outmaneuver and subjugate humans.
Some, like Facebook's AI chief Yann LeCun, believe AI systems cannot possibly dominate humans. Dominance, on this view, stems from an evolutionary instinct imprinted in humans, and AI systems have no such background.
But the reason wild nature is cut down and animals are treated as industrial products is not ill will; it is that we ignore the interests of others when they conflict with our own. Likewise, things can go wrong if superintelligent AI systems have goals that conflict with ours, even if they have no lust for power.
It is not hard to see how this could go wrong. The comparatively stupid AI systems that control the flow of content on social media have already caused a great deal of damage, including addiction, extremism, and health problems among young girls.
Or suppose someone tries to train an AI to reduce suffering as much as possible. AI systems are trained to achieve their goals as efficiently as possible, and the most efficient way to reduce suffering is to wipe out humans and all other living beings. Without sentient beings, there is no suffering. In this way we could end up dominated by AI as the result of a goal conflict, not of ill will.
Nor can we simply switch off superintelligent systems once they turn out to act against our interests. Staying operational is necessary for achieving the AI's goal, and will therefore be a sub-goal the AI defends tooth and nail.
That our interests may collide with an AI system's objectives is known as the "alignment problem." A Norwegian translation could be the harmony problem. Two individuals are in harmony with each other insofar as their goals are compatible. If we can train AI systems to adopt our values, we will have created harmonious AI. Such systems would not be a threat, even if they were smarter than us.
Unfortunately, we are far from understanding how to build harmonious AI. Russell believes the current AI paradigm never will. There are three reasons for this:
- Specifying with any degree of certainty what humanity's goal is presupposes a solution to one of philosophy's hardest problems: What is good?
- Even if we could clarify which values the AI should be guided by, it is difficult to move from theory to practice. Russell calls this the King Midas problem, after the king who wished that everything he touched would turn to gold, but who did not foresee that this would lead to a gilded but lonely existence.
- Even if the AI behaves exactly as we want during training and testing, we can be in for big surprises, because it may have acquired different goals during training than we thought.
Russell proposes an alternative AI paradigm. Rather than training AIs to be maximally accurate and cocksure, we should train them to be benevolent and humble helpers that are uncertain and cautious. An AI that knows it is fallible will prefer to double-check. An AI that is cautious will do less damage if it does misstep.
For such a paradigm to emerge, we need both technological advances and regulatory innovation. Technology-neutral legislation of the kind Minister for Digitization Karianne Tung habitually favors will not do. If AI systems that cause irreparable harm are released onto the market, it is not enough to say that we have laws against harming others.
Within nuclear power and medicine, we have technology-specific laws that demand a higher standard of safety. We do not assume that products are safe merely because they have not been proven unsafe; instead, developers must prove that the products they create are safe. Russell believes we need similar legislation for AI.
Despite its appeal, there is much to object to in Russell's specific proposals. How is the AI supposed to handle people's internal goal conflicts? Should an AI try to stop a person who on Monday said they would not drink but is about to order their third beer on Friday? If a person's desires are inconsistent, it is not as simple as saying the AI should heed whatever the person ideally wants.
Nor does Russell escape the harmony problem, only this time between people. If AIs are trained always to please the human who owns them, an AI will be as willing a helper to terrorist organizations as to kindergartens. We cannot allow that.
One solution to this problem is to require AIs to internalize the interests of all people. But an AI that is as service-minded toward its owner's neighbour as toward its owner is unlikely to be popular on the market. Moreover, if the AI is to satisfy all affected parties, it is hard to see how it can remain a benevolent and humble helper. Rather, it becomes a utilitarian overseer.
A better model is to require that AI systems promoting their owner's interests do so in a way that is consistent with both laws and regulations and public morality. We will never achieve full harmony. But perhaps today's social order shows that what we need are rules and norms that secure a degree of harmony, not a perfect balancing of people's various interests. Whether that is enough for superintelligent actors, however, I am not so sure.