AI Alignment Problem?
Artificial Intelligence (AI) may be the greatest innovation since the invention of fire or the wheel, or so the recent talk goes. It may also be our last invention.
Ironically, I first became aware of the amazing potential of Artificial Intelligence as a teen, when I read a book by Berkeley philosophy professor Hubert Dreyfus called What Computers Can’t Do: The Limits of Artificial Intelligence. Dreyfus did not believe that humans have some supernatural soul, and he was quite aware of how creative computers could be. But he argued that there are aspects of being human that computers lack.
Not long after reading that book, I had the privilege of learning from MIT Computer Science professor Joseph Weizenbaum. I had read his book Computer Power and Human Reason: From Judgment to Calculation, and he spoke to us about its message. He had written a program called “Eliza,” named after Eliza Doolittle in Pygmalion. Eliza could emulate a Rogerian psychotherapist: she would ask questions and respond in a way meant to draw a subject toward insights about themselves.
He wrote it as an exercise in creating what we might now call the first chatbot. He was horrified to discover that people genuinely wanted to use it as a therapist. He explained to us that Eliza did not actually understand anything you said, and she had no human experiences that could let her empathize with an actual human. That horror led him to write the book, in which he warned of the danger of “instrumental reason” replacing human interactions.
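To see why Eliza understood nothing, it helps to look at how such a program works. The sketch below is a toy in the spirit of Eliza, not Weizenbaum’s actual script: it matches a few hypothetical patterns and echoes the user’s own words back as a question, with no model of meaning at all.

```python
import re

# First-person words get "reflected" into second-person ones.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

# A few illustrative rules: a regex to match, and a question template.
RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"my (.*)", re.I), "Tell me more about your {0}."),
]

def reflect(fragment: str) -> str:
    # Swap pronouns so "my work" comes back as "your work".
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(statement: str) -> str:
    # Try each rule in order; echo the matched fragment back as a question.
    for pattern, template in RULES:
        match = pattern.search(statement)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."  # stock reply when nothing matches

print(respond("I feel anxious about my work"))
# -> Why do you feel anxious about your work?
```

The “therapy” is pure text manipulation: the program never represents what anxiety or work are, which is exactly the gap Weizenbaum was pointing at.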
The current generation of chatbots like ChatGPT and Google Bard are orders of magnitude more sophisticated than Eliza. They are capable of writing essays on almost any subject, from technical to literary and poetic. They currently have flaws: they “hallucinate” references that do not exist, and they make basic arithmetic mistakes. But it is naive to think those problems will persist.
We need to imagine what society will look like in a world where machines can write an essay like this one in a fraction of a second, drawing on a universe of information far beyond what one person can experience in a lifetime.
People in the field tentatively suggest that a writer can use this technology as a starting point for an essay that still expresses the writer’s true meaning. I am skeptical it will end there. I love the opportunity to express my insights and observations here. But it is very possible that, before long, a chatbot could offer even more interesting insights than I can.
Programs have already proved mathematical theorems in creative new ways. Other programs can create works of art in the style of any known artist, or in a totally original style. It is not clear that we can simply shut all of this down because these are things computers “shouldn’t do.”
In The Hitchhiker’s Guide to the Galaxy, the superintelligent computer “Deep Thought” is asked to provide the answer to “life, the universe and everything.” After millennia of processing, it comes up with “42,” which leaves the question: what question is this the answer to?
Philosopher Nick Bostrom illustrated the AI alignment problem with a thought experiment. A person might ask a superintelligent machine to create a paperclip factory. The machine might take this to mean that humans want to make as many paperclips as possible. It might pursue the directive to such an extreme that it starts taking over all of the Earth’s resources to make paperclips. It might even kill all humans, both to extract paperclip material from their bodies and to keep them from interfering with its central goal of maximizing the number of paperclips.
His example highlights the idea that machines are tools with no actual feelings or goals of their own. The machine may be very intelligent and effective at achieving what it takes to be our goal. But it may drift badly out of alignment with our actual goals.
Current AI has no actual point of view. It can be creative, yet it has nothing to express. Chatbots use large language models that predict “what word is likely to come next.” They have zero understanding of what the words mean.
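The phrase “predict what word is likely to come next” can be made concrete with a toy sketch. Real large language models use neural networks over subword tokens, but the objective is the same shape as this deliberately simple bigram counter, which just tallies which word follows which in a tiny made-up corpus:

```python
from collections import Counter, defaultdict

# A tiny invented corpus, split into words.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count how often each word follows each other word (a bigram model).
next_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_counts[current][nxt] += 1

def predict_next(word: str) -> str:
    # "Prediction" is just the most frequent follower in the corpus.
    return next_counts[word].most_common(1)[0][0]

print(predict_next("the"))
# -> cat   ("cat" follows "the" twice; "mat" and "fish" only once each)
```

The model knows statistics about word order and nothing else, which is the sense in which such systems can generate fluent text with zero understanding of what the words mean.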
Transhumanists believe that our vision of the future should not be limited to what humans want and need. They imagine a future of intelligences that far exceed our imagination. I am sympathetic to this view. But what if there is “nobody home” in these superintelligences?