The first meeting on artificial intelligence did not go all that well. In summer 1956, on the back of early successes in computer design, leading lights of the new science of information theory met at Dartmouth College in New Hampshire to thrash out a plan for emulating the thought processes of the human mind. The organisers thought it would just take a summer school to formulate a coherent programme of research to ultimately create a human-level intelligence.
According to John McCarthy, then a mathematics professor at the college and an organiser of the event, people drifted in and out of the summer meetings and the group could not agree on how to proceed. Some backed the idea of emulating neurons in the brain; others thought intelligence could be achieved using symbolic maths. Despite the splits, there was plenty of confidence.
By the 1970s, the early enthusiasm was dissipating. A survey of 70 computer scientists working in AI-related fields found just over half believed it could take 50 years to get to human-like intelligence. Another third thought it would take a lot longer. And as we reach that deadline it certainly looks that way. We can make robots that look close to humans, but machines that approach the thinking skills of David in the Steven Spielberg movie ‘A.I. Artificial Intelligence’ (2001) remain science-fiction.
Early in 2019, AI looked for a moment to be close enough to passing the Turing test to make even those predictions look unnecessarily pessimistic. IBM’s Project Debater managed a draw in a head-to-head with 2012 European debate champion Harish Natarajan.
“It was able to state more empirical evidence than I, given I only had 15 minutes to prepare, would ever be able to do. I think we worked out how to defeat that advantage. I could accept the truth of the arguments it presented but give a more nuanced response. That was a level of subtlety that the machine was not able to respond to,” Natarajan explained in an interview after the event with The Hindu.
Similarly, when OpenAI unveiled its enormous GPT-3 deep neural network (DNN) in May 2020 – a machine based on similar techniques to those used by IBM’s debater – users given early access were struck by the apparent coherence of the text it spat out in response to questions and requests. The company’s engineers pointed out in the paper that accompanied the launch that the machine seemed able to handle simple arithmetic. The company went as far as touting the software as part of a collection of “pre-AGI technologies”, where AGI is ‘artificial general intelligence’ (see ‘Definitions’ box below).
AGI: AGI stands for artificial general intelligence, and is the hypothetical ability of an intelligent agent to understand or learn any intellectual task that a human being can. It is a primary goal of some artificial intelligence research and a common topic in science-fiction and future studies. David from the 2001 film ‘A.I.’ is a good example of this.
The Turing Test: The Turing test, originally called the imitation game by Alan Turing in 1950, is a method of inquiry in artificial intelligence (AI) for determining whether a computer can think like a human being.
Turing proposed that a computer can be said to possess AI if it can mimic human responses under specific conditions. The original Turing Test requires three terminals, each of which is physically separated from the other two. One terminal is operated by a computer, while the other two are operated by humans.
During the test, one human functions as the questioner, while the second human and the computer function as respondents. The questioner interrogates the respondents within a specific subject area, using a specified format and context. After a pre-set length of time, or number of questions, the questioner is asked to decide which respondent was human and which was a computer.
The test is repeated many times. If the questioner makes the correct determination in half of the test runs or fewer, the computer is considered to have artificial intelligence, because the questioner regards it as “just as human” as the human respondent.
Though a model like GPT-3 puts up stiff resistance in a Turing test, one easy way to trip it up is not to ask why it won’t help a turtle struggling on its back but to home in on its maths skills. Earlier this year, Dan Hendrycks and colleagues at the University of California at Berkeley described how even training the models on problems from school mathematics competitions only gets them so far. On comparatively easy problems, the best results from GPT-3 and similar DNNs got fewer than two out of ten answers right and scored less than 6 per cent on the full set of tests. Humans do not necessarily do well either. But a computer science graduate with comparatively little interest in mathematics scored 40 per cent.
Another simple trick is to ‘misprime’ the DNN, which involves feeding it some irrelevant information as part of the question or problem statement that it then either regurgitates as part of the answer or connects to some other information that has no relevance to the core request.
Both mispriming and arithmetic provide clues as to why language models can do so well and so badly at the same time. Their facility with language rests on the discovery more than a decade ago that it was possible to rely almost entirely on statistics to handle tasks like translation. One part of that statistical processing is a technique called word embedding, which encodes words as numeric vectors, each one pointing to a position somewhere in a space that might have thousands of dimensions. Despite the potential immensity of such a space, words with similar meanings tend to lie close to each other.
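The idea that words with similar meanings lie close together can be made concrete with cosine similarity, the standard way of measuring how nearly two embedding vectors point in the same direction. The tiny three-dimensional vectors below are hand-picked for illustration, not learned from a real corpus, and real embeddings run to hundreds or thousands of dimensions.

```python
# Toy sketch of word embeddings: each word maps to a vector, and words
# with related meanings end up close together. Vectors here are invented
# for illustration, not learned from text.
import numpy as np

embeddings = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.75, 0.15]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; 0.0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(sim_royal > sim_fruit)  # the related pair scores higher
```

The same comparison, run over every word in a vocabulary, is what lets a model find a word’s nearest neighbours in the embedding space.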
The connection between meaning and position made it relatively easy to convert from one written language to another: similar words cluster in the same regions whether they are written in English, French, or Mandarin. Language models use neural-network structures called Transformers to learn the various ways words and phrases in the vector space can connect to each other, ingesting text from gigabytes of electronic documents. This leads to impressive-seeming debating and essay-writing skills but limited arithmetic ability: what arithmetic the models can do mostly stems from a phrase like ‘2+2=4’ appearing many times in the books and online texts used to train them. And because the machine learns probabilities rather than rules, there is always a chance it will answer ‘5’; the wrong answer simply does not turn up often enough in the training data to flip the probability.
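The mechanism that lets a Transformer relate words and phrases across a passage is scaled dot-product attention. A minimal sketch, with tiny random matrices standing in for learned token representations: each token’s query vector is scored against every token’s key vector, and the resulting weights decide how much of each value vector flows into the output.

```python
# Minimal sketch of scaled dot-product attention, the core operation of
# a Transformer. Shapes are deliberately tiny; real models use learned
# projections and many attention heads.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = softmax(scores)        # one probability row per query
    return weights @ V               # weighted mix of the value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, 8-dimensional vectors
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)
print(out.shape)  # one output vector per token
```

Because every token attends to every other in one step, distance in the sentence is no obstacle, which is why these models handle long-range dependencies so much better than their predecessors.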
In his experience with Project Debater, Natarajan pointed to an apparent lack of common sense in its approach: the computer was unable to sort arguments in a way that made the most sense to the audience. Various groups are working on the assumption that a common cause for the mistakes made by language models lies in the fact they do not connect words and phrases to concepts but simply take advantage of patterns in text. That has led teams around the world to try to get Transformer-based models to learn some common sense, bringing in help from what has been the forgotten half of AI research for the past decade: the field of symbolic AI and hand-built models.
In the mid-1980s, Doug Lenat, principal scientist at the US research consortium Microelectronics and Computer Technology Corporation, created the Cyc project not just as a way of supporting AGI but to fend off what was then seen by US companies as a major threat from Japan’s fifth-generation computing programme, which fizzled out by the end of that decade.
By encoding the information that rain contains water, water is wet and crops are grown outside, Cyc’s symbolic-AI database could infer that a field of wheat would be wet after a shower without having to learn the concept explicitly, though it might have to search an entire database of connections to get there. Users found Cyc’s syntax difficult to work with. In 1999, MIT researchers decided a more practical approach would be to crowdsource a common-sense database using sentences that are recognisably English: ‘cat IsA mammal.’ Having amassed more than a million facts in its semantic database, ConceptNet has outlasted the original project and lives on at the open-source repository GitHub. It is beginning to make its way back into mainstream AI research along with a collection of derivatives that home in on different tasks, such as awareness of time or of emotional responses. In turn, their English-like syntax is being used to teach Transformers.
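The kind of chained inference described above can be sketched as a search over simple triples. The facts and relation names below are invented for illustration and do not reflect either Cyc’s or ConceptNet’s actual schema; the point is that a conclusion like ‘rain makes things wet’ falls out of following links rather than being stored explicitly.

```python
# Toy common-sense knowledge base: English-like (subject, relation,
# object) triples plus a breadth-first search over the links. Facts and
# relation names are illustrative, not taken from Cyc or ConceptNet.
facts = [
    ("rain", "Contains", "water"),
    ("water", "HasProperty", "wet"),
    ("wheat", "IsGrownIn", "field"),
    ("field", "IsLocated", "outside"),
]

def connected(start, goal):
    """Is there a chain of facts leading from start to goal?"""
    frontier, seen = [start], {start}
    while frontier:
        node = frontier.pop(0)
        if node == goal:
            return True
        for subj, _rel, obj in facts:
            if subj == node and obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return False

print(connected("rain", "wet"))   # True: rain Contains water, water HasProperty wet
print(connected("wheat", "wet"))  # False: no chain in this tiny base
```

The sketch also shows the cost Lenat’s team faced: with millions of facts rather than four, an unguided search like this fans out across the whole database.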
In one experiment, a team led by Yejin Choi, senior researcher at the Allen Institute for AI in Seattle, found a model based on Transformers taught on both text and a knowledge base could generate new ConceptNet-like connections. The original text-only model would come up with ideas like ‘dove SymbolOf submarine’, apparently a consequence of confusing two meanings of ‘dove’. The augmented version came up with the more familiar ‘dove SymbolOf purity’.
Unfortunately, the reasoning only goes so far even in these hybrids. Chadi Helwe and colleagues at the Institut Polytechnique de Paris analysed the performance of a number of these hybrids and found the reasoning ability of these augmented DNNs remains shallow and still prone to obvious mistakes.
Whatever AGI needs, it is unlikely to get there based on Transformers, even with augmentations. Yann LeCun, Facebook’s chief AI scientist, concluded in a post on Facebook in 2020: “Trying to build intelligent machines by scaling up language models is like building high-altitude airplanes to go to the moon.”
To describe the change needed in approach, researchers such as Choi and University of Montreal professor Yoshua Bengio point to Daniel Kahneman’s book ‘Thinking, Fast and Slow’ as a major inspiration. To function as AGIs, machines need to be able to follow chains of reasoning according to the slow System 2 used by humans. Bengio says existing DNNs are effectively operating at the instinctual level of System 1. Choi argues the abilities of DNNs may not yet even extend fully to System 1. Kahneman defines a third, lower level: perception. This is where the abilities of many of today’s AIs seem to lie, not least because some of the convolutional networks that have proved so good at decoding the content of images were based on neuroscientists’ models of the mammalian visual cortex.
One possible direction is to follow the human brain’s lead even further and employ modular architectures that run different AI models in parallel to simulate the ability to follow chains of reasoning. The big question is how this modularity might work. A debate in late December 2019 between Bengio and New York University psychology professor Gary Marcus emphasised this divide.
In a number of papers and at debates like this, Marcus has argued that symbolic, or “good old-fashioned AI” as Bengio terms it, needs to play a bigger role in the development of AGI. Marcus is not alone. The symbolic-heavy approach is one also being taken by the OpenCog and Hyperon projects, overseen by two-decade veteran of AGI research Ben Goertzel.
Bengio, along with researchers at DeepMind and OpenAI, takes the view that the future is more likely to rely on deep learning, or at least whatever develops from the enormous amount of research effort being poured into artificial neurons.
A key issue in any hybrid is linking the domains effectively. So far, though it has its drawbacks, teaching Transformers has proved to be more straightforward than trying to get DNNs and symbolic engines to communicate with each other.
One problem lies in the nature of big knowledge graphs like Cyc or ConceptNet. It is hard for machines to determine the most useful paths through the forest of links. They also form hard links: either things are connected, or they are not. However, projects like OpenCog aim to overcome this by giving the connections between concepts weights, analogous to those used by neural networks.
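The difference between hard links and weighted ones can be sketched directly. With a weight on each connection, as projects like OpenCog propose, a chain of links carries a confidence score instead of a bare yes/no, so a machine can prefer the most plausible path through the forest. The graph and weights below are invented for illustration.

```python
# Sketch of weighted concept links: each edge carries a confidence, and
# a path's score is the product of the confidences along it. The graph
# and weights are invented, loosely in the spirit of OpenCog's approach.
graph = {
    "cat":    [("mammal", 0.95), ("pet", 0.8)],
    "mammal": [("animal", 0.99)],
    "pet":    [("animal", 0.6)],
}

def best_confidence(start, goal, score=1.0):
    """Depth-first search for the highest-confidence chain of links."""
    if start == goal:
        return score
    best = 0.0
    for nxt, weight in graph.get(start, []):
        best = max(best, best_confidence(nxt, goal, score * weight))
    return best

# cat -> mammal -> animal (0.95 * 0.99) beats cat -> pet -> animal (0.8 * 0.6)
print(best_confidence("cat", "animal"))  # roughly 0.94
```

A hard-linked graph would treat both routes as equally valid; the weights are what make the ranking, and hence the reasoning, possible.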
Proponents of the deep learning path to AGI are working on the basis that the mathematics they are employing delivers major advantages that may push the balance over into doing most or all of the work using the vector embeddings common to today’s DNNs. In a 2016 lecture, University of Toronto professor Geoffrey Hinton, a key figure in the development of deep learning, argued that the brain provides an important demonstration of the power of vector embeddings, such as those used in Transformer-like networks. “The only places you find symbols with us are in the input and at the output. Inside it’s just all big vectors of activity: that’s all there is. Those big vectors of activity are thoughts.”
For the deep learning proponents, the use of mathematical differentiation makes the processes of learning and inference far easier than is possible using symbolic techniques. The trick is to find neural structures that lend themselves to symbolic-like reasoning. Though Transformers as they exist today will not scale up to AGI, their distinguishing property may prove vital if the hunch of researchers such as Bengio and others is correct.
Jürgen Schmidhuber, co-director of the Dalle Molle Institute for Artificial Intelligence Research and the developer of a forerunner to the Transformer structure during the neural-network winter of the 1990s, argues that an important aspect of this type of structure is that it separates a neural network into two parts. One part learns slowly, using the backpropagation that is the trademark of deep learning, but can combine its outputs with incoming data to rapidly reprogram the weights of a second, more flexible network. In a Transformer, the DNN connects words in a sentence that are far from each other by quickly altering the attention it gives to each of them.
Some answers may come from the relatively young field of graph neural networks, today used in applications ranging from screening chemicals for use as drugs to fake-news detection in social networks.
Today, these networks are nowhere near as deep as the DNNs used for handling images, words, and speech but they do offer an intuitive connection to symbolic knowledge bases. A major problem with these structures is that trying to learn long-distance connections does not improve performance: the networks form too many links and start to lose instead of gaining useful knowledge.
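The intuitive connection between graph neural networks and symbolic knowledge bases comes from how they compute: each node holds a feature vector and repeatedly updates it from its neighbours, so information spreads along the graph’s links. A minimal sketch of one round of this message passing, with a fixed matrix standing in for the transform a real GNN would learn:

```python
# One round of message passing in a toy graph neural network: each
# node's features become a normalised mix of its neighbours' features
# (plus its own), pushed through a transform and a ReLU. Real GNNs
# learn W; here a fixed matrix stands in for the learned weights.
import numpy as np

# Adjacency matrix for a 4-node graph (symmetric, no self-loops)
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

X = np.eye(4)  # one-hot starting features, one row per node

def message_pass(A, X, W):
    A_hat = A + np.eye(len(A))                   # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)       # per-node degree
    return np.maximum(0, (A_hat / deg) @ X @ W)  # normalise, transform, ReLU

W = np.full((4, 4), 0.5)  # stand-in for a learned weight matrix
H = message_pass(A, X, W)
print(H.shape)  # one updated feature vector per node
```

Stacking such rounds lets information hop further across the graph at each layer, which is also where the scaling problem bites: too many hops and every node’s features blur into an average of the whole graph.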
The differentiable approach offers one further attraction, though it may prove simply too computationally intensive to be workable: get computers to ‘optimise’ their own way to an AGI by trying out different structures, benchmarking them and using that data to learn new structures.
This is the approach promoted by OpenAI researcher Jeff Clune in his 2019 paper, published on the open-access site arXiv. While expounding the virtues of automated AGI, he was careful not to rule out manually created implementations achieving success first. Clune takes inspiration from successes in DNN design, pointing to experiments where letting a computer program explore different options and benchmark them delivered more efficient models than hand-crafted counterparts.
It’s a big ask to get straight to AGI from a computerised optimiser, but work on simulating much smaller, simpler systems may provide a way to gradually scale up performance and demonstrate a way to learn using far less data than today’s DNNs require.
Roboticists are working on emulations of insect and similar simple brains and nervous systems, while in the virtual space researchers such as Agrim Gupta at Stanford University are using evolutionary techniques to get reinforcement-learning systems to walk and hunt targets. These kinds of projects could deliver more efficient AGI but may prove challenging to scale up.
One advantage more conventional deep learning has is that it can deliver usable applications on the way to AGI, assuming that it does not turn into a blind alley.
We may find that the first AGI does not appear as the result of a major break with past architectures but creeps up on us slowly, armed with encyclopaedic knowledge. And it may be an ugly Goliath of different bits of AI technology baked into a network of supercomputers, rather than a David.
David, the main character of Steven Spielberg’s 2001 film ‘A.I. Artificial Intelligence’, is a child-like android mecha uniquely programmed with the ability for love, something we have been trying to achieve in robots for decades.
David is given to Henry Swinton and his wife Monica, whose son Martin, after contracting a rare disease, has been placed in suspended animation and is not expected to recover. Monica feels uneasy with David, but eventually warms to him and activates his imprinting protocol, causing him to have an enduring child-like love for her.
Martin is unexpectedly cured of his disease and brought home, and becomes jealous of David. He tries to convince his mother to return David to his creators to be destroyed. On the way there, Monica has a change of heart and spares David from destruction by leaving him in the woods.
With Teddy, Martin’s robotic teddy bear, as his only companion, David recalls ‘The Adventures of Pinocchio’ and decides to find the Blue Fairy so that she may turn him into a real boy, which he believes will win back Monica’s love.