2. State of the Art: Stepwise Incremental AI
In our view, the best way to describe the developmental process of AI so far is as stepwise, incremental progress; to use an analogy from physics, AI “percolates” into our lives. It could indeed make a phase transition into a level of complexity above our own, a process we will discuss shortly. But first we want to describe the ongoing process of stepwise incremental AI.
In January 2017, McKinsey [5] published a comprehensive report that maps the progress of artificial intelligence in a variety of areas. The five areas, described herein, are further broken down into other sub-tasks and sub-capabilities:
Sensory perception—“This includes visual perception, tactile sensing, and auditory sensing, and involves complex external perception through integrating and analyzing data from various sensors in the physical world.” [5] (p. 34). In this area, AI performance is around the median human level.
Take machine vision as an example. Creating and assimilating visual capabilities that surpass human vision in cameras has been relatively easy; the more complex part has been adding AI technology to the cameras. One such project is Landing.ai (https://www.landing.ai/) [6], founded by Andrew Ng, a globally recognized leader in AI. This startup focuses on solving manufacturing problems such as quality control (QC). It offers machine-vision tools to detect microscopic defects in products such as circuit boards, defects which the human eye simply cannot detect.
Another recent and interesting project deals with machine touch. In the paper “Learning Dexterous In-Hand Manipulation” [7], a team of researchers and engineers at OpenAI demonstrated that in-hand manipulation skills learned with reinforcement learning in a simulator can transfer to a robotic hand with a fairly high level of dexterity. As they explain: “This is possible due to extensive randomizations of the simulator, large-scale distributed training infrastructure, policies with memory, and a choice of sensing modalities which can be modelled in the simulator.” (p. 15). The researchers’ method “did not rely on any human demonstrations, but many behaviors found in human manipulation emerge naturally, including finger gaiting, multi-finger coordination, and the controlled use of gravity”. (p. 1).
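To make the idea of simulator randomization concrete, here is a minimal sketch, assuming a hypothetical simulator and policy interface (this is illustrative only, not OpenAI's actual code): physical parameters are resampled at the start of every training episode so the learned policy cannot overfit to a single fixed simulation.

```python
import random

def sample_randomized_params():
    """Draw a fresh set of physical parameters for one training episode.

    Randomizing friction, object mass, motor gains, and sensor noise forces
    the policy to cope with a whole family of simulators, which is the core
    idea behind sim-to-real transfer via domain randomization.
    """
    return {
        "friction":     random.uniform(0.5, 1.5),
        "object_mass":  random.uniform(0.03, 0.09),   # kg
        "motor_gain":   random.uniform(0.8, 1.2),
        "sensor_noise": random.uniform(0.0, 0.02),
    }

def train(policy, simulator_factory, episodes=10_000):
    """Toy training loop: every episode runs in a freshly randomized simulator."""
    for _ in range(episodes):
        sim = simulator_factory(**sample_randomized_params())  # hypothetical simulator
        obs = sim.reset()
        done = False
        while not done:
            action = policy.act(obs)          # a policy with memory, e.g., a recurrent net
            obs, reward, done = sim.step(action)
            policy.update(obs, reward)        # reinforcement learning update
```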
Cognitive capabilities—“A range of capabilities is included in this category including recognizing known patterns and categories (other than through sensory perception); creating and recognizing novel patterns and categories; logical reasoning and problem solving using contextual information and increasingly complex input variables; optimization and planning to achieve specific objectives given various constraints; creating diverse and novel ideas or a novel combination of ideas; information retrieval, which involves searching and retrieving information from a large range of sources; coordination with multiple agents, which involves interacting with other machines and with humans to coordinate group activity; and output articulation and presentation, which involves delivering outputs other than through natural language. These could be automated production of pictures, diagrams, graphs, or mixed media presentations.” [5] (p. 34).
By using these capabilities, AI can amplify our own abilities: “Artificial intelligence can boost our analytic and decision-making abilities by providing the right information at the right time. But it can also heighten creativity”. [8] (para. 13). Consider for example Autodesk’s Dreamcatcher AI, which enhances the imagination of designers. As explained on the company’s website:
“Dreamcatcher is a generative design system that enables designers to craft a definition of their design problem through goals and constraints. This information is used to synthesize alternative solutions that meet the objectives. Designers are able to explore trade-offs between many alternative approaches and select design solutions for manufacture.” (https://autodeskresearch.com/projects/dreamcatcher) [9].
Some of the cognitive capabilities have achieved human-level performance, “such as recognizing simple/complex known patterns and categories other than sensory perception; Search and retrieve information from a large scale of sources—breadth, depth, and degree of integration.” [5] (p. 35). However, other capabilities currently perform below the median, such as creating and recognizing new patterns/categories; solving problems in an organized way using contextual information and increasingly complex input variables other than optimization and planning; and creating diverse and novel ideas, or novel combinations of ideas [5].
Natural language processing—“This consists of two distinct parts: natural language generation, which is the ability to deliver spoken messages, including with nuanced human interaction and gestures, and natural language understanding, which is the comprehension of language and nuanced linguistic communication in all its rich complexity.” [5] (p. 34). As for natural language generation, although there is progress in this area (such as Google Duplex), the levels of performance according to the report are at best median. When it comes to natural language understanding, there is still a long way to go.
Yet an example of an effective implementation of these capabilities (and more) is Aida (http://aidatech.io/) [10], a virtual assistant used by SEB, a major Swedish bank. Aida interacts with masses of customers through natural-language conversations and therefore has access to vast amounts of data. “This way she can answer many frequently asked questions, such as how to open an account or make cross-border payments. She can also ask callers follow-up questions to solve their problems, and she’s able to analyze a caller’s tone of voice and use that information to provide better service later.” [8] (para. 16).
Physical capabilities—This includes gross motor skills, navigation (these two have reached human-level performance), fine motor skills and mobility (these are more difficult, and hence performance levels are currently still median and below). “These capabilities could be implemented by robots or other machines manipulating objects with dexterity and sensitivity, moving objects with multidimensional motor skills, autonomously navigating in various environments and moving within and across various environments and terrain.” [5] (p. 35).
While AIs like Cortana are essentially digital entities, there are other applications where “intelligence is embodied in a robot that augments a human worker. With their sophisticated sensors, motors, and actuators, AI-enabled machines can now recognize people and objects and work safely alongside humans in factories, warehouses, and laboratories.” [8] (para. 16, our emphasis).
“Cobots” are probably the best example here. Collaborative robots, as Gonzalez [11] explains, “excel because they can function in areas of work previously occupied only by their human counterparts. They are designed with inherent safety features like force feedback and collision detection, making them safe to work right next to human operators.” (para. 2).
Based on a white paper that Universal Robots—one of the leading companies in the robot market—has published [12], Gonzalez lists the seven most common applications for cobots. One of them, for example, is “pick and place”: “A pick and place task is any in which a workpiece is picked up and placed in a different location. This could mean a packaging function or a sort function from a tray or conveyor; the later [sic] often requires advanced vision systems.” [11] (para. 3).
Social and emotional capabilities—“This consists of three types of capability: social and emotional sensing, which involves identifying a person’s social and emotional state; social and emotional reasoning, which entails accurately drawing conclusions based on a person’s social and emotional state, and determining an appropriate response; and social and emotional action, which is the production of an appropriate social or emotional response, both in words and through body language.” [5] (p. 34).
Consider Mattersight as an example. The company provides a highly sophisticated data-analysis system that tracks customers’ responses on the telephone. The software analyzes varied communicative micro-features such as tone, volume, word choice, and pauses. Then, in a matter of a few seconds, AI algorithms interpret these features, compare them to the company’s databases, and produce a personality profile for each customer. Based on this profile, the customer is referred to the service agent most appropriate for them [13].
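As a rough illustration of this kind of pipeline (not Mattersight's actual system; the feature names, thresholds, and agent pools below are invented), a call could be scored on a few communicative micro-features, mapped to a coarse profile, and routed accordingly:

```python
from dataclasses import dataclass

@dataclass
class CallFeatures:
    """Hypothetical micro-features extracted from a recorded call."""
    mean_volume: float      # normalized 0..1
    speech_rate: float      # words per second
    pause_ratio: float      # fraction of the call spent in silence
    negative_words: int     # count of negatively charged words

def personality_profile(f: CallFeatures) -> str:
    """Map raw features to a coarse profile label (illustrative rules only)."""
    if f.negative_words > 5 and f.mean_volume > 0.7:
        return "frustrated"
    if f.speech_rate > 3.0 and f.pause_ratio < 0.1:
        return "task-focused"
    return "relationship-oriented"

def route(profile: str) -> str:
    """Pick the agent pool best matched to the inferred profile."""
    return {
        "frustrated": "senior retention team",
        "task-focused": "quick-resolution desk",
        "relationship-oriented": "standard service team",
    }[profile]

print(route(personality_profile(CallFeatures(0.8, 2.1, 0.25, 7))))  # senior retention team
```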
Summing up the report, Manyika et al. [5] note that, from a mechanical point of view, they are fairly certain perfection can be achieved: already today, through deep reinforcement learning for example, robots can untie shoelaces and remove a nail from the back of a hammer. From the cognitive point of view, however, although the robots’ “intelligence” has progressed, this is still where the greatest technical challenges lie:
“While machines can be trained to perform a range of cognitive tasks, they remain limited. They are not yet good at putting knowledge into context, let alone improvising. They have little of the common sense that is the essence of human experience and emotion. They struggle to operate without a pre-defined methodology. They are far more literal than people, and poor at picking up social or emotional cues. They generally cannot detect whether a customer is upset at a hospital bill or a death in the family, and for now, they cannot answer “What do you think about the people in this photograph?” or other open-ended questions. They can tell jokes without really understanding them. They don’t feel humiliation, fear, pride, anger, or happiness. They also struggle with disambiguation, unsure whether a mention of the word “mercury” refers to a planet, a metal, or the winged god of Roman mythology. Moreover, while machines can replicate individual performance capabilities such as fine motor skills or navigation, much work remains to be done integrating these different capabilities into holistic solutions where everything works together seamlessly.” (pp. 26–27).
3. AI Goes Hand in Hand with Our Understanding of Ourselves
Singularity is based on several assumptions: first, that there is a clear notion of what human intelligence is; and second, that AI can close the gap between human intelligence and machine intelligence. However, neither of these assumptions is yet well established. What is becoming more and more apparent is that AI goes hand in hand with our understanding of our own human intelligence and behavior.
“Intelligence” is a complex and multifaceted phenomenon that has interested researchers from countless fields of study for years. Among others, intelligence is studied from psychological, biological, economic, statistical, engineering, and neurological perspectives. New insights emerge over time from the various disciplines, many of which are adopted into the science of AI and contribute to its development and progress. The most striking example is the special and fruitful interrelationship between artificial intelligence and cognitive science.
Cognitive science and artificial intelligence arose at about the same time, in the late 1950s, and grew out of two main developments: “(1) the invention of computers and the attempts soon thereafter to design programs that could do the kinds of tasks that humans do, and (2) the development of information-processing psychology, later called cognitive psychology, which attempted to specify the internal processing involved in perception, memory, and thought. Cognitive science was a synthesis of the two, concerned both with the details of human cognitive processing and with the computational modeling of those processes.” [14] (p. 1).
What AI does best is analyze, categorize, and find relationships within large amounts of data, quickly and very effectively, coming up with highly accurate predictions. These capabilities, as Collins and Smith explain [14], were the outcome of three foci that have turned out to be three major bases for progress in AI:
formalisms—such as means-ends analysis (see the sketch following this list), which are standard methods for representing and implementing cognitive processes;
tools or languages for building intelligent programs, such as John McCarthy’s LISP [15]; and
programs—beginning with the Dendral project [16], the first expert system to allow the formation of a scientific hypothesis.
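To illustrate the first of these foci, here is a minimal sketch of means-ends analysis over a toy planning domain; the state representation and operators are invented for illustration and are not taken from any of the cited systems.

```python
def means_ends_analysis(state, goal, operators, max_steps=100):
    """Toy means-ends analysis: repeatedly pick an unmet goal fact and apply
    an operator that achieves it, recursing on the operator's preconditions."""
    plan = []
    for _ in range(max_steps):
        differences = goal - state
        if not differences:
            return plan                          # all goal facts achieved
        target = next(iter(differences))         # pick one difference to reduce
        candidates = [op for op in operators if op["adds"] == target]
        if not candidates:
            return None                          # no operator reduces this difference
        op = candidates[0]
        missing = op["needs"] - state
        if missing:
            # Subgoal: first achieve the operator's preconditions.
            subplan = means_ends_analysis(state, state | op["needs"], operators)
            if subplan is None:
                return None
            for name in subplan:
                sub = next(o for o in operators if o["name"] == name)
                state = (state - sub["deletes"]) | {sub["adds"]}
            plan.extend(subplan)
        state = (state - op["deletes"]) | {op["adds"]}
        plan.append(op["name"])
    return None

# Toy domain: get from home to the office, which first requires a ticket.
ops = [
    {"name": "buy_ticket", "needs": {"at_home"}, "adds": "has_ticket", "deletes": set()},
    {"name": "take_train", "needs": {"has_ticket"}, "adds": "at_office", "deletes": {"at_home"}},
]
print(means_ends_analysis({"at_home"}, {"at_office"}, ops))  # ['buy_ticket', 'take_train']
```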
“By contrast, psychology historically has made progress mainly by accumulating empirical phenomena and data, with far less emphasis on theorizing of the sort found in artificial intelligence. More specifically, psychological theories have tended to be constructed just to explain data in some experimental paradigm, and have tended to be lacking a well-founded mechanism, probably a relic of the behavioristic or stimulus-response approach that dominated psychology from the 1920s through the 1950s.” [14] (p. 2).
At first, AI was mistakenly identified with the mechanical psychological viewpoint of behaviorism. The physicalism of stimulus and response looked similar to the operation of computers, a reduction of the human being to its gears [17]. In psychology, a more ‘humanistic’ view of the science was demanded. The ‘humanistic’ psychologists agreed that a good theory should be irreducible, meaning that its terms cannot be reduced to simple physical constituents, and that terms such as ‘intention’ should play a major part in the theory. Moreover, any action should have some meaning to the actor, and that meaning should be subjective. The ‘humanistic’ approach to psychology was a scientific revolution against positivistic psychology (in the Kuhnian sense) [17] (p. 396). It turned out that AI came to be very similar to the ‘humanistic’ viewpoint. Both AI and cognitive science were beginning to ask similar questions and to use many similar terms.
What was needed for a science of cognition was a much richer notion of knowledge representation and process mechanisms, and that is what artificial intelligence has provided. Cognitive psychologists gained a rich set of formalisms to use in characterizing human cognition [14] (p. 2). Some of the earliest and most important formalisms were means-ends analysis [18], discrimination nets [19], semantic networks [20], frames and scripts [21,22], production systems [23], semantic primitives [24,25], and incremental qualitative analysis [26]. Through the years a wide range of formalisms were developed for analyzing human cognition, and many of them are still in use today.
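As a minimal illustration of one such formalism, the sketch below implements a tiny semantic network with inheritance along ‘is-a’ links; it is a generic textbook-style example, not a reconstruction of any of the cited systems.

```python
class SemanticNetwork:
    """Tiny semantic network: concepts linked by 'is-a' and property relations."""

    def __init__(self):
        self.isa = {}          # concept -> parent concept
        self.props = {}        # concept -> {property: value}

    def add(self, concept, parent=None, **properties):
        if parent:
            self.isa[concept] = parent
        self.props.setdefault(concept, {}).update(properties)

    def lookup(self, concept, prop):
        """Inherit properties along the 'is-a' chain (simple default reasoning)."""
        while concept is not None:
            if prop in self.props.get(concept, {}):
                return self.props[concept][prop]
            concept = self.isa.get(concept)
        return None

net = SemanticNetwork()
net.add("bird", can_fly=True, covering="feathers")
net.add("penguin", parent="bird", can_fly=False)
net.add("tweety", parent="bird")
print(net.lookup("tweety", "can_fly"))   # True, inherited from 'bird'
print(net.lookup("penguin", "can_fly"))  # False, local exception overrides
```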
Moreover, artificial intelligence has become a kind of theoretical psychology. Researchers who sought to develop a psychological theory could become artificial intelligence researchers without making their marks as experimentalists. Thus, as in physics, two branches of psychology were formed—experimental and theoretical—and cognitive science has become the interface where theorists and experimentalists sort things out [14].
Boden [17] suggests that AI can be used as a test lab for cognitive science. It raises and exposes psychological questions that were previously only implicit, and it suggests new terms, ideas, and questions that would otherwise remain hidden. In that sense we dare say that computation is playing the role of a language for cognitive science. Similar to the role of mathematics in physics, computation has become a language for constructing theories of the mind. Computation is a formal language that imposes a set of constraints on the kinds of theories that can be constructed. But unlike mathematics, it has several advantages for constructing psychological theories: while mathematical models are often static, computational models are inherently process-oriented; while mathematical models, particularly in psychology, are content-independent, computational models can be content-dependent; and while computational models are inherently goal-oriented, mathematics is not [14].
Questions that we should ask include: is the use of the same terms and the same language in AI and cognitive science only an analogy? Could it imply something deeper? Can we instill true ‘intention’ and true ‘meaning’ into computer agents? How can we define such terms in AI? This, in fact, is the main question of strong AI. Answering it would bring AI and cognitive science much closer.
In an attempt to answer these questions, we refer to the viewpoint of Dennett [27]. Let us define the notion of ‘meaning’; to put things very simplistically, we will say that an action of a computer agent has a ‘meaning’ (for the agent) if the action changes some part of its environment and the agent can sense that change. For example, if the agent is a ribosome, then the translation of an RNA into a series of amino acids, later to become a protein, has a meaning, since the protein has some function in changing the agent’s environment. The action of the ribosome has a ‘meaning’ in the cytoplasm environment. Similarly, we can embed a ‘meaning’ in computer agents. Dennett suggested that we humans can instill a derived ‘intention’ in computers, and computers can in turn instill a lower-order ‘intention’ in other computers. This was also brought up years ago by Minsky [28], using a different language.
Boden [17] suggested that we can bridge the gap between the ‘humanistic’ approach to cognitive science (in the sense discussed above) and physical mechanism. The way to do so is by introducing an inner representation of the self into the computer. Intentionality and meaning could be aimed (given a context) at this inner representation; the reduction, or mechanism, of the intentionality would be enabled by the design or architecture of the inner representation. Hence, in order to describe what is going on in the computer, the language of intentionality will be the most appropriate, in the same sense that we talk about our dog’s intentions when we wish to describe or explain its behavior, without the need for a behavioristic language or other physical terms. It will not be ‘natural’ or efficient to describe the action of the computer in the language of the states of its switches; we will say that this particular action was ‘intended’ to comply with the ‘state of mind’ that the computer had. This may sound like a somewhat pretentious goal; however, it is based on the assumption that any future advancement in AI must stand on a basic cognitive architecture, much more basic and deeper than what we have today.
Most of the recent progress in AI has been driven by deep neural networks, and these are related to the “connectionist” view of human intelligence. Connectionist theories essentially perceive learning—human and artificial—as rooted in interconnected networks of simple units, either real neurons or artificial ones, which detect patterns in large amounts of data. Thus, some in the machine learning field are looking to psychological research on human learning and cognition to help take AI to the next level. Although the concept of neural networks has existed since the 1940s, only today, due to an enormous increase in computing power and in the amount and type of data available to analyze, have deep neural networks become increasingly powerful, useful and ubiquitous [29].
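To make the connectionist picture concrete, here is a minimal, standard sketch of a single artificial unit trained by gradient descent to detect a simple pattern; it is illustrative textbook material only and is not tied to any system cited above.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# A single artificial neuron learning the logical AND pattern from examples.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights = [random.uniform(-0.5, 0.5) for _ in range(2)]
bias = 0.0
rate = 0.5

for _ in range(5000):
    x, target = random.choice(data)
    out = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    error = target - out
    # Logistic-regression style update: nudge weights toward reducing the error.
    weights = [w + rate * error * xi for w, xi in zip(weights, x)]
    bias += rate * error

for x, target in data:
    out = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    print(x, round(out, 2), "target:", target)
```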
The theory of consciousness has also been investigated recently by AI researchers. Dennett [27] suggested that consciousness is an emergent property of many small processes, or agents, each struggling for dominance. Consciousness is not a stage with spotlights in which all subconscious processes are the audience; it is a dynamic arena where many agents appear and soon disappear. It resembles an evolutionary process occurring on a very short timescale [30]. On this basis a few AI models were suggested: the Copycat model [31] and its more advanced successor, the Learning Intelligent Distribution Agent (LIDA) model [32]. These two are examples of a strong reciprocal interaction between AI and cognitive science.
Similar reciprocal relationships are now beginning to form between the social sciences and artificial intelligence, giving rise to the field of artificial social intelligence (ASI). ASI is an interdisciplinary science, introduced years ago by Brent and others [33], and only now becoming prevalent. ASI is a new challenge for social science and a new arena for the science of AI. It deals with the formalization of delicate social interactions and with using them in AI to implement social behavior in robots. The prospects for social scientists were described years ago by Anderson [34]:
“It is time for sociology to break its intellectual isolation and participate in the cognitivist rethinking of human action, and to avail itself of theoretical ideas, techniques and tools that have been developed in AI and cognitive science” (p. 20).
“My argument is that sociologists have a great deal to learn from these disciplines, and that the adoption of concepts, methods and tools from them would change sociologists working habits […]” (p. 215).
4. ASI, A New Challenge
While artificial cognitive intelligence has become a well-established and significant field of research, heavily invested in by both cognitive scientists and artificial intelligence researchers, artificial social intelligence is in its early stages and has great potential for advancing smart machines in a new and essential way.
While cognitive artificial intelligence scientists “essentially view the mind as something associated with a single organism, a single computational system, social psychologists have long recognized that this is just an approximation. In reality the mind is social, it exists, not in isolated individuals, but in individuals embedded in social and cultural systems.” [35] (p. 24).
It is now well established that there are sets of brain regions dedicated to social cognition. This was first shown in primates [36] and later in humans [37]. As Frith [38] explains: “The function of the social brain is to enable us to make predictions during social interactions.” (p. 67). The social brain includes a variety of mechanisms, such as the amygdala, which is activated in situations of fear and is also connected with the mechanisms of prejudice, stereotyping, and associating values with stimuli; it concerns both people—individuals and groups—and objects. Another such mechanism is the medial prefrontal cortex, which is connected with understanding another’s behavior in terms of their mental state, with long-term dispositions and attitudes, and with self-perception of one’s own long-term attitudes.
From the social point of view, Mead [39], in his book Mind, Self and Society, defines the “social organism” as “a social group of individual organisms” (p. 130), or in modern language, as an emergent phenomenon. This means that each individual, as an organism in itself, is also part of a larger system, the social organism. Hence, each individual’s act must be understood within the context of some social act that involves other individuals. The social act is therefore viewed as a dynamic and complex system within which the individual is situated. As such, the social ‘organism’ actually defines the individual acts; that is, within it these acts become meaningful.
In his book Artificial Experts [40], Collins argues similarly that intelligence cannot be defined without considering social interactions. This is because “[…] the locus of knowledge appears to be not the individual but the social group; what we are as individuals is but a symptom of the groups in which the irreducible quantum of knowledge is located. Contrary to the usual reductionist model of the social sciences, it is the individual who is made of social groups.” (p. 6).
Our intelligence, as Yudkowsky [41] clarifies, “includes the ability to model social realities consisting of other humans, and the ability to predict and manipulate the internal reality of the mind.” (p. 389). Another way to put it is through Mead’s concept of the ‘generalized other’ [39]. As Dodds, Lawrence & Valsiner [42] explain, “to take the role of the other involves the importation of the social into the personal, and this activity is crucial for the development of self-consciousness and the ability to operate in the social world. It describes how perspectives, attitudes and roles of a group are incorporated into the individual’s own thinking in a way that is distinct from the transmission of social rules, and in a way that can account for the possibility of change in both person and society.” (p. 495, our emphasis).
Hence, as Collins [40] argues, “The organism into which the intelligent computer supposed to fit is not a human being but a much larger organism; a social group. The intelligent computer is meant to counterfeit the performance of a whole human being within a social group, not a human being’s brain. An artificial intelligence is a ‘social prosthesis’.” (p. 14, our emphasis).
All of the above suggests the emergence of a new interdisciplinary discipline. The main concern of this new field of science is the formalization of delicate social modules and their use in AI to implement social awareness (perhaps a type of social common-sense understanding) and social behavior in robots. Because of the dynamic nature of social interactions, these ASI systems face difficult challenges, some of which are not even predictable. In order to address these challenges, ASI systems will have to be dynamic, continuously reviewing and evolving their interaction strategies in order to adapt to new social situations. Moreover, it is essential to examine and assess these strategies in as many contexts as possible, in which ongoing, continuous interactions take place.
For ASI to become a reality, some fundamental steps need to be taken [43]. First, there is a need to discover the principles of socio-cultural interaction in which an ASI system could have a role. In order to formulate those principles, it is important to conduct large data-driven studies aimed at validating them, as well as identifying and characterizing new behavioral traits. Such studies are already being conducted, exploiting the enormous amounts of socially grounded user data readily available from social media, as well as the significant advances in machine learning and the wide variety of data-analysis techniques. One such project is “Mark my words!” [44]. This project demonstrates the psycholinguistic theory of communication accommodation, according to which participants in conversations tend to adapt to the communicative behavior patterns of those with whom they converse. The researchers have shown “that the hypothesis of linguistic style accommodation can be confirmed in a real life, large scale dataset of Twitter conversations.” (p. 754). A probabilistic framework was developed, which allowed the researchers to measure “accommodation and, importantly, to distinguish effects of style accommodation from those of homophily and topic-accommodation.” [44].
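The intuition behind such an accommodation measure can be sketched very roughly as follows (a simplification in the spirit of the cited probabilistic framework, with invented example data, not the authors' actual code): for a given style marker, compare the probability that a reply exhibits the marker when the preceding message did with the baseline probability that a reply exhibits it at all.

```python
def accommodation(pairs, marker):
    """Crude accommodation score for one linguistic style marker.

    `pairs` is a list of (original_message, reply) strings. The score compares
    P(reply has marker | original has marker) with P(reply has marker).
    """
    has = lambda text: marker in text.lower().split()
    triggered = [(o, r) for o, r in pairs if has(o)]
    if not pairs or not triggered:
        return 0.0
    p_reply = sum(has(r) for _, r in pairs) / len(pairs)
    p_reply_given_trigger = sum(has(r) for _, r in triggered) / len(triggered)
    return p_reply_given_trigger - p_reply   # > 0 suggests style accommodation

conversations = [
    ("we should definitely go", "we could leave at noon"),
    ("I think it works", "sounds fine"),
    ("we won yesterday", "we played really well"),
    ("nice weather today", "indeed"),
]
print(accommodation(conversations, "we"))  # positive score for this toy data
```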
Once the relevant socio-cultural principles have been extracted and defined, the next step will be to understand how they can be assimilated into ASI systems such as chatbots, recommender systems, autonomous cars, etc. One such system is the virtual receptionist, “which keeps track of users attention and engagement through visual cues (such as gaze tracking, head orientation etc.) to initiate the interaction at the most appropriate moment [45]. Further, it can also make use of hesitation (e.g., “hmmm… uhhh”) to attract the attention of the user, buy time for processing or even to indicate uncertainty in the response [46].” [43] (para. 6).
ASI systems have no clear definition of goals; there is no specific task the machine is oriented towards. In a sense, the machine’s social behavior is itself the goal. In other words, it is impossible to define clear goals in advance, and goals may even emerge dynamically. This means that measurement and evaluation methods are very difficult to apply to the socio-cultural intelligence of such a system. This is one of the biggest challenges the ASI field has to deal with.
5. AGI, An Overview, Is It Enough?
An important concept to dwell on is that of artificial general intelligence (AGI). AGI constitutes a new step towards strong AI. General intelligence is not a fully well-defined term, but it has a qualitative meaning: “What is meant by AGI is, loosely speaking, AI systems that possess a reasonable degree of self-understanding and autonomous self-control, and have the ability to solve a variety of complex problems in a variety of contexts, and to learn to solve new problems that they didn’t know about at the time of their creation.” [35] (p. VI).
There is a clear distinction between AGI and narrow AI research. The latter is aimed at creating programs that specialize in performing specific tasks, such as ordering online shopping, playing Go, diagnosing diseases or driving a car. But despite their great importance and popularity, narrow AI’s core problem is that “they are inherently narrow (narrow by design) and fixed. Whatever capabilities they have, are pretty much frozen in time. It is true that narrow AI can be designed to allow for some limited learning or adaptation once deployed, but this is actually quite rare. Typically, in order to change or expand functionality requires either additional programming, or retraining (and testing) with a new dataset.” [47] (para. 4–5).
Intelligence, in general, “implies an ability to acquire and apply knowledge, and to reason and think, in a variety of domains” [48] (p. 15). In other words, intelligence in its essence has a large and dynamic spectrum.
“Narrow AI systems cannot adapt dynamically to novel situations—be it new perceptual cues or situations; or new words, phrases, products, business rules, goals, responses, requirements, etc. However, in the real world things change all the time, and intelligence is by definition the ability to effectively deal with change.” [47] (para. 6).
Artificial general intelligence requires the above characteristics. It must be capable of performing various tasks in different contexts, making generalizations, and drawing on knowledge acquired in one context to apply it in another. Hence, as Voss [47] explains, “it must embody at least the following essential abilities:
(1) To autonomously and interactively acquire new knowledge and skills, in real time. This includes one-shot learning—i.e., learning something new from a single example.
(2) To truly understand language, have meaningful conversation, and be able to reason contextually, logically and abstractly. Moreover, it must be able to explain its conclusions.
(3) To remember recent events and interactions (short-term memory), and to understand the context and purpose of actions, including those of other actors (theory of mind).
(4) To proactively use existing knowledge and skills to accelerate learning (transfer learning).
(5) To generalize existing knowledge by forming abstractions and ontologies (knowledge hierarchies).
(6) To dynamically manage multiple, potentially conflicting goals and priorities, and to select the appropriate input stimuli and to focus on relevant tasks (focus and selection).
(7) To recognize and appropriately respond to human emotions (have EQ, emotional intelligence), as well as to take its own cognitive states—such as surprise, uncertainty or confusion—into account (introspection).
(8) Crucially, to be able to do all of the above with limited knowledge, computational power, and time. For example, when confronted with a new situation in the real world, one cannot afford to wait to re-train a massive neural network over several days on a specialized supercomputer.” (para. 12).
In conclusion, general intelligence is a complex phenomenon that emerges from the integration of several essential components. “On the structural side, the system must integrate sense inputs, memory, and actuators, while on the functional side various learning, recognition, recall and action capabilities must operate seamlessly on a wide range of static and dynamic patterns. In addition, these cognitive abilities must be conceptual and contextual—they must be able to generalize knowledge, and interpret it against different backgrounds.” [49] (p. 147).
From the point of view of strategy and methodology, AGI often takes a top-down approach to cognition. As Wang and Goertzel [50] explain, “An AGI project often starts with a blueprint of a whole system, attempting to capture intelligence as a whole. Such a blueprint is often called an “architecture”.” (p. 5).
Cognitive architecture (CA) research “models the main factors participated in our thinking and decision and concentrates on the relationships among them. In computer science, CA mostly refers to the computational model simulating human’s cognitive and behavioral characteristics. Despite a category of loose definition, CAs usually deal with relatively large software systems that have numerous heterogeneous parts and subcomponents. Typically, many of these architectures are built to control artificial agents, which run both in virtual worlds and physical robots.” [51] (p. 1).
Symbolic systems are one important type of cognitive architecture. “This type of agents maintains a consistent knowledge base by representing the environment as symbols.” [51] (p. 2). Some of the most ambitious AGI-oriented projects in the history of the field were in the symbolic-AI paradigm. One such famous project is the General Problem Solver [52], which used heuristic search (means-ends analysis) to solve problems. Another famous effort was the CYC project [53], whose aim was to create human-like AI by collecting and encoding all human common-sense knowledge in first-order logic. Allen Newell’s SOAR project [54,55] was an attempt to create unified theories of cognition, based on “logic-style knowledge representation, mental activity as problem-solving carried out by an assemblage of heuristics, etc.” [35] (p. 3). However, the system was not constructed to be fully autonomous or to have self-understanding [35].
These and other early attempts failed to reach their original goals and, in the view of most AI researchers, failed to make dramatic conceptual or practical progress toward them. Some (GPS, for example) failed because of exponential growth in computational complexity. However, more contemporary AGI studies and projects offer new approaches, combining the previous knowledge—both theories and research methods—accumulated in the field.
One such integrative scheme, described by Pennachin and Goertzel [35], was given the name ‘Novamente’. This scheme involves taking elements from various approaches and creating an integrated and interactive system. However, as the two explain: “This makes sense if you believe that the different AI approaches each capture some aspect of the mind uniquely well. But the integration can be done in many different ways. It is not workable to simply create a modular system with modules embodying different AI paradigms: the different approaches are too different in too many ways. Instead one must create a unified knowledge representation and dynamics framework, and figure out how to manifest the core ideas of the various AI paradigms within the universal framework.” (p. 5).
In their paper, “Novamente: an integrative architecture for Artificial Intelligence” [56], Goertzel et al. suggest such an integrative AI software system. The Novamente design incorporates evolutionary programming, symbolic logic, agent systems, and probabilistic reasoning. The authors clarify that “in principle, integrative AI could be conducted in two ways: Loose integration, in which different narrow AI techniques reside in separate software processes or software modules, and exchange the results of their analysis with each other. Tight integration, in which multiple narrow AI processes interact in real-time on the same evolving integrative data store, and dynamically affect one another’s parameters and control schemata. Novamente is based on a distributed software architecture, in which a distributed processing framework called DINI (Distributed Integrative Intelligence) is used to bind together databases, information-gathering processes, user interfaces, and “analytical clusters” consisting of tightly-integrated AI processes.” (p. 2).
Novamente is extremely innovative in its overall architecture, which seeks to deal with the difficulty of creating a “whole brain” in a completely new and direct way. The basic principles on which the design of the system is founded are derived from the “psynet model”—an innovative complex-systems theory of mind—developed by Goertzel [57,58,59,60,61]. “What the psynet model has led us to is not a conventional AI program, nor a conventional multi-agent-system framework. Rather, we are talking about an autonomous, self-organizing, self-evolving AGI system, with its own understanding of the world, and the ability to relate to humans on a “mind-to-mind” rather than a “software-program-to-mind” level.” [35] (pp. 64–65).
Another interesting project is the Learning Intelligent Distribution Agent (LIDA) [62]. The LIDA architecture is presented as a working model of cognition, a cognitive architecture designed to be consistent with what is known from the cognitive sciences and neuroscience. Ramamurthy et al. argue “that such working models are broad in scope and could address real world problems in comparison to experimentally based models which focus on specific pieces of cognition. […] A LIDA based cognitive robot or software agent will be capable of multiple learning mechanisms. With artificial feelings and emotions as primary motivators and learning facilitators, such systems will ‘live’ through a developmental period during which they will learn in multiple ways to act in an effective, human-like manner in complex, dynamic, and unpredictable environments.” (p. 1).
31]. It is based on the attempt to understand consciousness as a working space for many agents. The agents compete one another and those that dominate the workspace are identified as the ones that constitute our awareness. The process is dynamic, information flows in from the environment, and action is decided by a set of heuristics, which are themselves dynamic.
The LIDA architecture is partly symbolic and partly connectionist; part of the architecture “is composed of entities at a relatively high level of abstraction, such as behaviors, message-type nodes, emotions, etc., and partly of low-level codelets (small pieces of code). LIDA’s primary mechanisms are perception, episodic memory, procedural memory, and action selection.” [62] (p. 1).
With the design of three continually active incremental learning mechanisms—perceptual learning, episodic learning and procedural learning—the researchers have laid the foundation for a working model of cognition that yields a cognitive architecture capable of human-like learning. As the authors [62] explain:
“The architecture can be applied to control autonomous software agents as well as autonomous robots “living” and acting in a reasonably complex environment. The perceptual learning mechanism allows each agent controlled by the LIDA architecture to be suitably equipped so as to construct its own ontology and representation of its world, be it artificial or real. And then, an agent controlled by the LIDA architecture can also learn from its experiences, via the episodic learning mechanism. Finally, with procedural learning, the agent is capable of learning new ways to accomplish new tasks by creating new actions and action sequences. With feelings and emotions serving as primary motivators and learning facilitators, every action, exogenous and endogenous taken by an agent controlled with the LIDA architecture is self-motivated.” (p. 6).
A third project worth mentioning is Schmidhuber’s Gödel Machines [63]. Schmidhuber describes these machines as “the first class of mathematically rigorous, general, fully self-referential, self-improving, optimally efficient problem solvers. Inspired by Kurt Gödel’s celebrated self-referential formulas (1931), such a problem solver rewrites any part of its own code as soon as it has found a proof that the rewrite is useful, where the problem-dependent utility function and the hardware and the entire initial code are described by axioms encoded in an initial proof searcher which is also part of the initial code. The searcher systematically and in an asymptotically optimally efficient way tests computable proof techniques (programs whose outputs are proofs) until it finds a provably useful, computable self-rewrite.” (p. 1).
In other words, the Gödel machines “are universal problem solving systems that interact with some (partially observable) environment and can in principle modify themselves without essential limits apart from the limits of computability. Their initial algorithm is not hardwired; it can completely rewrite itself, but only if a proof searcher embedded within the initial algorithm can first prove that the rewrite is useful, given a formalized utility function reflecting computation time and expected future success (e.g., rewards).” (p. 2).
A completely different approach to AGI suggests imitating the complex architecture of the human brain and creating an exact digital simulation of it. However, this method is questionable, since the brain has not been fully deciphered yet. Another, more abstract way to create AGI is to follow cognitive psychology research and emulate the human mind. A third way is to create AGI by emulating properties of both aspects—brain and mind. But, as Wang [64] stresses, the main issue is not “whether to learn from the human brain/mind (the answer is always “yes”, since it is the best-known form of intelligence), or whether to idealize and simplify the knowledge obtained from the human brain/mind (the answer is also always “yes”, since a computer cannot become identical to the brain in all aspects), but on where to focus and how much to abstract and generalize.” (pp. 212–213).
One of the unsolved problems of AGI research is the lack of a clear definition of “generalization”. What Perez [65] suggests “is that our measure of intelligence be tied to our measure of social interaction.” (para. 7). Perez calls his new definition of generalization “Conversational Cognition”, and as he explains:
“An ecological approach to cognition is based on an autonomous system that learns by interacting with its environment. Generalization in this regard is related to how effectively automation is able to anticipate contextual changes in an environment and perform the required context switches to ensure high predictability. The focus is not just in recognizing chunks of ideas, but also being able to recognize the relationship of these chunks with other chunks. There is an added emphasis on recognizing and predicting the opportunities of change in context.” (para. 11).
The most sophisticated form of generalization that exists demands the ability to hold conversations. Moreover, Perez [65] clarifies that this conversation “is not confined only to an inanimate environment with deterministic behavior. […] we need to explore conversation for computation, autonomy and social dimensions. […] The social environment will likely be the most sophisticated system in that it may demand understanding the nuisances of human behavior. This may include complex behavior such as deception, sarcasm and negotiation.” (para. 13, 14).
Another critical aspect of social survival is the requirement for cooperative behavior. But as Perez [65] argues, effective prediction of an environment is an insufficient skill for achieving cooperative behavior. The development of language is a fundamental skill, and conversations are the highest reflection of intelligence. “They require the cognitive capabilities of memory, conflict detection and resolution, analogy, generalization and innovation.” (para. 15). At the same time, it is important to keep in mind that languages are not static—they evolve over time with new concepts.
Moreover, Perez [65] clarifies that “effective conversation requires not only understanding an external party but also the communication of an automaton’s inner model. In other words, this conversation requires the appropriate contextualized communication that anticipates the cognitive capabilities of other conversing entities. Good conversation requires good listening skills as well as the ability to assess the current knowledge of a participant and performing the necessary adjustment to convey information that a participant can relate to.” (para. 16). For Perez, the ability to effectively perform a conversation with the environment is the essence of AGI. Interestingly, what most AGI research avoids is the reality that an environment is intrinsically social—i.e., that other intelligences exist.
As we have argued above, we believe that the next step toward bringing human and machine intelligence closer together is to focus on the social aspect of human intelligence and on ways to integrate social behavior into machines.
6. Embodiment
One of the biggest challenges for AI is the challenge of embodied cognition. If AI could overcome this hurdle, it would come very close to true human intelligence. Embodied cognition was first presented as the main reason why AI is impossible. We propose to view embodied cognition as a step towards better AI rather than as an insurmountable objection. Let us make a small detour into the history and philosophy of computation.
Dreyfus [66] claimed that true AI is impossible, since it implicitly assumes that human intelligence is symbolic in its essence. Some AI researchers attempt to build a context-free machine that manipulates symbols, assuming the human mind works similarly. Dreyfus claimed that the symbolic conjecture is flawed, basing his arguments primarily on philosophical grounds. AI assumes that we have a type of ‘knowledge representation’ in our brain, a representation of the world; this idea is based on Descartes’ theory and has a long tradition. Moreover, Descartes claimed that there is a duality, a separation between our body and our mind, and therefore the mind cannot be embodied. So far, claimed Dreyfus [66], all AI research has been based on these assumptions: that we have a model of the world in our mind and that the mind is separate from the body.
Could these assumptions be wrong? Could it be that some part of our intelligence is embedded in our body? We interact with the world with our body, we perceive with our body, we sense with our body. Could it be that symbolic intelligence is not enough?
For Dreyfus [66], embodiment is rooted in the deep philosophical grounds of existentialism. Existentialism discusses the notions of involvement and detachment. Most of the time, humans are involved in the world: they interact, they solve practical problems, they are engaged in everyday coping, finding their way about in the world. However, when things become difficult, the individual retreats into detachment. For most of the things you do, there is no need for any type of awareness; while climbing stairs you do not think about the next step, and if you do, you will probably fall. Only when the stairs are too steep might you consider your next step, and then you retreat to a state of detachment. For Heidegger [67] there is the World where we live, where everything has ‘meaning’, and there is the Universe where detachment and science live. Science is the outcome of our involvement in the world, and not the other way around. Science cannot explain the ‘meaning’ of things. Existentialism is therefore the opposite of Descartes’ dualism.
Part of our intelligence is therefore somewhere in our bodily interactions with the world. In addition to our senses of sight, smell, hearing, etc., we have ‘senses’ of time, space, surroundings, etc. The discovery of neural place cells [68,69,70] emphasizes the embodiment of our sense of space. A good example illustrating embodiment is the proven connection between movement and intelligence in infant development. Free movement, such as rolling, crawling, sitting, walking, jumping, etc., is associated with the development of the frontal cortex, the area where higher-order thinking occurs [71].
We can refer to the above set of ‘senses’ as ‘environmental intelligence’. The question is how much of our intelligence is grounded in our body, and how much is ‘context free’ in our mind. If we had no body, could we think the same way we think? Could we think at all? Would we have anything to think about? What is the connection between our ‘environmental intelligence’ and our ‘context-free symbolic manipulation intelligence’?
Dreyfus [66] thought that neural network computation is indeed a step in the right direction: it is an attempt to formalize our perceptions in terms of virtual neurons that have some parallel in our brain and body.
Some computer scientists took Dreyfus’s position seriously. Brooks [72] came up with the idea that there is no need for knowledge representation at all: “The key observation is that the world is its own best model” (p. 6). Brooks’ robots were able to wander around, avoid obstacles and perform several basic tasks. A more modern version would be an intelligent swarm, in which a set of simple agents interact and can bring about some emergent property [73].
Brooks and Dennett cooperated on a project involving a humanoid named COG [74,75], in which they tried to implement the above ideas by letting the COG robot (with a torso, camera eyes, one arm, and three fingers) interact with its environment, trying to learn new things as if it were a newborn. Brooks used a ‘subsumption architecture’, in which several simple modules compete for dominance. A few years later, the science of embodied cognition was born [76,77] and reached similar conclusions from a different point of view, that of cognitive psychology.
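Brooks's layered, representation-free control style can be sketched as follows; this is a toy illustration of the subsumption idea, not the actual COG code, and the sensor names and behaviors are invented.

```python
def avoid_obstacle(sensors):
    """Safety behavior: reflexively turn away from a nearby obstacle."""
    if sensors["distance_ahead"] < 0.3:
        return "turn_left"
    return None

def seek_light(sensors):
    """Goal behavior: head toward a strong light source when one is visible."""
    if sensors["light_level"] > 0.8:
        return "move_toward_light"
    return None

def wander(sensors):
    """Default behavior: keep moving when no other layer has anything to say."""
    return "move_forward"

# Behaviors are checked in priority order; the first one that produces a
# command suppresses the layers below it. No layer builds a world model.
LAYERS = [avoid_obstacle, seek_light, wander]

def control(sensors):
    for layer in LAYERS:
        command = layer(sensors)
        if command is not None:
            return command

print(control({"distance_ahead": 0.2, "light_level": 0.9}))  # turn_left
print(control({"distance_ahead": 2.0, "light_level": 0.9}))  # move_toward_light
print(control({"distance_ahead": 2.0, "light_level": 0.1}))  # move_forward
```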
The science of embodied cognition has several basic assumptions. First, humans have a set of atomic and primitive cognitive abilities that are embodied. These abilities build our perceptions of the world. Lakoff and Johnson [76] used the term ‘image schema’ for them. Such a schema is a small process, and can therefore also be dynamic. Furthermore, any higher feature, any concept that we form, is the result of aggregations of the above primitives. Finally, we think by using metaphors and frames, and these are formed through associations of schemas.
As for frames, many words have a large and natural context and cannot be understood without that context, for example prisoner, nurse, doctor, etc. These are the frames. Frames were suggested in social science by Goffman [78], and they were also referred to in the context of AI by Minsky [21]. Minsky was also interested in issues such as the symbolic aspect of vision, the relation of his theory of frames to Piaget’s theory of development, language understanding, scenarios, etc. Dreyfus [66], on the other hand, stressed the fact that real frames are infinite in nature and could not be truly described in AI. This was coined the ‘Frame Problem’.
Metaphors are formed through Hebbian learning [79]. In early childhood we learn to associate between concepts such as ‘warm’ and ‘up’, or ‘cold’ and ‘down’. The more often these associations of schemas are presented to us in childhood, the more strongly we relate the schemas. Later we use more elaborate metaphors to reason: we handle real situations by mapping them into a world of metaphors and solving the imaginary situation first. This is ‘thinking by metaphors’ [77].
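A minimal Hebbian sketch of the kind of association described here (a toy model, not a reconstruction of the cited work): concepts that are repeatedly active together strengthen their connection, so activating ‘warm’ later retrieves ‘up’.

```python
concepts = ["warm", "cold", "up", "down"]
index = {c: i for i, c in enumerate(concepts)}
# Connection weights between every pair of concepts, initially zero.
weights = [[0.0] * len(concepts) for _ in concepts]

def experience(active, rate=0.1):
    """Hebbian step: units that fire together wire together."""
    for a in active:
        for b in active:
            if a != b:
                weights[index[a]][index[b]] += rate

# Childhood experiences repeatedly pair 'warm' with 'up' and 'cold' with 'down'.
for _ in range(50):
    experience(["warm", "up"])
    experience(["cold", "down"])

def associate(concept):
    """Return the concept most strongly linked to the given one."""
    row = weights[index[concept]]
    best = max(range(len(row)), key=lambda j: row[j])
    return concepts[best]

print(associate("warm"))   # up
print(associate("cold"))   # down
```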
To support all of the above, embodied cognition scientists search for clues in language, where they look for invariants. The existence of such invariants can imply that something deep, common to all languages, underlies them. In many examples a word has several meanings: one is environmental and embodied, the other much more abstract. For example, ‘to grasp’ is first of all ‘to catch’, but it also has the meaning of ‘to understand’. We ‘see’ things in the sense of understanding, we talk about a ‘warm’ or ‘cold’ person, etc. Old proverbs are a good source of such examples.
Lakoff and Johnson’s [77] argument is that the more abstract meaning is the secondary, derived one. We derive such meanings from the simpler ones, the embodied and primitive meanings. It is a bottom-up process.
Artificial social intelligence is also concerned with the environment of the intelligent agent, in particular its social environment. However, there is a difference between the theory of ASI and embodied cognition. ASI is rooted in social science and the notion of a ‘social brain’. Embodied cognition is rooted in the intersection of cognitive science and linguistics, and it also rests on philosophical grounds. Embodied cognition focuses on universal principles of understanding, or in other words, on the cognitive architecture.
Can we follow the way from the embodied primitives upwards to the abstract concepts? Can we use AI to boost such research? This has been attempted by several researchers; Regier [80] used neural networks (including recurrent neural networks) to identify the primitives of cognition used in relation to our sense of space. In deep neural networks we take very simple, primitive features and build upon them the next layer of more complex features, in a multi-step process. This bottom-up process continues until a global pattern can be recognized, and it resembles the process that Lakoff and Johnson [76] defined for schemas. Regier [80] struggled with old methods of backpropagation to tune his network. Today we can advance Regier’s idea by using new techniques in deep neural networks.
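As a hand-crafted toy in the spirit of this bottom-up composition (not Regier's actual model; the features, weights, and scene encoding are invented), primitive spatial features can be computed first and then composed into a graded judgment for a spatial term such as ‘above’.

```python
def primitives(trajector, landmark):
    """Layer 1: embodied primitive features of a spatial scene (illustrative)."""
    dx = trajector[0] - landmark[0]
    dy = trajector[1] - landmark[1]
    return {"vertical_offset": dy, "horizontal_offset": abs(dx)}

def above_score(trajector, landmark):
    """Layer 2: compose the primitives into a graded judgment for 'above'."""
    f = primitives(trajector, landmark)
    # Higher when the trajector is well above and roughly aligned with the landmark.
    score = f["vertical_offset"] - 0.5 * f["horizontal_offset"]
    return max(0.0, min(1.0, score))

print(above_score((0.0, 0.9), (0.0, 0.1)))   # high score: clearly 'above'
print(above_score((0.9, 0.2), (0.0, 0.1)))   # zero: mostly beside, not 'above'
```

In a learned version, the weights of both layers would be tuned from labeled scenes rather than set by hand, which is the role backpropagation played in Regier's work.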
Another way in which we can implement embodied cognition is by formalizing the idea of metaphors. To be able to use metaphors, we need to give the computer the capability to simulate a situation in which the machine itself resides. This has already been done in the context of value alignment. Winfield, Blum and Liu [81] defined a ‘consequence engine’ that could simulate a situation and also observe itself in that simulation. The machine then had to decide on a ‘moral’ dilemma.
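The self-simulation idea can be sketched as follows, as a toy in the spirit of the cited consequence engine, with invented world dynamics and harm scores: the robot simulates each candidate action, including where it itself ends up, and chooses the action whose predicted consequences are least harmful.

```python
def simulate(world, robot_pos, action):
    """Predict the next world state, including where the robot itself ends up."""
    moves = {"stay": 0, "step_left": -1, "step_right": +1}
    new_pos = robot_pos + moves[action]
    human_pos = world["human_pos"] + world["human_velocity"]
    return {"robot_pos": new_pos, "human_pos": human_pos, "hole_pos": world["hole_pos"]}

def harm(outcome):
    """Score predicted harm: worst if the human reaches the hole unblocked."""
    human_in_danger = outcome["human_pos"] == outcome["hole_pos"]
    robot_blocking = outcome["robot_pos"] == outcome["hole_pos"]
    if human_in_danger and not robot_blocking:
        return 10          # the human falls into the hole
    if robot_blocking:
        return 1           # the robot blocks the hole, a minor cost to itself
    return 0

def choose_action(world, robot_pos, actions=("stay", "step_left", "step_right")):
    outcomes = {a: harm(simulate(world, robot_pos, a)) for a in actions}
    return min(outcomes, key=outcomes.get)

world = {"human_pos": 1, "human_velocity": 1, "hole_pos": 2}
print(choose_action(world, robot_pos=1))   # step_right: move to block the hole
```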
Embodied cognition is a new hurdle to overcome; it is the missing bridge between robotics and AI. It should be thought of not as an objection to AI but as a new challenge.
7. The Way to Overcome Our Fears: Value Alignment
In an article called “How Do We Align Artificial Intelligence with Human Values?” [82], Conn explains that “highly autonomous AI systems should be designed so that their goals and behaviors can be assured to align with human values throughout their operation”. (para. 5).
One of the main challenges in aligning AI with values is to understand (and to agree upon) what exactly these values are. There are many factors that must be taken into account which depend mainly on context—cultural, social, socioeconomic and more. It is also important to remember that humanity often does not agree on common values, and even when it does, social values tend to change over time.
Eliezer Yudkowsky offered the first attempt at explaining AI alignment in his seminal work on the topic, “Creating Friendly AI” [83], and followed it up with a more nuanced description of alignment in “Coherent Extrapolated Volition” [84]. Nearly a decade later, Stuart Russell began talking about the value alignment problem, giving AI alignment its name and motivating broader interest in AI safety. Since then, numerous researchers and organizations have worked on AI alignment to build a better understanding of the problem.
As Tegmark [85] explains: “aligning machine goals with our own involves three unsolved problems: making machines learn them, adopt them and retain them. AI can be created to have virtually any goal, but almost any sufficiently ambitious goal can lead to subgoals of self-preservation, resource acquisition and curiosity to understand the world better—the former two may potentially lead a superintelligence AI to cause problems for humans, and the latter may prevent it from retaining the goals we give it.” (p. 389).
How can value alignment be implemented? Wilson and Daugherty [8] describe three critical roles that we, humans, need to perform:
Training: Developing ‘personalities’ for AI requires considerable training by diverse experts. For example, in order to create the personality of Cortana, Microsoft’s AI assistant, several human trainers, among them a playwright, a novelist and a poet, spent hours helping developers create a personality that is confident, helpful and not too ‘bossy’. Apple’s Siri is another example: much time and effort was spent giving Siri a hint of sassiness, as expected from an Apple product.
Creating AI with more complex and subtle human traits is sought after by new startups building AI assistants. Koko, a startup born out of the MIT Media Lab, has created an AI assistant that can display sympathy. For example, if a person is having a bad day, it will not just say ‘I’m sorry to hear that’, but will ask for more information and perhaps offer advice such as ‘tension could be harnessed into action and change’ [86].
Explaining: As AI develops, it sometimes reaches results through processes that are unclear to users, a sort of internal ‘black box’. Such systems therefore require expert, industry-specific ‘explainers’ for us to understand how the AI reached a certain conclusion. This is especially critical in evidence-based industries such as medicine and law. A medical practitioner must receive an explanation of why an AI assistant gave a certain recommendation and what internal ‘reasoning’ led to the decision. In a similar way, law enforcement investigating an autonomous vehicle accident needs experts to explain the AI’s reasoning behind the decisions that led to the accident [8].
Sustaining: AI also requires sustainers. Sustainers oversee AI and work to make sure it functions as intended, in a safe and responsible manner. For example, a sustainer would make sure an autonomous car recognizes all human diversity and takes action not to risk or harm any human being. Other sustainers may be in charge of making sure AI functions within the desired ethical norms. For example, when analyzing big data to enhance user monetization, a sustainer would ensure that the process uses general statistical data rather than specific, personal data (which may generate negative sentiment among users) to reach its conclusions and actions [8].
The unique human roles presented here have been linked to the workplace environment, but they are undoubtedly relevant to all spheres of life. As we claimed earlier, we are at the beginning of a new developmental stage in AI, that of artificial social intelligence. Within this realm, new questions concerning human values may arise.