1. Introduction
The analogy between brain and computer has always been a driving force in the field of Artificial Intelligence (AI): on the one hand as a source of inspiration for the design of the algorithms underlying intelligent systems, and on the other as a justification for why some algorithmic technologies have been able to achieve performance similar (or superior) to that of humans in solving certain tasks.
Both of these accounts of the brain–computer analogy, however, seem to be called into question by recent advances of modern AI based on deep learning techniques. On the one hand, it has emerged that the adoption of algorithmic strategies other than those presumed to be adopted in a biological brain has led to wider success, with results that significantly exceed human performance; on the other hand, the analogy between modern artificial neural models and biological ones appears increasingly inconsistent [
1], in the physical structure, in the type of abstraction operated on the input data and in the implemented algorithmic solutions, so much so as to argue that the two cognitive models may be
incommensurable [
2]. The term
incommensurable is probably problematic in this context, as it can be interpreted in multiple ways, but we will cover the subject in more detail in
Section 3.
The weakening of the analogy between the brain and the computer, an analogy that could be considered a value in itself in the design of the algorithmic aspects of neural networks, changes things. With the abandonment of the analogy with the brain at all costs in the design of the algorithms underlying a cognitive architecture, we have returned to an opportunistic attitude, whereby the effectiveness of a cognitive model is once again measured only by its performance: if it fulfills the task for which it was designed, then it is a good application model; otherwise, it is not.
Regarding the second aspect, the need for a plausible explanation of the efficiency of the method may seem a purely theoretical issue, but there are also more practical implications. For several applications there is a requirement known as “Explainable AI” (XAI) [
3,
4,
5], which implies that the results of the solution can be understood by humans. This need for explanation has prompted research on automatic methods that provide interpretations of individual predictions made by deep learning (DL) models. For example,
Local Interpretable Model-agnostic Explanations (LIME) [
6] is a simplified approximation of the original DL model, which is interpretable;
SHapley Additive exPlanation (SHAP) [
7] assigns scores to the original model features that contribute most to an individual prediction. Even if these methods provide interesting insights into DL model behavior, they are exposed to the very same problem: how to ensure the faithfulness of automatically generated explanations for a prediction made by inherently opaque models. Despite their opacity [
8], the analogy with the functioning of a biological brain made the algorithmic ancestors of modern deep learning systems more acceptable, and not fully understanding them could be justified by our limited knowledge of how a biological brain works [
1].
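To make the local-surrogate idea behind methods such as LIME concrete, here is a minimal, self-contained sketch (not the lime library's API): the opaque model's predictions around a single input are approximated by a locally weighted linear model, whose coefficients serve as the per-feature explanation. The black-box model, the data and the kernel width are hypothetical stand-ins.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate_explanation(black_box_predict, x, n_samples=500, width=0.75):
    """Fit an interpretable linear model around one instance x.

    black_box_predict: function mapping an (n, d) array to predicted scores.
    Returns the coefficients of a locally weighted linear surrogate, which act
    as per-feature importance scores for this single prediction.
    """
    d = x.shape[0]
    # Perturb the instance with Gaussian noise to probe the model locally.
    Z = x + np.random.normal(scale=0.3, size=(n_samples, d))
    y = black_box_predict(Z)
    # Weight perturbed points by proximity to x (closer points count more).
    weights = np.exp(-np.sum((Z - x) ** 2, axis=1) / width ** 2)
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y, sample_weight=weights)
    return surrogate.coef_

# Hypothetical opaque model: a nonlinear function standing in for a DL network.
def opaque_model(X):
    return np.tanh(2.0 * X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.1 * X[:, 2])

x0 = np.array([0.2, -1.0, 0.5])
print(local_surrogate_explanation(opaque_model, x0))
```

The sketch also makes the faithfulness worry tangible: the explanation is only as good as the local linear approximation of a model that remains opaque.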
Technically speaking, the crux of the XAI problem in deep learning lies in the fact that artificial neural networks do not provide programmers with clear representations of their algorithmic functioning [
9,
10,
11]. Deep neural networks lack model interpretability, so when we ask why a computer makes a particular decision or prediction instead of another, we remain ignorant about the reason for that decision. This lack of knowledge about decisions made by an intelligent system has important implications, ranging from the practical to the ethical to the legal.
In this work, we address the issue of explaining why the analogy between biological cognitive models and the algorithmic logics adopted by modern deep learning systems fails, embracing the emerging thesis of
incommensurability of the two cognitive models proposed by the scholar Beatrice Fazi [
2], which is in contrast to the position of many scholars. We then also try to provide a justification for the success of modern artificial cognitive systems despite their persistent opacity, highlighting some aspects that bring them closer to the biological cognitive model than recent studies suggest.
Specifically, we discuss the fact that modeling a successful cognitive system, biological or artificial, can be the result of an opportunistic approach to design. After all, even nature, in the implementation of the notion of adaptation, works in an opportunistic way: if a biological model works better than another, it has more chances of surviving over time, regardless of the elegance of the solution adopted for adaptation.
A cognitive system that self-adapts works opportunistically by providing solutions that are not (necessarily) elegant and do not need to be explained by logical rules. Evolution, through natural selection, therefore produces cognitive models that work, but which nevertheless can remain opaque to any attempt at explainability.
2. Deep Learning and the Explainability Problem
The most striking change in AI in recent years is due to the family of algorithmic techniques used in modern deep learning systems [
12]. These techniques represent the latest evolution of artificial neural networks organized in layers [
13]. Deep learning is responsible for the current resurgence of AI [
14] after several decades of slow and unsatisfactory progress. Thanks to efficient algorithmic techniques, together with their enormous computing capabilities, modern computers seem able to act without being explicitly programmed, modifying their underlying mathematical functions, defining their behavior on the basis of the input data, and producing decisions or forecasts.
The levels of efficiency achieved by intelligent systems based on deep learning were completely unexpected, given that they are based on algorithmic techniques that derive from artificial neural networks, an approach to AI that was stagnant at the beginning of this century. After decades of intense development, artificial neural networks seemed to have exhausted their potential. However, applying minor tweaks to the initial design led to a resurrection so impressive that it remained largely inexplicable.
Deep learning actually works well from an engineering point of view; so good that it is difficult to understand why [
9]. The decision-making of quasi-autonomous artificial agents powered by deep learning affects millions of people every day, and the range of decisions covered by deep learning is vast. There are certainly sociological and marketing factors contributing to the current fortunate period of deep learning, but they are obviously not sufficient to justify the effectiveness of such algorithmic techniques. For this reason, in recent years, there has been growing attention on finding valid explanations for this success. The two main explanations generally point to hardware performance and neuroscience [9], but scholars have long agreed that neither of these explanations is tenable. First, there is a lot of skepticism about attributing the success of deep learning to computing power alone.
The second justification, which traces the effectiveness of such systems to their connection with neuroscience, has been very common in the past and is often stated as an undisputed fact. More recently, however, the appeal to neuroscience appears to have failed in its goal of explaining the efficacy of deep learning [
1].
Technically, this aspect is referred to as the opacity of an algorithm, a widespread problem in modern computer science that can be associated with many widely used algorithms: social classification and positioning mechanisms, spam filters, credit card fraud detection, results listings in a search engine, market segmentation and advertising, just to name a few. It is no coincidence that these classification mechanisms are in many cases based on deep learning and machine learning algorithms.
The scholar Jenna Burrell, in her work [
8], distinguishes three forms of opacity: opacity as intentional corporate or state secrecy, opacity as technical illiteracy, and opacity that derives from the lack of explainability of the algorithm. The latter is the one that interests us most in this work; it is a form of opacity that derives, for example, from the mismatch between the mathematical optimization characteristic of machine learning and the demands of human-scale reasoning and styles of semantic interpretation. It is often confused with the second form, as part of the general sense that algorithms and code are very technical, complex and difficult to understand [
8].
Explainability is a key word for present and future algorithmic cultures, raising equally unique social and ethical challenges.
However, it is interesting to note that what makes deep learning techniques powerful often also makes their theoretical support opaque. In this context, deep learning algorithms can be regarded as black boxes [
15], tools that appear to work very well but for which it is not clear how and why they work. The knowledge generated and encapsulated in these models remains, in part, implicit due to the very nature of these tools: the non-linear structure of the hidden mathematical functions, the abstraction of the data they produce, the distributed character of the network representations, and the numerous possible configurations of their large sets of variables.
Thus, strange as it may seem, there is still no formal explanation for why deep learning math is so effective. Perhaps, the domain that appears most appropriate for a justification of the functioning of deep learning is that of mathematics [
1]. In a recent paper, Berner et al. [
16] describe the new field of mathematical analysis of deep learning, giving partial answers to some relevant questions concerning such techniques: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the problem, understanding what features are learned, why deep architectures perform exceptionally well in physical problems, and which fine aspects of an architecture affect the behavior of a learning task in which way.
3. Incommensurable Cognitive Models
The brain provides animals with a behavioral flexibility that is unmatched in our most sophisticated digital computers. Biological brains are constantly adapting to the uncertain, noisy and rapidly changing world they find themselves immersed in. It is therefore not surprising that the idea of drawing inspiration from biological neural architectures both for the design of microprocessors—the so-called
brain in silicon [
17]—and for the design of artificial intelligent systems, has been a driving force since the birth of the first computers.
Regarding the design of the neuromorphic hardware, Plebe and Grasso [
17] identify three main periods during which this idea was transformed into practical projects. The first neural hardware was designed by Marvin Minsky in 1951 [
18], based on the logical interpretation of the neural activity of McCulloch and Pitts [
19], and was followed by a few other attempts. After a long period of almost total lack of progress, renewed interest was sparked in the late 1980s, driven by the success of neural algorithms in software [
13], with several funded projects in Europe, the United States and Japan [
20]. However, by the beginning of this century, almost none of the results of all that effort had reached maturity. In recent years, a new wave of enthusiasm for neural hardware has spread, driven by some large projects funded in Europe and the United States for realistic brain simulations [
21]. In this case, too, a revolution in the design of microprocessors was foreseen, in terms that closely resemble those of the previous two periods; a revolution that has never happened. Although we are not currently able to predict the future of neuromorphic hardware, the principles used to promote this venture appear theoretically wrong, and therefore the premises for the success of this approach are weak [
17].
On the other hand, the discourse relating to algorithmic aspects of neural networks is different, even if the conclusions seem to go in the same direction.
Initially, the similarity with the functioning of a biological brain was considered a useful cue for achieving the purpose at hand: if the analogy worked, it was considered a good thing; otherwise, different ways of dealing with the problem were investigated. This way of thinking, however, became less popular over the past two or three decades. More recently, with the success of deep neural networks, things seem to have changed again and the initial opportunistic attitude seems to be back in vogue. The resonance of the successes of deep learning has sparked intense reflections and discussions within the cognitive sciences and philosophy [
9,
22,
23,
24,
25,
26] and the analogy between the biological brain and the artificial cognitive models has been even more heavily questioned, mainly because of the persistent opacity of such models [
8,
15].
Yet deep learning is by no means a perspective that conflicts with the idea of designing a cognitive architecture in a biologically inspired way. On the contrary, we argue that it ultimately derives from the central role that connectionism has given to learning, an idea that is, in turn, based on the way the brain learns and that comes from the biological domain.
However, recent achievements in deep learning have provided some surprising counterexamples to the biological-brain analogy, in which the adoption of strategies other than those adopted by the brain has been successful.
Perconti and Plebe [
1] suggest two possible justifications for the XAI problem. The first justification is linked to the opacity of the biological cognitive model and therefore to the fact that our knowledge of how the brain deals with certain problems is incomplete. From this follows our inability to translate this solution into the correct algorithmic approach for an artificial agent. An alternative justification is that the algorithmic solutions adopted by a biological brain are not the most effective to be implemented in an artificial cognitive model.
It should be added that there are neural models that attempt to adhere much more rigorously than DL to brain mechanisms and processes that have been functionally identified and are empirically justified. One of the best examples is SPAUN [
27], based on the Neural Engineering Framework [
28], the functional and anatomical architecture of which broadly matches the organization of the cortex and of several subcortical nuclei. Models of this sort are excellent for the exploration of cognitive abilities, but they are not at all competitive with DL in application fields.
Here we argue a third justification that embraces the recent hypothesis of the incommensurability of the two cognitive models advanced by the researcher Beatrice Fazi [
2]. Specifically, we discuss how the mechanisms put in place by biological evolution, linked to the specific constraints of an organic system that imposes more tortuous paths, have led to solutions that are not comparable to the solutions taken in the modeling of digital cognition. However, it is important to underline that the solutions adopted by biological systems cannot be defined as better or worse than the solutions implemented in an artificial cognitive model: they are simply incommensurable.
The key idea is that recognizing the incommensurability between the way nature and machines construct cognitive models implies recognizing the ontological and epistemological disparity between the way humans and computational agents make decisions. Thus, inevitably, this discrepancy is reflected in the way such decisions might be told or represented by humans and artificial algorithmic agents, respectively.
In defense of this thesis, we highlight some of the profound differences between the two cognitive models by evaluating, in the following sections, the physical structure of the models, the differences in the learning paradigm and the differences in the abstraction model.
3.1. Structural Issues
It is self-evident that, when comparing brains and computers, the first difference that stands out is at a structural level, although some similarities can be observed.
We can observe that both the biological brain and a computer base their functioning on electricity, but this analogy is ephemeral. The electrical energy for digital computing is in fact conducted by metals, such as copper, the fastest conductor available, and semiconductors, such as silicon and germanium, which allow the flow of electrons to be controlled at the maximum possible speed. The biological model has instead faced a huge disadvantage in dealing with electricity, compared to human-made devices, as metals and semiconductors cannot be used inside living organisms. Given the abundant availability of sodium in the marine environment during the Paleozoic, nature opted for the use of ions, the only electrical conductors compatible with the organic materials available at that time. The biophysical breakthrough in the exploitation of electricity in animals was in fact the ion channel, a sort of natural electrical device, which first appeared as a potassium channel about three billion years ago in bacteria and then evolved into the sodium channel about 650 million years ago, which is currently the most important neural channel [
29].
Of course, evolution has had its problems in allowing biological cognitive models to deal with electricity, but it has also shown much more imagination in designing such models. In fact, neurons are not all the same: hundreds of different types of neurons have been identified in the mammalian brain. Neurons can range in size from a few micrometers up to several centimeters in length [
30]. The number of inputs to a cell can range from around 500 or less (in the retina) to well over 200,000 (Purkinje cells in the cerebellum). Even in the new era of three-dimensional semiconductors, which could adapt to the density of a massive connection pool in the nervous system, technology would not be sufficient as a basis for engineering the layered structure of a biological brain.
It has been a traditionally held view that the ways of exploiting electricity in the brain and in the computer lead to two different kinds of computation: analog and digital. This is undoubtedly an important differentiation, but its actual scope has been widely reviewed in the light of advances in the philosophy of computation. There is currently a convergence on a broader notion of computation, which Piccinini and Bahar [31] call generic computation, and which can encompass brains, computers, and possibly other physical devices. It is defined as the processing of vehicles—entities or variables that can change state according to rules that are sensitive to certain properties of the vehicle and, in particular, to the differences between different portions of the vehicles [32]. Assuming that brains process medium-independent vehicles, it follows that both computers and brains are generic computation devices.
3.2. The Learning Paradigm
Learning is the behavioral modification that follows, or is induced by, an interaction with the environment and is the result of experiences that lead to the establishment of new response configurations to external stimuli. It is one of the fundamental cognitive phenomena for evolution and involves many animal species in addition to man; the development and survival of individuals is based on their ability to learn.
The learning paradigm is also one of the most successful loans from the mind and brain to the world of digital computing, a loan that has accompanied the design of artificial intelligent systems since their inception. Alan Turing [
33] himself, with extraordinary foresight, imagined the creation of artificial brains which, instead of being rigidly programmed in their tasks, could learn them on their own through a series of experiences. Inspired by neuroscience, he also imagined the presence of distributed interconnected elements and the possibility of organizing the system by strengthening or cutting connections based on experience.
From a simple idea, the learning paradigm became a reality about 30 years after the appearance of the first digital computers with the invention of the first artificial neural networks. Today, the “deep” version of those early neural networks is responsible for the astonishing resurgence of artificial intelligence.
One might therefore think that the analogy with the learning of a biological brain has paid off. Once again, however, this analogy seems illuminating, but only at an appropriate distance.
The success of the artificial neural network [
13] has in fact nothing to do with neuroscience, and instead derives from gradient methods, developed in mathematics for the minimization of continuously differentiable functions [34]. The resulting learning algorithm is very efficient and is called the backpropagation algorithm. This method, conceived by Geoffrey Hinton [35], the father of deep learning, nevertheless differs drastically from biological learning.
If $\theta$ is the vector of all the learnable parameters in a network, and $\mathcal{L}(\theta; x_t)$ is the measure of the error of the network with parameters $\theta$ when applied to sample $x_t$, back-propagation updates the parameters iteratively, according to the following formula:
$$\theta^{(t+1)} = \theta^{(t)} - \eta\,\nabla_{\theta}\mathcal{L}\!\left(\theta^{(t)}; x_t\right) \qquad (1)$$
where $t$ spans all available samples $x_t$, and $\eta$ is the learning rate.
Today, deep neural networks can still learn with algorithms quite close to the good old backpropagation, even if this term has fallen into disuse, replaced by Stochastic Gradient Descent (SGD) [
36]. Its basic formulation is as follows:
$$\theta^{(t+1)} = \theta^{(t)} - \eta\,\frac{1}{M}\sum_{x_i \in \mathcal{B}_t} \nabla_{\theta}\mathcal{L}\!\left(\theta^{(t)}; x_i\right) \qquad (2)$$
and the similarity with the standard backpropagation Equation (1) is immediately noticed. Instead of calculating the gradients on a single sample $x_t$, as in Equation (1), in Equation (2) a stochastic estimate is made on a random subset $\mathcal{B}_t$ of size $M$ of the entire dataset, and at each iteration step $t$ a different subset of the same size is sampled.
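Purely as an illustration of the two update rules above, the following sketch contrasts the single-sample update of Equation (1) with the minibatch estimate of Equation (2); the linear model, squared-error loss and dataset are hypothetical stand-ins for a deep network and its loss.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 10)), rng.normal(size=1000)  # hypothetical dataset
theta = np.zeros(10)                                       # learnable parameters
eta, M = 0.01, 32                                          # learning rate, minibatch size

def grad(theta, X_s, y_s):
    """Gradient of a squared-error loss L(theta; x) for a linear model."""
    return 2 * X_s.T @ (X_s @ theta - y_s) / len(y_s)

# Equation (1): update on a single sample x_t, sweeping all samples in turn.
for t in range(len(y)):
    theta -= eta * grad(theta, X[t:t+1], y[t:t+1])

# Equation (2): stochastic estimate on a random subset of size M at each step t.
for t in range(300):
    idx = rng.choice(len(y), size=M, replace=False)
    theta -= eta * grad(theta, X[idx], y[idx])
```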
There have been attempts to reproduce more closely the learning mechanisms put in place by the biological brain, but every time biological learning has been copied in more detail, the advantages have disappeared [1]. Hinton himself, despite his fundamental contribution to the invention of the back-propagation algorithm, worked on an alternative neural model called the Boltzmann Machine, which used a more biologically plausible learning method [37]. However, the performances obtained by these models have never been up to those obtained by networks trained with backpropagation. Even today, Hinton continues to test variations of SGD in the direction of some sort of biological plausibility, but the results on large-scale benchmarks are well below those of standard SGD [38].
Despite wide efforts, the effectiveness of SGD in deep learning remains elusive. There are, indeed, theoretical results on the convergence of SGD [
39]. Let $\theta^*$ be the best possible vector of parameters, selected among those with a norm smaller than a scalar $B$, $\|\theta^*\| \le B$:
$$\theta^* = \underset{\|\theta\| \le B}{\arg\min}\; \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}(\theta; x_i) \qquad (3)$$
where $N$ is the number of available samples; in the general case $\theta^*$ is unknown. Further let $\rho$ be a bound on the norm of the gradients, $\|\nabla_{\theta}\mathcal{L}\| \le \rho$, and assume Equation (2) is run for $T$ iterations with learning rate $\eta = \sqrt{B^2/(\rho^2 T)}$. Then:
$$\mathbb{E}\!\left[\mathcal{L}(\bar{\theta})\right] - \mathcal{L}(\theta^*) \le \frac{B\rho}{\sqrt{T}} \qquad (4)$$
where $\bar{\theta}$ is the average of the parameter vectors over the $T$ iterations. Therefore, it is possible to achieve arbitrary convergence with a large enough number $T$ of iterations. This result, however, has been demonstrated only for the case where the loss function is convex in $\theta$, and this assumption is rarely met in deep learning models.
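As a quick numerical reading of the bound in Equation (4), with illustrative values $B = 1$ and $\rho = 1$, the guaranteed expected excess loss shrinks only as $1/\sqrt{T}$, so each additional digit of accuracy costs roughly a hundred times more iterations:

```python
import math

B, rho = 1.0, 1.0          # illustrative norm and gradient bounds
for T in (100, 10_000, 1_000_000):
    eta = math.sqrt(B**2 / (rho**2 * T))   # learning rate prescribed by the theorem
    bound = B * rho / math.sqrt(T)         # guaranteed expected excess loss
    print(f"T={T:>9}  eta={eta:.4f}  bound={bound:.4f}")
```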
Furthermore, the sharp advantage of deep over shallow networks remains puzzling, given that a higher number of layers is likely to lead to complex, non-convex loss functions. Partial answers to these questions are coming from topological analysis. Bianchini and Scarselli [40,41] used Betti numbers to investigate the topological properties of the space of functions generated by neural networks. The term “Betti number” was introduced by Henri Poincaré after the work of Enrico Betti [42] and, informally, refers to the number of holes of a topological surface in a given dimension. Bianchini and Scarselli found different asymptotic expressions for the Betti numbers of the topology generated by the functions realized by shallow and by deep neural networks. The analysis is limited to networks with a single binary output, and the topology is investigated for the set of all inputs $x \in \mathbb{R}^D$ for which the output of the network is positive, as is typical in a binary classifier. Calling such a set $S_S$ for shallow networks and $S_D$ for deep ones, with $n$ the overall number of hidden units, the sum of Betti numbers $B(\cdot)$ is ruled by the following equations:
$$B(S_S) \in O\!\left(n^{D}\right) \qquad (5)$$
$$B(S_D) \in \Omega\!\left(2^{n}\right) \qquad (6)$$
where $D$ is the dimension of the input vector. Equation (5) states that for shallow networks the Betti numbers grow at most polynomially with respect to the number of hidden units $n$, while from Equation (6) it turns out that for deep architectures the Betti numbers can grow exponentially in the number of hidden units. Therefore, by increasing the number of hidden units, more complex functions can be generated when the architectures are deep.
Betti-number-based complexity has later been extended to more general neural models, such as classifiers into $c$ classes [43]. For a model $\mathcal{N}$ with $n$ hidden units in total, as in the previous equations, distributed over $h$ hidden layers, the sum of Betti numbers obeys a bound (Equation (7) in that work) that grows with the depth: for multi-class neural models too, the Betti-number-based complexity grows with increasing depth $h$. In [44] it is demonstrated that a simple family of functions on $\mathbb{R}^{d}$ is expressible by a feedforward neural network with two hidden layers, but not by a network with one hidden layer, unless its width grows exponentially with the input dimension $d$. The same group later extended [45] the results to hyperspherical and hyperelliptical functions. In the end, whatever the essential reason why SGD works so well for deep networks, these studies clearly show that this reason lies in mathematical intricacies, which have very little to do with the functioning of brain neurons.
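To see how differently the bounds of Equations (5) and (6) scale, the following toy computation compares the polynomial term $n^D$ with the exponential term $2^n$ for an arbitrary illustrative input dimension $D = 3$:

```python
# Toy comparison of the asymptotic bounds in Equations (5) and (6):
# polynomial growth n^D for shallow networks vs exponential growth 2^n for deep ones.
D = 3                       # input dimension (arbitrary illustrative value)
for n in (5, 10, 20, 40):   # number of hidden units
    shallow_bound = n ** D      # Equation (5): O(n^D)
    deep_bound = 2 ** n         # Equation (6): Omega(2^n)
    print(f"n={n:>2}  shallow n^D={shallow_bound:>6}  deep 2^n={deep_bound:>14}")
```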
3.3. Information Abstraction
In analyzing the analogy between the biological brain and its digital counterpart, it is therefore also necessary to pay attention to how a human being (or an animal) receives and processes stimuli and information, and in what sense it can be said that a digital machine does the same. There is also a significant difference between the abstraction choices adopted by humans and those adopted by a computer. As we will see, also in this respect the two models appear incommensurable, because they cannot be measured against each other or compared against a common standard.
This difference concerns not only the different capacities with which the two models receive input data, but also a specifically computational relationship with the intelligible, that is, with what is apprehensible only through forms of abstract activity.
Using the metaphor of vision, since much of the most successful scientific research on deep learning has focused on computer vision, we can say that artificial cognitive agents see and process inputs differently. Indeed, relatively simple and immediate human intuitions on how to identify shapes, structures and geometric primitives are not easily expressed in computational terms.
Here we return to an example reported by Beatrice Fazi [
2] relating to the automatic recognition of human handwriting, a task notoriously difficult to perform computationally and for which the more traditional programming techniques do not work well but where, however, deep learning algorithms succeed very well. Suppose we want a program to recognize a handwritten digit, such as zero. In the case of supervised learning, thousands of scans of handwritten zeros are sent to the machine as training examples. The program then learns to recognize the digit, not as a human might do (e.g., by determining that a zero resembles a vertical oval), but by mechanically detecting complex patterns of darker and lighter pixels expressed as arrays of numbers.
This is certainly a different form of perception or reception of the input data. However, beyond the physical reception of data, we are also seeing a specific form of abstractive ability, similar to an automated mode of conceptualization, that is, an automated mode of forming internal representations intended to generalize while abstracting from observed facts.
For example, returning to the case of algorithmic recognition of handwritten zeros, the deep learning model identifies and constructs more relevant representations than any human programmer could have identified and given to the machine. In fact, these are representations that a human would not have (and could not have) abstracted in the first place. The way in which the program extracts and organizes information in terms of characteristics and then generalizes this information to form the desired “concept”—or, in computational terms, the representation of the desired output of zero—is therefore entirely and exclusively computational.
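A minimal sketch of this point, with random arrays standing in for real scans: the classifier never handles anything like a "vertical oval", only vectors of pixel intensities and the statistical weights it assigns to them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Stand-ins for 28x28 grayscale scans: each image is just a vector of 784 numbers.
images = rng.random((200, 28 * 28))
is_zero = rng.integers(0, 2, size=200)   # hypothetical labels: 1 = "zero", 0 = other

clf = LogisticRegression(max_iter=1000)
clf.fit(images, is_zero)
# The learned "concept" of a zero is nothing but 784 weights over pixel intensities,
# not a human-style description such as "a vertical oval".
print(clf.coef_.shape)   # (1, 784)
```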
4. A Role for Evolution, and an Artificial Metaphor for It
From what has been discussed in the previous sections, it is evident that the attempts to compare biological brain models and modern artificial cognitive models have failed except in specific sectors such as models related to computer vision.
This certainly had repercussions on the explainability of the functioning of artificial cognitive models. At present, progress in understanding some computational activities is achieved by trial and error, and operations are often rationalized retrospectively. To put it differently, “many algorithms used in artificial neural networks are understood only at the heuristic level, where scientists know empirically that certain training protocols employing large datasets will result in excellent performance” [
46].
However, one must always keep in mind that artificial neural networks share a general cognitive principle that is fundamental to biological neural circuits as well: both are formidable devices for learning sophisticated functions from experience. Organisms whose brain circuits have the highest capacity to learn, such as man, owe it to natural evolution. The ability of artificial neural models to learn, despite the efficiency of deep learning, does not come for free either, nor is there a natural evolution to help. However, we argue that the patient and painstaking effort of deep neural model designers can be framed as a metaphor for natural evolution.
The basic idea of biological evolution is that populations and species of organisms change over time. The British naturalist Charles Darwin, in his most influential and controversial book entitled “The Origin of Species” [
47], was the first to propose the idea that living species can evolve and to suggest a mechanism for evolution: natural selection, later complemented by the notion of genetic drift, that is, the random change in the frequency of an existing genetic variant. Based on natural selection, hereditary traits that help organisms survive and reproduce become more common in a population over time.
Such mechanisms of evolution work with the random variation generated by the mutation of the genetic code of organisms in the passage from one generation to the next. Although environmental factors are thought to influence the mutation rate, scholars agree that they are unable to influence the direction of the mutation. Hence, for example, exposure to harmful chemicals can increase the mutation rate, but it will not be able to stimulate those mutations that make the body resistant to those chemicals. For this reason the mutations are random, that is, the fact that specific mutations occur or not is not related to the usefulness of the mutation itself.
Here, we argue that biological and artificial cognitive models have a lot in common regarding the process of evolution whereby such models are modified until they converge to a successful version in solving certain problems. In this context, it is necessary to take into account the right differences in carrying out this analogy. The metaphor of evolution should be considered not in the strict sense but serves to help us understand how certain dynamics can coincide, as well as the results obtained in the process of building a cognitive model.
For instance, one of the interesting things is that evolution, in its general sense, does not necessarily converge to the same solution, nor to an optimal one; based on the conditions and contexts in which it operates, it can produce significantly different solutions to the same problem. Such solutions, although different, are shaped by the same forces and produced by the same laws and, at times, may have nothing in common and can hardly be compared: they are therefore incommensurable.
4.1. Variable Tuning and Natural Selection
The cognitive model of a living species, such as man, is defined at birth and is written in its genetic code, the DNA, inherited from the parents. We are talking about a complex and particularly large piece of code. The human genome, for instance, is made of 3.2 billion bases, but other organisms have different genome sizes, and even much simpler living beings are characterized by a very complex genetic code: the SARS-CoV-2 virus, for example, with nearly 30,000 bases, has one of the longest and most complex genetic codes among RNA viruses.
We refer to it as a piece of code since it contains instructions needed to make the proteins and molecules essential for the growth, development and health of the organism. However, it also contains the parameters of the biological system (which are the indicators of the features of the living organism) including the brain, indicating which cells it will contain, the number of each cell, the size of those cells or how they will interact, just to enumerate a few of these parameters.
However, if we think of the various changes that occurred in the brain from the first mammals to the appearance of Homo sapiens, we can see how this model has evolved into the very complex version that we (partially) know today. This took place not only through changes that made it possible to converge to the current version of the model, but also through a series of trials and errors implemented by the evolutionary process and by natural selection. Random mutation and genetic drift mean that some genes may occasionally change as they pass from one generation to the next. If these mutations lead to changes in the model that make it better suited to the environment, then they are more likely to consolidate in subsequent generations. Evolution then proceeds by applying random modifications to the biological parameters encoded in the genome, and natural selection chooses which of the proposed mutations is the most suitable for survival in the environment.
We can say that the modeling of an artificial cognitive system follows a path similar to the one just described for the evolution of a biological cognitive model. Neural network modeling often involves tuning many different hyperparameters, which are typically external to the model, meaning that they cannot be learned and modified by the network itself through data processing. They are referred to as optimization parameters because there is no analytical formula available to compute their appropriate values [
48].
For this reason, such values must often be specified by the programmer. Their definition is based, in most cases, on experience, or on setting some initial values and then proceeding by trial and error to modify them until a satisfactory solution for a given problem is identified.
Often, one of the most painful aspects of deep neural network modeling and training is the large number of hyperparameters to deal with. These could be, for instance, the learning rate $\eta$, the discount factor $\rho$ and the epsilon $\epsilon$ if the RMSprop optimizer is used [49,50], or the exponential decay rates $\beta_1$ and $\beta_2$ if the Adam optimizer [51] is used. It is also necessary to define the number of layers in the network and the number of hidden units for each layer. In addition, learning rate schedulers may be used. However, those just mentioned are only some of the hyperparameters that may need to be configured when modeling a neural network.
Usually, it can help to classify hyperparameters into two groups: those used for training and those used for model design. A correct choice of the training-related hyperparameters allows neural networks to learn faster and achieve better performance, which makes their optimization something definitely worth worrying about. Hyperparameters for model design are more related to the structure of the neural network; trivial examples are the number of hidden layers and the width of those layers.
A really useful summary of the order of importance of hyperparameters was compiled by Tong Yu and Hong Zhu in their paper [
52].
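As a hedged illustration of how such hyperparameters are typically laid out and then explored by trial and error, the following sketch draws random configurations from an illustrative search space; the ranges, the placeholder training function and the scores are stand-ins, not recommendations.

```python
import random

# Illustrative hyperparameter space: optimizer settings and model-design choices.
search_space = {
    "optimizer":     ["rmsprop", "adam"],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "rho":           [0.9, 0.95, 0.99],      # RMSprop discount factor
    "epsilon":       [1e-8, 1e-7],
    "beta_1":        [0.9, 0.95],            # Adam exponential decay rates
    "beta_2":        [0.999, 0.9999],
    "hidden_layers": [2, 4, 8],
    "hidden_units":  [64, 128, 256],
}

def sample_config(space):
    """Trial-and-error tuning in its simplest form: random search."""
    return {name: random.choice(values) for name, values in space.items()}

def train_and_evaluate(config):
    # Placeholder for an actual training run; returns a hypothetical validation score.
    return random.random()

best = max((sample_config(search_space) for _ in range(20)),
           key=train_and_evaluate)
print(best)
```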
Thus, the evolution of a cognitive model can be seen, in both contexts, as a problem of tuning the model variables. It is therefore an engineering problem rather than an algorithmic design problem.
However, unlike what happens in the design of algorithms operating in intelligent artificial systems, biological cognitive models are not driven by a programmer or any higher entity. There is no grand plan or design of nature. Evolution has no purpose, and many mutations in genes are neutral, with absolutely no positive or negative effect on survival; they just happen to be passed on. It is easy to understand how these neutral aspects are the ones that most resist explainability, both in a biological system and in an artificial one.
In fact, a very common misconception about natural selection is that, over time, evolution selects the characteristics of an organism that are most perfectly suited to its environment, just as a programmer selects the hyperparameter values that allow for better performance of the neural network. The misunderstanding may be partly due to the term
natural selection itself which conjures up parallels with, for example, a dog breeder “selecting” the desirable traits in their animals. Indeed, nature is not actually “selecting” anything: natural selection is a process, not a conscious force. For this reason, there are good reasons why the natural selection process may not always lead to a “perfect” solution. First, selection can only act on the available genetic variation. A cheetah, for example, cannot evolve to run faster if a genetic variant is not available that allows for greater speed. Secondly, the biological cognitive model is limited in working with the tools that nature has made available to it. This means that an artificial cognitive system is able to converge in a more targeted and faster way to a version capable of obtaining good results in solving certain tasks. However, there are already some first attempts to leave the evolution of artificial models to chance by implementing neural networks that automatically modify their hyperparameters assisted by genetic algorithms [
53,
54].
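As a sketch of the idea just mentioned, the following minimal genetic-algorithm loop evolves hyperparameters through random mutation and selection; the genome layout and the fitness function (which in practice would be a full training run) are hypothetical.

```python
import random

def random_genome():
    # A "genome" encoding the hyperparameters of a hypothetical network.
    return {"learning_rate": 10 ** random.uniform(-5, -1),
            "hidden_layers": random.randint(1, 8),
            "hidden_units": random.choice([32, 64, 128, 256])}

def fitness(genome):
    # Placeholder: in practice this would train the network described by the
    # genome and return its validation accuracy.
    return random.random()

def mutate(genome):
    # Random variation: change one hyperparameter, regardless of its usefulness.
    child = dict(genome)
    key = random.choice(list(child))
    child[key] = random_genome()[key]
    return child

population = [random_genome() for _ in range(20)]
for generation in range(10):
    # Selection: the fitter half survives, as in differential survival in nature.
    population.sort(key=fitness, reverse=True)
    survivors = population[: len(population) // 2]
    # Reproduction with mutation refills the population.
    population = survivors + [mutate(random.choice(survivors)) for _ in survivors]
print(population[0])
```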
Another aspect that seems to conflict with this analogy is the fact that, in an artificial system, evolution is a relatively fast process, unlike the evolution of biological models, which requires unfathomably long times. However, it might come as a surprise that biological evolution can also be very fast. We can even see it happen in real time, if we are lucky (or unlucky).
Let us take some very small life forms like bacteria, for example, which are actively becoming resistant to antibiotics. Some have become resistant to so many drugs that they are rapidly becoming nearly impossible to treat. Mutations in the genes of a virus responsible for a pandemic, such as that of COVID-19, can occur within a few weeks, giving rise to several new variants. There is a similar situation even in the not so small life forms. Just to cite an example, we spend a lot of money on pesticides and herbicides to eradicate pests and weeds, but both have created all sorts of adaptations over the past half century to escape death and continue to thrive.
4.2. A World of Different Cognitive Models
The idea of the existence of very different (or even incommensurable) cognitive models is not something that only shapes the comparison between the biological and the artificial brains. The world we live in could already provide us with notable examples of significantly different cognitive models, each with its own peculiar features, each of which offers different solutions to the same problems, sometimes impenetrable to explainability.
Charles Darwin’s idea that man is essentially a “big-brain ape” and the theory of basic mammalian brain uniformity were abandoned in the 1990s, when the turn to microscopic study of the human brain [
55] highlighted significant differences in the very structure of the various cognitive models. It was discovered, for example, that in one layer of man’s primary visual cortex, nerve cells are organized in a complex mesh pattern that is very different even from that of other primates, our closest relatives.
Obviously, the fact that man is an animal implies that his brain has a lot in common with that of many other animals, for example the components that allow us to manage movement, thought and the perception of the world around us. However, every animal’s brain does something a little different and special. For example, cats have very good eyesight and have more brain capacity for their sense of sight. Likewise, dogs have a very good sense of smell, so the part of their brain that can identify different smells is very powerful compared to other animals. However, these examples concern senses that are familiar to us even if they are considerably more sensitive. Thus it is possible to give a sort of explanation of the cognitive abilities of these animals, even if they are hardly imaginable for us.
Taking a step further towards our thesis, we can observe that the cognitive models developed by other animals make it possible to process information that humans are unable to perceive. Bats, for example, are able to map their surroundings through sound, thanks to a sonar system used also in other animals such as dolphins and whales. It has been discovered that the sounds produced by these species allow them to form a mental map of their surroundings in three dimensions. More recently it has been discovered that this ability is even more incredible: the sound waves emitted can penetrate through certain objects and soft tissues, providing the animal with a kind of X-ray vision. Even electricity, invisible to us, represents for some animal species a signal that guides them towards their goal. Sharks and other fish can detect it thanks to a series of channels filled with a gelatinous material, which open outward through the pores of the skin.
In these specific cases it is much more difficult for us to give an understandable interpretation of how these species manage to solve certain problems thanks to their perceptual models. There is a scientific interpretation but explaining it to those unfamiliar with the language of science is particularly difficult.
In nature there are cognitive models that are even less penetrable to explainability. For example, it has recently been discovered that some insects, and also other animals such as the octopus, are able, thanks to the dorsal region of their eyes, to distinguish polarized light, that is, light that has a specific vector orientation in space. This ability also appears to be used by ants and bees to travel great distances and find their way back to their nests.
Finally, there are some cases that still resist explainability. We cite the tendency of certain animals to align their bodies with the north–south axis of the earth’s magnetic field during movement. Scientists have been observing this phenomenon for a decade but have found no explanation for this ability, which has been documented in many creatures, from bacteria to vertebrates.
Scientists argue that different species use different systems to do so. For example, there is a type of pigment called cryptochrome that is present in the eye of some species and is activated with blue light by a magnetism-sensitive quantum mechanism. It is thought that this would allow these animals to see the Earth’s magnetic field in the form of a blue trail.
5. Discussion
In order for the algorithmic aspects of a cognitive model, biological or artificial, to be effectively explained and therefore expressed and shared, there must be a common experience between the communicator and the recipient of the communication. It is clear that this is not possible in the case of interactions between man and a computer, since there is no common phenomenological or existential ground between human abstractions and those of a computational agent.
The same can be said in the case of interactions between animal species that perceive the environment differently. Although scientists are able to give an explanation of why certain cognitive and perceptual models can work, it is not possible to achieve an experiential sharing of the algorithmic aspects that characterize the cognitive model. Physics tells us how bats see the world around them, but we are unable to imagine or share that kind of perception.
We must therefore be careful when dealing with, on the one hand, how a human being receives and processes stimuli or information and, on the other, in what sense it can be said that an artificial system (or another animal species) does the same. It is therefore important to speak of incommensurability between the abstraction choices made by different cognitive models.
The confirmation of what has been said can be found in the analysis of artificial cognitive systems capable of self-training without the use of training datasets provided in input. Let us take the example of AlphaGo Zero, a version of DeepMind’s AlphaGo software developed in 2017 [
56] and created without using human training data. AlphaGo Zero was stronger than any previous version and reached the level of AlphaGo Master in 21 days. The interesting thing is that this software was successful not because it behaved like a human player, but because it played differently from a human. This condition is particularly interesting from both a philosophical and a socio-cultural point of view. The cognitive model autonomously developed its own entirely personal sensitivity in playing the game, far exceeding human performance in this task.
If we accept the fact that artificial cognitive models are incommensurable compared to the human cognitive model and, consequently, that they are able to develop independent solutions significantly divergent from those adopted by a biological brain, we could be able to exploit them as a source of inspiration to guide human investigation.
For example, in her recent work “In Defense of the Black Box”, the scholar Elizabeth Holm reports an innovative medical imaging study in which scientists trained a deep learning system to diagnose diabetic retinopathy, achieving performance that surpassed that of a committee of ophthalmological experts and predicting from retinal images a variety of factors, including cardiological risk, age and sex. Interestingly, no one had ever noticed gender differences in human retinas before, so the results from the artificial cognitive model inspired researchers to investigate how and why male and female retinas differ.
Alternative cognitive models available in nature (implemented in different animal species) have often inspired our technology and science in general. For example, the dual frequency band biosonar of bats inspired scientists to develop a new processing technique to improve geophysical observations based on the interpretation of two-dimensional radar images of the subsoil [
57].
It follows therefore that incommensurable cognitive models, precisely because they are such, can contribute substantially and productively to science, technology, engineering and mathematics to “provide value, optimize results and stimulate inspiration” [
15].