1. Introduction
Recent studies suggest that humans track their conspecifics’ mental states, such as their goals and beliefs, from early infancy (e.g., [1,2]). In contrast to a standard (or elicited-response) false belief task (FBT)
1, these studies employed spontaneous-response FBT, also known as non-verbal or implicit tasks, which do not require an explicit answer to a question about the protagonist’s belief (e.g., [2]).
Whether or not these abilities involve representing beliefs has been intensely debated. The mind-reading accounts claim that infants’ social cognitive abilities manifest a capacity to attribute beliefs [
4,
5,
6]. This capacity is of the adult kind, and researchers have appealed to performance factors to explain infants’ failures in elicited-response FBT. The non-mind-reading accounts contend that mentalist understanding is not necessary to explain infants’ behavior in spontaneous-response FBT. Some claim that children’s responses might be based on behavior rules relating specific situations to specific actions without taking mental processes into consideration [
7,
8]. Others postulate that infants create three-way agent-object-location associations [
8,
9]. Other researchers argue that infants’ looking behavior is a function of the degree to which the observed and remembered or expected colors, shapes, and movements (the low-level properties of the test stimuli) are novel with respect to events encoded by infants earlier in the experiment [
10]. Here I do not wish to discuss which explanation is best. Instead, I am interested in the mind-reading account and the possibility of explaining infant performance in terms of deflationary mind-reading.
The proponents of early mind-reading onset argue that spontaneous-response FBT findings indicate that false belief understanding arrives much earlier than four years old, the age suggested in several developmental studies on mind-reading (e.g., [
3,
11]). In particular, some defenders of (a deflationary version of) the mind-reading account maintain that infants do not need to represent beliefs to perform in spontaneous-response FBT; instead, they represent belief-like states, i.e., registrations [12]. This proposal posits two different systems to explain results in spontaneous-response and elicited-response FBT. A flexible system that enables full-blown mind-reading and operates on propositional attitudes explains the results of elicited-response FBT, and an efficient system that enables early mind-reading and operates on registrations or belief-like states explains the results of spontaneous-response FBT. At the same time, this account seeks to be an intermediate explanation of early social cognitive abilities, “between mindless behaviorism and full-blown propositional attitude psychology” [12].
However, the underlying cognitive architecture that enables early mind-reading remains unclear. Apperly and colleagues’ account leaves the vehicle of registrations underspecified, which obscures the nature of the representations each system operates with. Although they set out conditions meant to distinguish registrations from beliefs, these conditions are best met by understanding registrations as map-like representations. The aim of this paper is to help clarify the nature of the representations the efficient mind-reading system operates with. In doing so, it offers a novel answer to an important question: how to explain the early social abilities pre-verbal children exhibit. Since there is strong agreement that a certain mastery of language is needed to represent propositional attitudes (e.g., [
13,
14]), it seems implausible that pre-verbal children represent propositional attitudes. Thus, what are they representing instead, in early social abilities? Although it is not new that the structure of mental representations can be map-like rather than language-like, as far as I know, it has not been suggested that (deflationary) mind-reading can be guided by map-like representations. Thus, this proposal is novel and it addresses the important question of the kind of representation, different from propositional attitude representation, pre-verbal children exploit in early social abilities.
In order to accomplish this, I will first present the problem of the status of registrations. I will try to show that the characteristics Apperly and colleagues set out for belief-like states or registrations do not succeed in making a distinction between the kinds of representations the flexible and efficient system relies on (
Section 2). Second, rather than discarding these conditions, I will suggest that they are best met by understanding registrations as cartographic representations. This characterization provides a better understanding of the kind of representations the efficient system operates with than Apperly and colleagues’ own. Moreover, it allows for distinguishing among registrations, beliefs, and perceptions (
Section 3). Thirdly, I offer some reasons to support this proposal related to the performance of spontaneous-response FBT. I also suggest that map-like vehicles can represent what is needed to perform in these tasks (i.e., the agent, the object, and the spatial relation between them) (
Section 4).
2. The Characterization of the Efficient System’s Representations as Belief-Like States
In this section, I present the problem of the status of belief-like states or registrations. Since the flexible and efficient systems operate on different representations (i.e., propositional attitudes vs. belief-like states), a clear distinction between the representations each system operates on is needed. The lack of such a distinction is problematic for the two-system proposal. In
Section 3, I offer a solution to this problem.
Recent studies investigating infants’ social cognitive abilities, that is, their capacity to make predictions about others’ behavior, have found that pre-verbal children are sensitive to others’ beliefs, including false beliefs. These spontaneous-response FBT findings indicate that false belief understanding arrives much earlier than four years of age, the age suggested in several studies with elicited-response or standard FBT (e.g., [3,11]).
Usually, the spontaneous-response FBT consists of a violation-of-expectation study contrasting looking times for scenarios that are either consistent or inconsistent with an agent’s belief. In [2], 15-month-old infants witness scenarios in which an agent forms either a true belief or a false belief about an object’s location (in the false-belief case, the location of the object is changed while the agent’s sight is occluded). The agent then searches in one of two locations, one compatible with the belief and one that is not. Infants look longer when the agent searches in the location that is incompatible with the belief. The longer looking time is interpreted as evidence that infants expect agents to act in accordance with their beliefs.
According to the two-system approach to mind-reading, the spontaneous-response FBT findings are explained by a cognitively efficient, but inflexible, system, and the elicited-response FBT findings are explained by a flexible, but cognitively demanding, system [
12]. Thus, this architecture involves two systems for tracking others’ mental states. These systems constitute two types of processes to the extent that they have complementary trade-offs between efficiency and flexibility [
12]. The efficient system (ES) is evolutionarily and ontogenetically ancient, fast-operating, largely automatic, and independent of central cognitive processes. The flexible system (FS) is late-developing and slow-operating, and it makes demands on central executive resources; advances in executive functions and language help to cultivate flexible attributions about others’ psychological perspectives [
15]. In particular:
“If there is an efficient system for ascribing belief-like states, then the efficiency of this system cannot come for free, so there will be restrictions on the kind of input it can take and the kinds of belief-like states it can ascribe. This contrasts starkly with a primary feature of late-developing belief reasoning competence, which is the ability to use all cognitively-available facts to ascribe any belief that the subject can, themselves, entertain. These considerations mean that the early- and late-developing systems cannot be fully continuous with each other” ([12] p. 964, emphasis mine).
Accordingly, the ES differs from the FS in terms of the input (i.e., the information that it takes into account to generate the output) and the output (i.e., the kind of mental state it ascribes to other people). It is also claimed that the fact that belief-reasoning is global contrasts with the kind of reasoning the ES allows, which is local. Reasoning is local when the information taken into account to generate ascriptions of belief-like states is less than the full set of expectations and beliefs the organism has [
16]. The aforementioned features associated with two-system theories sufficiently characterize the distinction between processes. However, in terms of the representations each process operates on, there remains some confusion that I will try to show in this section.
Apperly and colleagues claim that efficient and flexible mind-reading differ with respect to the “type of model of mental representation” each system relies on ([
17] p. 185). Systems are not continuous and each relies on different representations. The FS operates on beliefs and other propositional attitudes, and the ES operates on representations that differ from beliefs and other propositional attitudes in terms of “signature limits” [
12]. These distinctive limitations refer to what can be processed. However, I will try to show that the characteristics that differentiate belief-like states from beliefs and other propositional attitudes are not sufficient to make a distinction between the representations each system relies on. This is problematic for the two-system proposal.
According to Apperly and colleagues, an early understanding of another agent’s behavior does not require tracking her mental states as propositional attitudes; instead, young infants track belief-like states rather than full-fledged beliefs.
2 Belief-like states are characterized as relational attitudes whose contents can be discriminated in a crude way, using relations between objects and locations or other properties, but not involving relations to propositions [
12,17]. In particular, registrations are described as belief-like states. Registrations consist in a relation between an agent, an object and a location: the agent registers an object in a location [12].
Registrations resemble beliefs in some respects. Firstly, they continue to obtain after an object is no longer in one’s field of vision. So, unlike perceptions, registrations function to keep an individual’s actions in connection with relevant information even after such information is no longer available [
12]. Secondly, registrations can guide actions in a limited, but useful, range of situations. Registrations are connected to actions in a way that one can understand a registration as an enabling condition for an action. Registering an object and its location enables an agent to act on it later provided that its location does not change. Moreover, registrations enable actions to be predicted [
12]. In this sense, registrations serve as proxies for beliefs, both true and false: ascriptions of beliefs and belief-like states lead to identical expectations about an agent’s behavior in a limited range of situations. Thirdly, registrations also resemble beliefs in having correctness conditions that may fail to obtain: a registration fails to be correct when the object registered is not where it is registered as being [
12].
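The three features just listed (persistence after the object leaves view, a role in predicting action, and a correctness condition) can be made concrete with a minimal sketch. It is an illustration only, not part of Apperly and colleagues’ account: the names `Registration`, `is_correct`, and `predicted_search_location`, as well as the box labels, are hypothetical, and the scenario is a simplified change-of-location task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    """Hypothetical encoding: an agent registers an object at a location."""
    agent: str
    obj: str
    location: str


def is_correct(reg: Registration, world: dict) -> bool:
    """Correct only if the object is where it is registered as being."""
    return world.get(reg.obj) == reg.location


def predicted_search_location(reg: Registration) -> str:
    """The registration alone fixes where the agent is expected to act."""
    return reg.location


# The agent sees the toy placed in the green box (illustrative labels).
world = {"toy": "green box"}
reg = Registration(agent="agent", obj="toy", location="green box")

# The toy is moved while the agent's view is occluded. The registration
# persists unchanged, so it now fails its correctness condition.
world["toy"] = "yellow box"
```

On this sketch, the predicted search location stays "green box" after the move, which mirrors the expectation attributed to infants in the false-belief condition. Nothing in the structure refers to a proposition, only to an object and a location.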
At the same time, unlike belief, “registrations must be relations to objects and properties, not to propositions. They must have their effects on action by setting parameters for action independently of each other and independent of any psychological state; or, if they interact, they must do so in ways that are codifiable (unlike belief, whose interactions with desires, intentions, and other beliefs are as complex as interactions among reasons). Reasoning about registrations imposes signature limits. It does not permit tracking beliefs that involve quantifiers (no absences, then) or indefinitely complex combinations of properties (perhaps large melons and yellow melons, but probably not large yellow melons), nor does reasoning about registrations allow for a distinction between what is represented and how it is represented (sometimes refer as mode of presentation or sense)” ([
12], p. 963)
Infants’ belief reasoning is limited and its restrictions are manifested in terms of what Apperly and colleagues called “signature limits”.
3 These distinctive limits determine what is efficiently processed, and four features have been offered to differentiate registrations from beliefs and other propositional attitudes (i.e., the “model of the mental” that the FS operates on). Registrations have no propositional content, their functional role is less promiscuous than belief’s functional role, their content is not as fine-grained as belief content, and they do not represent objects with aspectuality. However, I will try to show that these features are not sufficient to characterize belief-like states in a way that would differentiate them from beliefs, nor differentiate the representations each system operates on.
According to the quotation from Apperly and Butterfill above, registrations cannot represent indefinitely complex combinations of properties while beliefs can. Thus, in relation to their contents, beliefs allow for making finer-grained distinctions than registrations. However, registrations have perceptual content and it has been claimed in the conceptualism/non-conceptualism debate that perceptual content is more fine-grained than beliefs’ content [
18,
19]. A way to present the debate is in the following terms: are perceptual experience states different from cognitive states? Perceptual experiences refer to conscious perceptual states that the subject is aware of, whose contents are related to the subject’s perceptual system. Cognition is understood as states and processes related to thinking, such as reasoning, belief formation, etc. Non-conceptualists contend that these states differ in the kind of content they have. Cognitive states have conceptual content, while perceptual experiences have non-conceptual content (i.e., a radically different kind of representational content) (e.g., [
19]). Conceptualists claim that if their contents were different, it would not be possible to use the content of perceptual experience in cognitive tasks, such as belief formation and justification. Due to this, conceptualists claim that both kinds of states have the same kind of content, i.e., conceptual content (e.g., [
18]).
Among the main objections against conceptualism is the claim that it is not possible to represent the fineness of grain and informational richness of perceptual experience by appealing to the conceptual background of the perceiving subject. Perceptual beliefs, or the concepts in terms of which we register our perceptual experiences, are coarser-grained than perceptual content since we do not have concepts for everything we perceive (e.g., we can perceptually discriminate varied shades of red while only having the general concept RED). Since registrations have perceptual content, it is possible to say that their content is finer-grained than a belief’s content. However, this contradicts Apperly and colleagues’ claim that a belief’s content is finer-grained than a registration’s content: registrations cannot have content that is both finer- and coarser-grained than a belief’s content. Thus, the nature of registrations is unclear, and the question about the grain of their content needs to be addressed. I will offer a solution in
Section 3.
Apperly and colleagues also claim that reasoning about registrations does not allow for a distinction between “what is represented and how it is represented” (above quotation [
12] p. 963). Belief-reasoning researchers refer to the fact that beliefs represent a given object under some guises but not others as “mode of presentation or sense” (above quotation), and they call this feature the “aspectuality” of beliefs (e.g., [
17]).
4 This characteristic of beliefs, and of propositional attitudes in general, is also known as “referential opacity”. Propositional attitudes determine opaque contexts in the following sense: a position in a sentence is held to be referentially opaque if terms or phrases in that position that refer to the same object cannot be freely substituted without altering the truth value of the sentence. Take the case of “Lois Lane believes that Superman is fearless”; while this is probably true, it is false that “Lois Lane believes that Clark Kent is fearless”, even if “Superman” and “Clark Kent”, unbeknownst to Lois Lane, refer to the same individual. In this sense, as Lois Lane represents the individual under the guise of “Superman”, she believes that he is fearless. However, if she represents him as Clark Kent, she does not believe that. According to Apperly and colleagues, one of the representational limitations of the ES is that it cannot encode the aspectuality of beliefs (and propositional attitudes).
However, the mere possibility of being opaque cannot make belief attributions cognitively demanding. Propositional attitudes can also be transparent in the following sense: a position in a sentence is held to be referentially transparent if terms or phrases in that position that refer to the same object can be freely substituted without altering the truth value of the sentence. In contrast to the above case, “Superman” and “Clark Kent” can be freely substituted in “Jonathan Kent believes that Superman is fearless” and “Jonathan Kent believes that Clark Kent is fearless”, since Jonathan Kent knows that both names refer to the same individual. It may well be that opaque beliefs are cognitively demanding to process. However, it remains possible that the representation and manipulation of transparent propositional attitudes do not require expensive cognitive resources. Thus, even if having or not having propositional content can distinguish what is a propositional attitude from what is not, it does not succeed in distinguishing what can be processed efficiently from what cannot, since it is plausible that transparent beliefs (or propositional attitudes) can be processed without heavy cognitive cost, i.e., efficiently. This distinction is relevant for the present purpose of characterizing the representations an efficient system operates on and distinguishing them from more cognitively demanding representations.
Researchers contend that ascribing belief and having visual experiences have something in common. In particular, as beliefs can represent the same object under distinct guises, different visual experiences may represent the same thing in different ways. Thus, as the ES is ill-equipped to appreciate the “aspectual nature of mental states generally” ([
17], p. 3), it also fails to appreciate the “aspectuality” of perception. In line with this, they contend that there is another signature limit of ES associated with perspective-taking studies. Developmental researchers have suggested a distinction between Level-1 and Level-2 perspective-taking [
20,
21,
22,
23]. Level-1 perspective-taking requires appreciating that an agent may not see an object that you see. Level-2 perspective-taking requires appreciating that an object simultaneously visible to the self and the other agent may give rise to different visual impressions. For example, the agent sees the object as a duck while I see it as a rabbit. There is agreement that children can pass Level-1 tasks prior to the age of four [
20,
24] and they consistently fail Level-2 tasks before the age of four (e.g., [
20,
23]). “Accordingly, registrations would support Level-1 perspective taking (e.g., appreciating that an agent does not see an object that you see) but not Level- 2 perspective taking (e.g., appreciating whether an agent sees an object as a duck or as a rabbit)” ([
12] p. 963).
At this point, it is important to make some clarifications. “Aspectuality”, in the sense of referential opacity, is a characteristic that only beliefs and propositional attitudes have. Perception or perceptual experience involves understanding another agent’s perspective; for our present purposes, we can call that “perspectivism”. It is important to differentiate aspectuality from perspectivism. The fact that beliefs represent objects under some descriptions, but not others (aspectuality), leads to referential opacity, which can be easily explicated if beliefs have propositional contents. Thus, if the representations in question (i.e., registrations) do not represent with aspectuality, one can conclude that they do not have propositional contents. However, the impossibility of encoding the aspect an object has from another agent’s perspective does not allow us to say that registrations have no propositional content.
5 In this sense, the impossibility of encoding different perspectives (a signature limit of the ES) does not succeed in making a distinction between belief-like states and propositional attitudes.
Finally, it is claimed that registrations are simpler than beliefs because of their simpler functional role and because “… belief differs from registration only in that its functional role is more complex and in that specifying its content requires something more sophisticated than a two-place relation.” ([26] p. 9, emphasis mine). In terms of causal/functional role, both registrations and beliefs are caused by perceptions, guide actions, and generate expectations about others’ behavior. However, registrations are not as inferentially promiscuous as beliefs. They can only be involved in inferences related to perception. The causal role that determines the attitude is simpler. Now, if registrations are distinct from beliefs only to the extent that they involve simpler attitudes, it is possible that they constitute simplified beliefs.
6 This is problematic for the defense of a dual-system account of mind-reading. If registrations constitute simplified beliefs, it is not possible to differentiate between the representations each system operates on. As a consequence, both systems will operate on propositional attitudes, though the ES operates on simplified ones, i.e., registrations. This is not what the dual-system proposal postulates.
However, some may still insist that registrations do not have propositional content and that this constitutes a difference. If registrations encode the location of an object according to what an agent witnessed, it is possible that at least their content is perceptual. To the extent that perceptions are not beliefs, could it be the case that they are perceptions? According to the characterization offered by Apperly and colleagues, registrations are like perceptions in some respects. In particular, registrations set the parameters for an action independently from each other and from any other psychological state, or they involve simple interactions that can be represented perceptually (not as complex or arbitrary interactions that are established between beliefs, desires, and other propositional attitudes). Thus, registrations resemble perceptions in that they are sometimes thought to be independent [
12]. In other words, registrations seem to be “local”, as perceptions are, while belief-reasoning is global. However, unlike perceptions, registrations continue to obtain after an object is no longer in one’s field of vision. Thus, even if their content is perceptual, registrations cannot be considered perceptions, since they continue to obtain after the object disappears.
Registrations differ from beliefs in some respects but, as I have tried to show, these differences do not succeed in distinguishing the kinds of representation each system relies on. Moreover, registrations cannot be considered perceptions in order to distinguish them from beliefs. If we cannot differentiate between beliefs and registrations, it is not possible to distinguish the representations each system operates on. Thus, another criterion is needed to distinguish the two kinds of representations.
3. Registrations as Cartographic Representations
According to the two-system proposal, the efficient system operates on representations that are different from beliefs, i.e., they are belief-like states or registrations. Mental representations can be analyzed in terms of two aspects: content and vehicle. The content is what the representation is about, i.e., what is represented. In this sense, it is related to the semantic properties of representations. The vehicle is the representational means through which representations represent something, i.e., the format that bears the information in the representation. In this sense, it is related to the representations’ non-semantic properties, usually called “syntactic”. Content and vehicle are two distinct aspects of representations. The same content can have different types of vehicles. For example, the shining sun can be represented by a sentence or by an image. Additionally, different contents can be represented by the same type of vehicle. For example, the shining sun and hot coffee can both be represented in a sentence-like manner.
Since registrations do not have propositional content [
12], it is possible that their format is not sentence-like; it could be iconic, for example. I propose that registrations or belief-like states are representational cartographic vehicles, in contrast to beliefs and other propositional attitudes that usually have language-like vehicles [
27,
28,
29,
30].
In sentential representational systems, every sentence is an arrangement of constituents that are, themselves, either primitive or complex. Each complex constituent is an arrangement of primitive constituents. Primitive constituents intrinsically have their own properties. The semantic interpretation of a sentence depends on the way that primitive constituents are structurally related [
31]. A complex arrangement like JOHN LOVES MARY is composed of the primitive constituents JOHN, LOVE, and MARY, maintaining structural or syntactic relations between them. The constituents of language-like representations are those parts of it that are recognized by its “canonical decomposition”. Each primitive constituent always makes approximately the same semantic contribution in the sentence in which it occurs [
28]. The syntactic principle that combines sentential constituents produces vehicles with contents that are indefinitely complex and hierarchically structured. There is agreement that propositional attitudes possess complex causal structures. One of these sources of complexity is the possibility of recursive embedding that leads to hierarchically-structured contents. The other source is the inferential links into which propositional attitude concepts can enter. In line with this, the recursive or hierarchical structure of sentential systems is suitable to represent propositional attitudes.
An uncontroversial point about map-like vehicles is that they have correctness conditions. A map’s geometric structure seeks to replicate salient relations between objects represented in the map [
30,
32,
33]. Thus, a map is correct only if it replicates relevant geometric structure, i.e., it represents salient relations between objects represented by the map, otherwise it is incorrect. Broadly, maps represent geometric structures, i.e., spatial relations between objects and properties. The most common types of structure are metric, as with city or route maps, and topological, as with subway maps [
27,
30].
These two characteristics of map-like vehicles are shared with language-like vehicles. First, both language-like and map-like vehicles have correctness conditions. This characteristic only indicates that both kinds of representations can be considered mental states with content [
34]. In the two-system proposal for mind-reading, it is assumed that belief-like states are mental states with content. In any case, mental states with content can still have different representational vehicles. Second, spatial information can also be represented in a sentence. Thus, in principle, this characteristic does not allow us to differentiate between language-like and map-like vehicles. However, representing geometric information with a language-like vehicle would require a long sentence enumerating all objects, their locations, and the spatial relations between them. Processing this long sentence, i.e., discovering the objects involved and how they are located in relation to one another, involves considerable cognitive cost. In contrast, in a map-like vehicle the objects, their locations, and the spatial relations among them are explicitly represented and are cognitively transparent. This minimizes the need for processing to recover those locations and relations. Consequently, in representing geometric structure holistically, maps are cognitively cheaper than language-like vehicles [
27,
30]. This difference between representational vehicles that can encode spatial relations is relevant for the present purpose of characterizing the representations an efficient system operates on.
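The contrast between enumerating spatial relations and reading them off a layout can be illustrated with a small sketch. It is a toy under stated assumptions, not a cognitive model: the object names, the coordinates, and the helper `distance_on_map` are all hypothetical, and the sketch only shows the difference in what each format has to store, not the processing cost discussed in the cited literature.

```python
from itertools import combinations
from math import dist

# Map-like vehicle (hypothetical): each object is placed at a coordinate.
positions = {
    "agent": (0.0, 0.0),
    "toy": (3.0, 4.0),
    "green box": (3.0, 4.0),
    "yellow box": (6.0, 8.0),
}


def distance_on_map(a: str, b: str) -> float:
    """Spatial relations are read off the layout; none is stored separately."""
    return dist(positions[a], positions[b])


# Sentence-like vehicle (hypothetical): every pairwise relation has to be
# spelled out as its own statement.
statements = [
    f"{a} is {dist(positions[a], positions[b]):.1f} units from {b}"
    for a, b in combinations(positions, 2)
]
```

With four objects, the map holds four placements while the sentence-like list needs six statements. The gap widens as objects are added, because the number of pairwise relations grows quadratically while the number of placements grows linearly.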
At the same time, language-like and map-like vehicles differ in important respects. Typically, maps represent information that is spatially located; for example, some sort of existential information such as “there is a river here”. However, maps are not good for representing information that is not spatially located, such as “somebody, somewhere is a redhead and is dancing beautifully” [
27]. Moreover, some theorists contend that the geometric information represented in cartographic systems is non-logical in the sense that the familiar hallmarks of logical form are absent: sentential logical connectives, quantifiers, and predication [
30]. In line with these representational limitations, maps do not seem suitable for representing what propositional attitudes require: a variety of contents other than spatial relations, and complex hierarchical structures assembled by logical connectives.
However, some theorists contend that map-like vehicles can be modified in order to represent logical connectors. In particular, maps can be extended using the technique of adding higher-order icons [
27]. To represent negative information, for example, “the car is not in the garage”, it is possible to introduce a higher-order icon, like putting a slashed circle over the “car” icon to indicate that the car is not in the represented location. However, higher-order icons could transform maps into messy affairs that demand expensive cognitive resources to represent and manipulate information other than spatial information. Although it is possible to extend maps, that involves cognitive expense. At any rate, I will try to show that the map-like vehicles’ limited expressive powers are not an obstacle for satisfying registrations’ representational requirements. It is also argued that map-like vehicles can be exploited to think when the representational necessities of the thinker are simple [
27]. Thus, if the representational necessities are related to the possibility of exploiting maps in a cognitive capacity, it is plausible to use this kind of representation for another cognitive capacity provided that the representational necessities are simple.
These last two characteristics, the representational limitations and the simple representational necessities, are useful for differentiating language-like vehicles from map-like vehicles. Language-like vehicles are associated with cognitive sophistication, i.e., with the possibility of representing and manipulating contents about abstract, hierarchically structured states of affairs (also known as flexible thoughts), while map-like vehicles are characterized by representing and manipulating limited content.
Initially, it is possible to think that map-like vehicles are appropriate for registrations since a registration is considered to be a relation between an agent, an object, and a location [
12] and this suggests that spatial properties are encoded.
7 In consonance, cartographic systems represent objects and properties related to spatial locations [
27,
30], at the low cognitive cost that an ES representation requires.
8 Notably, the representational characteristics of map-like vehicles coincide with those of registrations: both encode geometric information in a cognitively cheap manner. In this sense, map-like vehicles seem to satisfy these representational requirements of the ES for mind-reading. Moreover, map-like vehicles share a particular representational limitation with registrations: they cannot represent quantifiers. Thus, the nature of map-like vehicles and that of registrations seem to coincide. The possibility of extending maps to augment their expressive power is not relevant for registrations, since registrations have representational limitations similar to those of plain maps. Moreover, I will try to show that the representational power of plain maps is sufficient to satisfy registrations’ representational requirements.
The possibility to exploit maps in a cognitive capacity seems to be related to the representational necessities of the organism [
27]. If those necessities are simple, a cognitive capacity can use map-like vehicles. Thus, if the representational necessities of efficient mind-reading processing prove to be simple, it is plausible that map-like vehicles can be exploited; and I argue that they are simple. Unlike beliefs, registrations have only perceptual content, and their connection with behavior is simple: efficient mind-reading only needs to represent how what an agent perceives affects its behavior, and what is represented is tied to the actual and recent observation of an agent. Altogether, the representational requirements of registrations can be said to be simple. Another mark of this simplicity is that the efficient process is triggered by direct cues such as an agent’s line of sight, unlike the FS, whose deployment does not depend on the immediate availability of cues about what an agent witnesses [
17]. Thus, if an organism employs map-like vehicles in cognitive processing provided that the representational necessities are simple, and efficient mind-reading representational necessities are simple, it is plausible that map-like vehicles are recruited for efficient mind-reading. Underlying this proposal remains the possibility that representational cartographic vehicles can be recruited by different cognitive capacities like, for example, navigation and thinking, just as beliefs can be recruited by diverse cognitive capacities like thinking, decision-making, reasoning, full-blown mind-reading, etc.
In all these respects, map-like vehicles seem appropriate for registrations. However, some might argue that, to the extent that registrations encode perceptual information, their vehicle could be picture-like. Pictorial systems can only represent features that are visually perceptible. A picture-like vehicle represents the visual appearance of a scene by replicating that appearance itself. In this sense, pictures represent multiple dimensions in an isomorphic way [
27]. This characteristic allows them to convey a large amount of information simultaneously, but at a high computational cost. To represent something, pictures need to represent all visually perceptible information in nuanced detail. For example, a photograph of a street scene tells you, roughly, how many cars there are, whether the cars are parked, and whether there are traffic lights, trees, and benches, along with their shapes, colors, positions, and so on. By contrast, although cartographic systems are free to represent any visually perceptible dimension, they exploit isomorphism only along the spatial dimension. Maps abstract away much of the detail and capture only salient features of the represented domain. For example, a city map represents streets but not trees and traffic lights. Moreover, their icons have potentially arbitrary semantics, i.e., the semantic principle that maps icons onto objects and properties in the world can be more or less indirect and arbitrary. For example, maps usually represent a church with a cross. In this sense, map-like vehicles are more efficient than picture-like vehicles at representing and manipulating spatial information.
I contend that map-like vehicles are preferable to picture-like vehicles for registrations for the following reasons. Firstly, since map-like vehicles are more efficient than picture-like vehicles at representing and manipulating spatial information, it is more plausible that registrations, as the representations the ES relies on, have map-like vehicles. Secondly, to the extent that picture-like vehicles are associated with perception, and registrations cannot be considered perceptions (i.e., they continue to obtain after the object is out of sight), map-like vehicles seem to be a better fit for registrations. Thirdly, picture-like vehicles can only represent features that are visually perceptible, while maps are not restricted to visually perceptible features. Since they employ discrete icons with potentially conventionalized semantics, map-like vehicles can represent things like hidden treasures with an “X” [
27]. The possibility to represent hidden objects is relevant for performance in spontaneous-response FBT that usually requires keeping track of a hidden object, for example, the watermelon toy hidden in the yellow box. Map-like vehicles satisfy this particular representational requirement pervasive in spontaneous-response FBT, the tasks that the ES (and registrations) seeks to explain.
A map-like vehicle characterization can satisfy the representational requirements for registration suggested by Apperly and colleagues (
Section 2). First, the requirement of not having propositional content is satisfied if registrations have map-like vehicles, since this kind of content is usually represented in language-like vehicles. In addition, representations with map-like vehicles are probably not beliefs, since map-like vehicles cannot represent the hallmarks of logical form, i.e., what is necessary to represent the contents of beliefs and, in particular, their complex and hierarchical combinations of properties. Admittedly, it has been pointed out that extended maps can represent logical connectors. However, extended maps involve a demanding cognitive cost, which is not desirable for the representations an efficient system relies on. To get back to the point, the representational possibilities of map-like vehicles seem to coincide with what registrations are required to represent, to the extent that neither can represent indefinitely complex combinations of properties.
Second, plain (not extended) maps are comparatively cognitively transparent. They always, and only, employ icons to represent objects and properties as arranged in spatial relations, and they represent that configuration by replicating that very same configuration among the icons themselves. In this sense, a map-like vehicle has less aspectuality than beliefs and other propositional attitudes (associated with language-like vehicles), the purported representations of the FS. Maps can only represent spatial dimensions, not arbitrary descriptions of objects. However, there is room for some aspectuality. Take the case of having a topological and a metric map-like representation of the same spatial arrangement of objects. One may fail to notice that both maps represent the same spatial arrangement. However, since it is possible to extract a topological map from a metric map, there is a potential connection between the two representations. By contrast, there is no such connection between the different modes of presentation involved in language-like representations of objects.
Third, it can be said that map-like vehicle registrations have more coarse-grained content than beliefs. In
Section 2, it was mentioned that distinguishing the representations of an efficient system (registrations) from those of a flexible system (beliefs) in terms of the grain of their content seems, at the least, problematic. According to Apperly and colleagues, registrations have more coarse-grained content than beliefs. However, registrations have perceptual content, and it has been suggested that perceptions have more fine-grained content than beliefs. Thus, registrations have either more coarse-grained or more fine-grained content than beliefs, but they cannot have both traits at the same time. Now, if registrations are characterized as having cartographic vehicles, it is possible to account for their having more coarse-grained content than beliefs in spite of having perceptual content. In the case of cartographic vehicles, it is not necessary to specify the shape, color, relative size, and orientation of the objects; it is sufficient to isomorphically represent the spatial relations between them. Thus, if registrations have map-like vehicles, they have perceptual content, but it is limited; it cannot be equivalent to the content of pictorial vehicles (the vehicles associated with perception), which represent all dimensions of visual perception. Accordingly, a map-like registration’s content is coarser-grained than both perceptual content and belief content.
Fourth, a map-like vehicle characterization of registrations can account for their having a simple cognitive role. Although registrations share with beliefs the characteristics of being caused by perceptions, guiding actions, and generating expectations about others’ behavior, unlike beliefs they are only involved in inferences related to perceptions. As map-like representations, registrations will not have a functional role as promiscuous as that of beliefs, because the inferences in which registrations are involved will be restricted to perceptual information that can be encoded as spatial relations. In this sense, this characterization allows us not to conflate registrations with simple beliefs (i.e., those beliefs that have simple connections with behavior), in spite of their sharing some functional role. As argued above, map-like vehicles are not good for representing beliefs, and if they are extended to augment their expressive power, they become cognitively costly representations that do not fit what an efficient system for mind-reading requires. Moreover, this characterization has the advantage of not conflating registrations with perceptions either, to the extent that perceptions are usually associated with picture-like vehicles.
In
Section 2, I tried to show that the nature of registrations in the two-system proposal is confusing. Apperly and colleagues postulate a new mental entity (i.e., belief-like states or registrations) and characterize it based on its differences from beliefs. However, that characterization suffers from ambiguity. In some respects (functional role, grain of content, aspectuality), registrations are not clearly distinguished from beliefs. This is a problem for the account, since the authors contend that the ES does not operate on propositional attitudes (i.e., beliefs). Nor can registrations be considered perceptions (to the extent that they continue to obtain after an object is no longer in one’s field of vision). This confusion affects the nature of the representation the ES operates on (“the new mental entity”), while the nature of the representation that the FS operates on remains intact, since the nature of propositional attitudes is not in question here. Thus, the nature of registrations needs to be addressed. The map-like proposal clarifies the nature of the representation the ES operates on. It clearly delimits the kind of representation each system operates on and permits registrations to preserve all the characteristics Apperly and colleagues set out, but without the difficulties suggested in
Section 2.
Moreover, a distinction in terms of vehicles explains efficient processing better than the distinction in terms of non-propositional content proposed by Apperly and colleagues. For example, the map-like characterization of registrations specifies in what sense registrations have perceptually limited content and can be considered representations of an efficient system. Map-like registrations represent perceptual information without specifying the shape, color, relative size, and orientation of the objects, and only isomorphically represent the spatial relations between them. It is plausible to say that efficient processing can operate on this kind of representation. By contrast, to say that content is non-propositional does not specify in what sense it can be efficiently processed. For example, picture-like representations have non-propositional content, but are cognitively costly. Thus, at least initially, there are reasons to say that registrations have map-like vehicles. In the next section, I will address reasons related to empirical studies.
4. Map-Like Vehicles and Spontaneous-Response False Belief Tasks
There are specific reasons related to the study of early social cognition that may lend support to the map-like vehicle hypothesis. One of them concerns infants’ and pre-verbal children’s performance in the spontaneous-response FBT. Generally, two principal measures have been used in these tasks: looking behavior and helping. The former refers to the measurement of looking times (e.g., [
2]) or anticipatory looking (e.g., [
36]). The latter is considered to be a more active behavior measure and takes into consideration behaviors like pointing or reaching (e.g., [
37]). When analyzed in more detail, spontaneous-response FBT involves variations of a typical change of location (or object transfer) task whatever method is used to measure the spontaneous response. They all recreate the typical mind-reading task that requires infants to track others’ beliefs about an object’s location.
In the seminal task [
2], there is a scene with an agent, two boxes, and a flashy toy. During the familiarization trials, infants are familiarized with the agent’s intention to reach for the object. During the belief induction trials, the toy is placed in one box and then moved to the other one in two conditions: the agent can see the movement, or the agent cannot see the movement because her vision is occluded. In the test condition, the agent reaches for the object, and infants are expected to look longer in two situations: when the agent reaches for the object in the correct box despite her vision having been occluded during the object’s change of location, and when the agent reaches for the object in the incorrect box despite having seen the object’s change of location. Results confirm these expectations.
I contend that not only this particular task but spontaneous-response FBTs in general involve agents, objects, and locations, i.e., the sort of information that might be encoded in a map-like representation. To anticipate the agent’s reaching behavior, it is enough to understand that the agent has some notion of the location of the object. It is not necessary to understand that the agent has a belief. In the condition in which the agent sees the object’s movement, the mind reader attributes a correct registration to her, i.e., she has a correct map of the object’s location. Based on this registration, it is possible to generate expectations about the agent’s reaching behavior: she will look for the object in the correct location (because she has a correct map). By contrast, if an incorrect registration is attributed because the agent could not see the object’s movement, the mind reader will anticipate that the agent will search where the object used to be and so will not reach it (because she has an incorrect map). In this case, the map is not updated, i.e., the actual relevant relations between objects are not represented. Moreover, the sort of map the ES requires may have a topological structure that represents the perceptual space of an agent, since it is not necessary to represent scaled distances, but only spatial relations such as orientation relations (e.g., right/left, above/below, forwards/backwards) or relations between objects (inside/outside, on, over, below, above, opposite, between, and the like).
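The reasoning above can be illustrated with a toy model. The following Python sketch is only an illustration under stated assumptions, not a model proposed in the literature: the class and function names (`Registration`, `Scene`, `expected_reach`, `violates_expectation`) and the box labels are hypothetical. It treats a registration as a small map pairing object icons with location icons, which is updated only when the agent witnesses a change of location.

```python
from dataclasses import dataclass, field


@dataclass
class Registration:
    """Hypothetical map-like registration attributed to one agent.

    It pairs object icons with location icons as last witnessed by the
    agent; it carries no propositional or quantified content.
    """
    agent: str
    object_locations: dict = field(default_factory=dict)

    def witness(self, obj: str, location: str) -> None:
        # The agent's map is updated only by what she actually sees.
        self.object_locations[obj] = location


@dataclass
class Scene:
    """Actual state of the world: where each object really is."""
    object_locations: dict = field(default_factory=dict)

    def move(self, obj: str, location: str, witnesses=()) -> None:
        self.object_locations[obj] = location
        for registration in witnesses:
            registration.witness(obj, location)


def expected_reach(registration: Registration, obj: str):
    """Location where the mind reader expects the agent to search."""
    return registration.object_locations.get(obj)


def violates_expectation(registration: Registration, obj: str,
                         observed_reach: str) -> bool:
    """True when the observed reach departs from the attributed map
    (the case in which longer looking would be predicted)."""
    return observed_reach != expected_reach(registration, obj)


# Change of location with occluded vision (illustrative labels only).
scene = Scene()
agent_map = Registration(agent="agent")
scene.move("toy", "green box", witnesses=[agent_map])  # agent sees this
scene.move("toy", "yellow box")                        # agent's view occluded
print(expected_reach(agent_map, "toy"))                       # green box
print(violates_expectation(agent_map, "toy", "yellow box"))   # True
```

In this sketch, the attributed map diverges from the actual scene once a move goes unwitnessed, and the expectation about reaching is read off the attributed map alone, without representing anything beyond agent, object, and location.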
Some spontaneous-response FBTs cannot be strictly characterized as a change of location task. However, they still involve agents, objects and locations, i.e., what can be encoded in a map-like representation. For example, Kovacs, Teglas and Endress [
38] conducted a series of complex experiments to test seven-month-olds’ early social abilities. The crucial experiment includes an object detection task. Here, the agent enters the scene and places a ball on a table, and the ball then rolls behind an occluder. Then, the ball either stays behind the occluder or leaves the scene, and it does so either in the agent’s presence or in the agent’s absence. These conditions are designed to modulate the agent’s belief about the ball (where does the ball appear?). The agent believes either that the ball is behind the occluder (condition A) or that there is no ball behind it (condition B). In the test condition, the agent reenters the scene and the occluder is lowered. Infants’ looking times are measured, with infants expected to look longer in two cases: when the agent is taken to believe that the ball is behind the occluder but there is no ball, and when the agent is taken to believe that the ball is not behind the occluder but there is a ball. Results confirmed these expectations. However, what is important for the purpose of this argument is that the object detection task (where the object appears) is combined with a FBT in which conditions A and B seek to modulate the beliefs of the agent (where does the agent think the object appears?). What the agent witnesses can be encoded as the map the agent has of the location of the object. Based on this map-like representation, it is possible to generate the expectation that the agent will be surprised if the object does not appear behind the occluder (or if it does appear, depending on the condition), just as in the case of a change of location task. Thus, to perform in this task, it is enough to encode spatial relations between an agent, an object, and a location.
There are other tasks to test early social abilities that do not involve testing false belief about an object’s location; instead, they test false belief about other properties like, for example, an object’s identity [
39]. However, map-like vehicles might also be appropriate to perform in this kind of task. To form a false belief about object identity consists in forming an erroneous conclusion about which particular object token one is looking at, based on contextual information. This contextual information was once helpful in identifying tokens correctly, but is now outdated and, therefore, misleading [
39]. In the experiment, 18-month-old infants watched an agent interact with two identical toy penguins. One of the penguins could be divided into two pieces (two-piece penguin) and the other could not (one-piece penguin). When assembled, the two-piece penguin was indistinguishable from the one-piece toy. It is assumed that if infants understand that contextual cues could lead the agent to conclude that she is looking at the one-piece penguin when in fact she is looking at the assembled two-piece penguin, they can attribute to the agent a false belief about the identity of the object (some details are omitted in order to go straight to the argument).
In the test trials, the agent has a key to hide inside the two-piece penguin. In the false belief condition, the agent is absent when the experimenter assembles the two-piece penguin and covers it with a transparent cover, while covering the one-piece penguin with an opaque cover. Afterwards, the agent enters the scene with her key and reaches for either the transparent or the opaque cover. In the true belief condition, the agent is present when the experimenter assembles the two-piece penguin and covers it with a transparent cover and the one-piece penguin with an opaque one. It is expected that, in the false belief condition, young children will expect the agent to form the belief that the two-piece penguin is under the opaque cover, since she is faced with what looks like a one-piece penguin (the assembled two-piece penguin) under the transparent cover. Thus, they will look longer when the agent reaches for the penguin under the transparent cover. Similarly, it is expected that, in the true belief condition, infants will expect the agent to form the belief that the two-piece penguin (where she can hide the key) is under the transparent cover, because she could see the experimenter put the divided penguin together. Thus, they will look longer when the agent reaches for the penguin under the opaque cover.
Again, the penguin task involves a FBT with conditions that seek to modulate the beliefs of the agent (where does the agent think the divided penguin is?) and can be understood as a “where is the divided penguin located?” task. In other words, it is possible to understand this task as a classic change of location task. If this is the case, what is necessary to perform in this task might be encoded as the map the agent has of the location of the object (the divided penguin). The agent will reach for the two-piece penguin if she can register its location, just as in the case of a change of location task. Thus, the penguin task may require encoding spatial relations between an agent, an object, and a location. Consequently, even though the task is designed to test false beliefs about an object’s properties, it can be considered a variation of a change of location task, and so infants and pre-verbal children can still use the same map-like resources to perform in it. This task requires representing spatial relations between an agent, an object, and a location, whatever inferences about object properties (other than location) the task seeks to test.
I am not suggesting that the evidence indicates that infants can only engage in mind-reading for tasks involving spatial relations, or at least spatially located objects and properties. What I am trying to show is that there are reasons to think that young children can, at least, engage in mind-reading tasks involving spatially located objects and properties. Furthermore, the possibility of mind-reading about properties of objects other than location has itself been tested using variations of a “change of location” task.
Finally, the map-vehicle proposal for early mind-reading assumes that infants possess and use cognitive maps. However, is there evidence that young children use map-like representational vehicles? Cognitive maps are related to the study of navigation, i.e., how we move successfully around the world. Researchers studying spatial cognition differentiate between two main ways of functioning in the larger-scale world [
40,
41]. One method involves encoding self-movements and updating the relations between the self and other objects based on those movements. This method is known as dead-reckoning or inertial navigation. This system is considered to be similar to egocentric responding in the developmental literature. The second method is allocentric coding or place learning, a system that involves encoding relations between objects, with some of these objects serving as landmarks organized into a frame of reference. Cognitive maps in animal cognition are associated with this system. The map-like vehicle proposal assumes that young children track the maps other people have of the locations of objects around them. In this sense, the relevant maps for this proposal do not encode spatial relations relative to the (mind reader’s) self, but relative to external landmarks (e.g., the other agent, the objects, their spatial relations). Evidence about allocentric coding therefore seems to be relevant for the map-vehicle proposal for early mind-reading.
9 It is known that infants younger than one year old show allocentric responses [
42]. Representations using external landmarks are reliably used for action by the end of the second year (24 months of age) [
43]. At 16–36 months, children retrieve objects hidden in a sandbox after walking around to the other side [
44], showing coding relative to landmarks and/or spatial updating with self-motion. Another question related to navigation concerns how children combine different kinds of visual information to maintain their sense of orientation. The environment includes discrete landmarks that could be individuated by color or shape, as well as elements of layout, such as the shape of a room, whose geometric aspect could be coded. Hermer and Spelke [
45] found that 18–24-month-old children, disoriented in sparsely featured enclosures, reestablish their orientation using geometry (room shape) but not the color of the wall. Taken together, these findings suggest that 16-month-olds may use landmarks to navigate and that 18-month-old children encode and use geometric information. All of this is consistent with the use of mental maps (or map-like vehicles) at a young age.
The map-like vehicle proposal involves not only the possibility that infants have and use cognitive maps, but also the possibility that they can attribute cognitive maps to other agents. As far as I know, there is no specific evidence regarding this hypothesis. However, there is some evidence that can indirectly support this possibility. Perspective taking is considered a typical case study of allocentric coding. Perspective taking is the ability to take someone else’s viewpoint into account. For example, when people were asked to give directions to a landmark in New York City, they changed the way they described how to get to the landmark depending on whether the person asking was from the city or not. To city residents, people gave less specific instructions, because they assumed residents would know basic aspects of navigating the city, like how to get uptown versus downtown [
46]. In development, one way to test a purely allocentric frame of reference is a task that requires judging what the arrangement of objects would be if the viewer’s viewpoint changed. This task is, at the same time, considered to be a perspective-taking problem [
47]. Strikingly, mind-reading has been conceptualized as “cognitive” perspective taking. Thus, evidence from the development of perspective taking can suggest not only the possibility that infants have and use cognitive maps, but also the possibility that they can attribute cognitive maps to other agents, provided that, as a means of calculating the other agent’s point of view, the child represents the other agent’s map. Evidence suggests that perspective-taking computations can be produced by children as young as three years old [
48,
49]. This evidence concerns 36-month-olds; still, this is before four years of age, i.e., the age at which the developmental milestone of full-blown mind-reading appears. However, this evidence is very indirect, and the possibility that infants attribute maps to other agents remains speculative.