1. Introduction
Amartya Sen [1] argues that famine and other catastrophes are easily avoided in a democracy. This argument relies on the fact that, where information can freely diffuse, decision makers can form an unbiased picture of the state of a society and take proper measures. Biases due to individual opinions are expected to be washed out in the information aggregation process, a phenomenon often referred to as the “wisdom of crowds” [2]. Still, cases of information aggregation failure abound even in democratic societies (at the time of writing, Turkey is in a state of turmoil, exacerbated by the fact that the democratically elected government failed to properly understand and respond to the issues that were raised [3]). For example, in the aftermath of the 2008 Lehman Brothers bankruptcy, former Federal Reserve chairman Alan Greenspan expressed his “shocked disbelief” during his hearing before the US House Committee on Oversight and Government Reform, leaving the public to wonder: where did the information on which the Fed's policy was based come from? Al Gore [4] argues that the shortcuts between decision makers and the media often leave the former poorly placed to know what is actually going on.
A number of models of social dynamics have addressed the issue of information aggregation (see, e.g., [5] for an excellent review). The simplest is probably the voter model, in which each agent adopts the opinion of a randomly chosen neighbor. This model allows for sharp predictions [6,7]: generally, the information aggregation process converges to the incorrect outcome with finite probability. Other contributions have instead proposed different opinion dynamics mechanisms, such as majority rules [8] or social impact models [9], which support different conclusions. These models, however, fall short in their micro-economic foundations, as the interaction mechanisms are somewhat arbitrary.
More detailed micro-economic models of social learning have been proposed in the economics literature. It is well known that information aggregation may fail when agents free ride on the information gathered by others, without seeking independent sources. This phenomenon, called rational herding [10], is also supported by experimental evidence [11].
A series of papers has focused on Bayesian learning schemes (at odds with models of rational herding, where agents deduce signals from the behavior of others, in Bayesian learning schemes agents exchange the full probability distribution over the signals they have received; this avoids the loss of information that lies at the heart of rational herding, a point discussed in the context of a social network, e.g., in [12]) [13,14,15,16], coming to the generic conclusion that when agents update their beliefs following Bayes’ rule, society correctly aggregates information (still, see [17,18,19]). Some authors have focused on the impact of dominant groups of individuals on the aggregation of information. For example, Bala and Goyal [20] introduced the notion of a “Royal Family”, a group of agents whose behavior is observed by everyone else. Alternatively, Golub and Jackson [21] defined t-step “prominent groups” as those groups whose behavior eventually influences all other agents within time t. Regardless of the specific definitions, these and other studies unanimously highlight the negative role that exceedingly influential groups play in the information aggregation process.
In this paper, we focus on an extremely stylized model of a society, and we address the issue of whether information distributed across the population is able to diffuse to an uninformed, well-connected clique of decision makers. Our model assumes Bayesian learning but, differently from [13,14], which study a continuum of agents, we study a finite but large population of agents connected by a social network. On a finite network, when agents talk repeatedly with their peers, they may not be able to disentangle what in a peer’s opinion is new information and what reflects information exchanged in previous interactions, including the information they themselves provided earlier. This phenomenon, called “persuasion bias” in [22], introduces a non-trivial positive feedback and leads to information aggregation failure, at odds with the conclusions of [13,14].
The main conclusions of our paper can be summarized in two points: (i) information aggregation crucially depends on the synchronicity of the agents’ information updates: in the extreme case of parallel update dynamics, where we can derive analytical results, information diffusion leads to the correct outcome in the limit of a very large society or for very informative initial signals, whereas when the fraction of agents who update their beliefs at each time step is lower than a critical threshold, the society converges with finite probability to the wrong outcome, no matter how large it is; (ii) in the case of parallel dynamics, information aggregation degrades as the clique of uninformed agents gets smaller. In particular, the limit of a vanishingly small clique of uninformed agents behaves markedly differently from the case of a homogeneous society (with no clique). Both results suggest that it might not be wise to rely on crowds in situations reminiscent of those prevailing in our societies, where updating is sequential and the social network is characterized by highly connected cliques (news corporations, political parties).
The paper is organized as follows. In Section 2, we detail the network structure and the information update rules, as well as our quantitative measure of a social network’s ability to correctly aggregate information. In Section 3, we provide analytic results for the case of parallel information update. In Section 4, we numerically compare these results with those obtained under sequential update schemes. We close with a few concluding remarks in Section 5.
3. Parallel Dynamics
According to the parallel dynamics prescription, all agents in the social network listen to their neighbors at every time step, t, and update their state of knowledge accordingly:
By collecting all the individual states of knowledge into a column vector (here, we switch to bracket notation, i.e., we denote column vectors by kets and the corresponding row vectors by bras), the dynamics described in Equation (8) can be rewritten as:
The above equation clearly suggests that the spectral properties of the adjacency matrix A play a crucial role in the time evolution of the state of knowledge vector. Being symmetric, the adjacency matrix A has N real eigenvalues, whose corresponding eigenvectors form an orthogonal set in $\mathbb{R}^N$. By decomposing the adjacency matrix over this eigenbasis, one can see that, for large enough times, Equation (9) becomes:
As is well known from the Frobenius-Perron theorem [24], all components of the eigenvector corresponding to the largest eigenvalue of the adjacency matrix A share the same sign, which we shall assume to be positive from now on. Thus, in light of the relation in Equation (10), two main points become apparent:
In the following, we shall compute the probability in Equation (11) for the simple network topology discussed above.
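Since Equations (8)–(11) are not reproduced above, the following Python sketch makes the spectral argument concrete under the assumption that the parallel update acts, up to normalization, as repeated multiplication of the state of knowledge vector by the adjacency matrix; all function and variable names are ours, introduced for illustration only.

```python
import numpy as np

def consensus_sign_parallel(A, v0, steps=200):
    """Iterate a parallel update assumed to take the form v(t+1) = A v(t)
    (a sketch: the paper's exact Equation (8) may differ, e.g., by self-terms
    or normalization) and return the sign agents agree on at long times."""
    v = v0.astype(float).copy()
    for _ in range(steps):
        v = A @ v
        v /= np.linalg.norm(v)        # rescale to avoid overflow; signs are unaffected
    return np.sign(v)

def consensus_sign_spectral(A, v0):
    """Equivalent long-time prediction: the common sign is the sign of the
    projection of the initial knowledge vector onto the leading eigenvector."""
    _, eigvecs = np.linalg.eigh(A)    # A is symmetric; eigenvalues come in ascending order
    v1 = eigvecs[:, -1]               # eigenvector of the largest eigenvalue
    v1 = v1 if v1.sum() > 0 else -v1  # Frobenius-Perron sign convention (all positive)
    return np.sign(v1 @ v0) * np.ones(len(v0))
```

Under this assumption, the two functions return the same sign pattern whenever the largest eigenvalue is non-degenerate, which is precisely the numerical counterpart of the spectral argument above.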
For the sake of simplicity, let us fix the true value of X so that the probability in Equation (11) is equivalent to the probability that the scalar product between the first eigenvector and the initial state of knowledge vector is positive, and let us assume that each agent is initially given one signal at time zero. Assuming that the hubs, i.e., the nodes in the clique, have no initial information, such a scalar product can be written as a sum over the sites not belonging to the clique:
where each term is the product of the i-th component of the first eigenvector and the corresponding agent’s initial signal (see Equation (5)). A good approximation scheme to estimate the probability of the quantity in Equation (12) being positive relies on the central limit theorem: as a matter of fact, the scalar product in Equation (12) is a sum containing one random variable per site outside the clique, each given by the product of two random variables. Thus, the probability of Y in Equation (12) being positive is approximately given by:
where the two parameters denote the mean and standard deviation, respectively, of the random variable Y. Given the independence of the initial signals and of the eigenvector components, these two quantities are given by:
where the first pair denotes the mean and standard deviation of the initial signals, whereas the second pair denotes the mean and standard deviation of the eigenvector components over the sites outside the clique. Computing the mean and standard deviation of the signals is easy. Recalling that signals must be informative (see Equation (1)), one has p > 1/2. Let us rewrite such a probability in terms of the signal informativeness, x. Then, one can immediately verify that:
As regards the mean and standard deviation of the eigenvector components, good approximate expressions can be computed by employing standard perturbation theory up to second order (see Appendix A for the details). To leading order in N, one gets:
where f denotes the fraction of hubs in the network. As can be seen from the inset in Figure 1, the above approximations are in excellent agreement with results obtained from the numerical diagonalization of adjacency matrices, especially for large network sizes.
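As an illustration of how such a numerical check can be carried out, the sketch below builds a network with a fully connected clique of hubs and measures the mean and standard deviation of the top-eigenvector components over the ordinary nodes. The precise ensemble of Section 2 is not reproduced here, so the construction used (a random c-regular graph among ordinary nodes, with each hub additionally wired to c randomly chosen ordinary nodes) is an assumption made for illustration only.

```python
import numpy as np
import networkx as nx

def hub_clique_network(N, f, c, seed=None):
    """Sketch of a network with a fully connected clique of uninformed hubs.
    Assumption: the ordinary nodes form a random c-regular graph (their number
    times c must be even) and each hub gets c extra links to ordinary nodes."""
    rng = np.random.default_rng(seed)
    n_hubs = max(1, int(round(f * N)))
    n_ord = N - n_hubs
    G = nx.random_regular_graph(c, n_ord, seed=seed)        # ordinary nodes: 0 .. n_ord-1
    hubs = range(n_ord, N)
    for h in hubs:
        for h2 in hubs:                                      # clique among the hubs
            if h < h2:
                G.add_edge(h, h2)
        for j in rng.choice(n_ord, size=c, replace=False):   # c links to ordinary nodes
            G.add_edge(h, int(j))
    return G, np.arange(n_ord)                               # graph and ordinary-node indices

def leading_eigvec_stats(G, ordinary):
    """Mean and standard deviation of the top-eigenvector components on ordinary nodes."""
    A = nx.to_numpy_array(G, nodelist=sorted(G))
    _, V = np.linalg.eigh(A)
    v1 = V[:, -1]
    v1 = v1 if v1.sum() > 0 else -v1                         # all-positive sign convention
    return v1[ordinary].mean(), v1[ordinary].std()
```

Averaging these two estimates over many independent realizations reproduces the kind of comparison shown in the inset of Figure 1.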
Figure 1. The prediction of Equation (18) for the probability of correct information aggregation (solid line) is compared with the results of numerical simulations run with the parallel dynamics of Equation (9). Each dot represents the empirical estimate of such a probability for a given value of x, computed as the fraction (over the simulated samples) of networks that reached consensus on the true value of X. All simulations were performed for different values of the system size N (shown in the plot), with the remaining network parameters kept fixed. Inset: comparison between the large-N approximations (solid lines) of Equation (16) for the mean and standard deviation of the top-eigenvector components and the corresponding quantities estimated by averaging over the top eigenvectors of 100 network configurations. As can be seen, for large enough values of N, the empirically measured mean and standard deviation are in excellent agreement with the approximations in Equation (16).
Plugging Equations (15) and (16) into Equation (14), one can eventually compute the probability of converging to the right value of X, as in Equation (13):
As already stated, we are mostly interested in cases where only a few nodes in the network play the role of hubs, i.e., in the limit of small f: in this case, the probability in Equation (17) further simplifies to the following remarkably simple expression:
In Figure 1, the prediction of the above equation is compared with the results of numerical simulations for several different system sizes N: all results are in very good agreement with Equation (18) (all data points are rescaled in order to collapse onto the function predicted by Equation (18)).
A few comments are in order on the approximate result of Equation (18). Since the probability in Equation (18) approaches one rapidly as its argument grows, for each system size N correct information aggregation happens with a probability that, for all practical purposes, can be considered equal to one when the informativeness of the initial signals exceeds a threshold value. This essentially means that, for any population size N, correct information aggregation is possible for informative enough initial signals, despite the presence of a fraction f of dominant nodes. Such a result shows that the presence of a group of individuals with large influence does not necessarily jeopardize correct information aggregation. Moreover, the threshold value is inversely proportional to the square root of the system size, meaning that large populations will be able to aggregate information correctly as soon as signals are informative, i.e., as soon as p is slightly larger than 1/2. This is essentially a stronger statement of previous results obtained for infinite networks (see, for example, [17]), where the presence of signals with arbitrarily large informativeness, combined with the lack of individuals with unbounded influence, is identified as a sufficient condition for correct information aggregation. On the other hand, for values of the informativeness below the threshold, the population reaches consensus on the wrong value of X with non-zero probability.
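To make the scaling of this threshold explicit, suppose, as the central limit computation above suggests (Equation (18) itself is not reproduced here, so this functional form is an assumption rather than a quotation), that the probability of correct aggregation takes the form $P \simeq \Phi\!\left(\sqrt{N}\,x\,g(f)\right)$, with $\Phi$ the standard normal cumulative distribution function and $g(f)$ a size-independent factor. Then requiring $P \ge 1-\epsilon$ amounts to
\[
\sqrt{N}\,x\,g(f) \;\ge\; \Phi^{-1}(1-\epsilon)
\qquad\Longleftrightarrow\qquad
x \;\ge\; \frac{\Phi^{-1}(1-\epsilon)}{g(f)\,\sqrt{N}}\,,
\]
so that, under this assumption, the informativeness threshold decays as $1/\sqrt{N}$, and any fixed $x>0$ suffices once the population is large enough.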
A very interesting role in the information aggregation process is played by the fraction of hubs f. In Figure 2, one can see how, for a fixed system size N, the probability of correct information aggregation behaves when increasing the fraction f of hubs in the network. Furthermore, it is rather interesting to compare such results with the information aggregation capabilities of a regular graph, where all nodes have the same degree c. In such a case, one can immediately verify that the first eigenvector of the adjacency matrix is uniform, with all components equal to $1/\sqrt{N}$, and the probability of the scalar product in Equation (12) being positive simply reduces to the probability that the sum of the initial signals is positive. Therefore, one can compute the probability of correct information aggregation of a regular network with simple central limit theorem considerations, analogous to those already presented in this Section. Such a probability does not depend on c and reads:
where the last approximation holds for large values of N. As one can see, Equation (18) reduces to the above expression in a suitable limit (though, numerically, one does not find perfect agreement between the two, since Equations (17) and (18) are good approximations only for very low values of f). Therefore, the lesson to be learned from the plots in Figure 2 is two-fold. First, one can see that, as soon as a very small clique of uninformed hubs is introduced in a regular graph, the overall population’s ability to correctly aggregate information decreases sharply. This can also be understood by observing that the probability in Equation (17) does not recover the regular network (RN) result, Equation (19), in the limit of a vanishingly small fraction of hubs. On the other hand, whenever a clique of hubs is present in the network, information aggregation can actually be improved by increasing the size of the clique itself, up to the point where the aggregation ability of the original regular graph is almost reproduced.
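To see concretely where a result of this kind comes from, one can estimate the regular-graph benchmark by brute force: with a uniform leading eigenvector, consensus is correct exactly when the sum of the initial signals is positive. The sketch below assumes binary ±1 signals that are correct with probability p = (1 + x)/2; this parametrization is consistent with, but not quoted from, the text.

```python
import numpy as np

def p_correct_regular(N, x, n_samples=20000, seed=0):
    """Monte Carlo sketch of the regular-graph benchmark: consensus is correct
    iff the sum of the initial signals is positive. Assumed signal model:
    +1 with probability p = (1 + x) / 2, -1 otherwise."""
    rng = np.random.default_rng(seed)
    signals = rng.choice([1.0, -1.0], size=(n_samples, N),
                         p=[(1 + x) / 2, (1 - x) / 2])
    return float(np.mean(signals.sum(axis=1) > 0))

# By the central limit theorem, this estimate is close to
# Phi(sqrt(N) * x / sqrt(1 - x**2)), which tends to one as N grows for any
# fixed x > 0 -- the regular-graph analogue of the scaling discussed above.
```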
Figure 2. Probability of correct information aggregation as a function of the informativeness level of the initial signals. The solid line refers to the case of a regular graph (see Equation (19)). The other data points refer to networks with different fractions of hubs f. As can be seen, “perturbing” a regular graph with the introduction of a very small fraction of hubs seriously reduces the network’s information aggregation performance. Increasing the fraction of hubs progressively restores the regular graph’s level of aggregation. Each data point was obtained by averaging over independent network realizations.
Intuitively, the above findings can be understood altogether in the following terms. According to our setting, all hubs in the clique are mutually connected, and each hub has exactly c neighbors outside the clique, so that one can expect roughly c times as many nodes as there are hubs to fall within the clique’s neighborhood. Therefore, for very low values of f, this neighborhood contains a negligibly small number of nodes, which, however, will largely influence the initially uninformed hubs when they communicate for the first time. Given its small size, the neighborhood’s initial state of knowledge will be much more sensitive to fluctuations in the distribution of initial signals among agents. On the other hand, when the fraction of hubs is larger, the number of nodes in the neighborhood of the clique becomes of order N, and its aggregate state of knowledge is hence much more robust with respect to fluctuations.
In summary, the role of hubs in our model is subtle, as a handful of them is enough to heavily damage the good information aggregation properties of a population of equals (as modeled by a regular graph), whereas increasing their number also has “healing” effects, which can restore such good properties.
4. General Dynamics
So far, we have only considered the most popular and widely used evolution rule for information propagation on a network, i.e., the parallel dynamics introduced in Equation (8). However, as already discussed in Section 2, parallel dynamics represents one of the two extreme cases of the general dynamics in Equation (6), according to which a fraction Φ of agents listens to their neighbors at each time step t; parallel dynamics corresponds to the case Φ = 1. The other extreme case is the already mentioned RNS dynamics, corresponding to the smallest possible value of Φ, according to which agents update their state of knowledge one at a time.
Numerical simulations highlight significant differences in a social network’s ability to aggregate information correctly under parallel and RNS dynamics, with the latter performing much worse than the former: as shown in the left panel of Figure 3, the probability of correct information aggregation under parallel dynamics is considerably higher than the one obtained under RNS dynamics over a wide range of signal informativeness levels (we also performed numerical simulations under a random link sequential (RLS) dynamics, i.e., an information update rule according to which, at each time step t, a link is randomly selected and the two nodes sharing it exchange information; however, none of the simulations we performed highlighted any significant difference between such a dynamics and the parallel one). Moreover, results obtained via RNS dynamics show no relevant dependence on the system size N.
The above findings suggest looking for a transition in information aggregation as a function of the number of agents that update their state of knowledge at a given time step, by letting the parameter Φ take values over its whole range. In the right panel of Figure 3, we plot the probability P of correct information aggregation as a function of Φ for different system sizes and a fixed informativeness level of the signals initially distributed to the agents (the overall qualitative appearance of the results does not change when considering different levels of informativeness). As can be seen, for increasing values of Φ, a transition towards better information aggregation capabilities is observed for all system sizes. This can essentially be interpreted in terms of the speed of information update. As one could expect, RNS dynamics is extremely slow compared to parallel dynamics (depending on the system size, we find, on average, that RNS dynamics reaches consensus in times that are three to four orders of magnitude larger than those required by parallel updates), and is hence more prone to allow the spreading of misleading signals present in the agents’ initial distribution. On the other hand, parallel dynamics is fast, so that, within a few time steps, each agent receives through his/her neighbors aggregated information coming from the whole network.
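A minimal sketch of such a partially synchronous update is given below. Since Equation (6) is not reproduced here, the rule used (at every step a random fraction Φ of the agents replaces its state of knowledge with the sum of its neighbors' current states, which recovers the parallel update for Φ = 1) is an assumed form consistent with the description above, and all names are ours.

```python
import numpy as np

def run_general_dynamics(A, v0, phi, steps=5000, rng=None):
    """Sketch of a partially synchronous update: at each time step a random
    fraction phi of the agents listens to its neighbors, i.e., replaces its
    state with the sum of its neighbors' states; phi = 1 mimics parallel
    dynamics, while updating one agent per step mimics RNS dynamics."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(v0)
    v = v0.astype(float).copy()
    n_update = max(1, int(round(phi * N)))
    for _ in range(steps):
        listeners = rng.choice(N, size=n_update, replace=False)
        v_new = v.copy()
        v_new[listeners] = A[listeners] @ v          # only the listeners update
        v = v_new / (np.abs(v_new).max() + 1e-12)    # rescale to avoid overflow
        if np.all(v > 0) or np.all(v < 0):           # consensus on a common sign
            break
    return np.sign(v)
```

Measuring how often the returned sign matches the true value of X, as a function of Φ, reproduces the kind of transition shown in the right panel of Figure 3.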
Figure 3. Left: probability of correct information aggregation as a function of the informativeness level of the initial signals. The different data sets refer to different types of dynamics run on networks of different sizes. As can be seen, RNS dynamics does not show any significant dependence on the system size and performs much worse than parallel dynamics at correctly aggregating information. Right: probability of correct information aggregation as a function of the fraction Φ of agents that listen to their neighbors at each time step. The two extreme values of Φ correspond, respectively, to RNS and parallel dynamics. All data were obtained for a fixed value of the signal informativeness. In both plots, all data points were obtained by averaging over independent network configurations.