3.1. Resolving Bertrand’s Paradox via the Maximum Entropy Distribution for Chord Lengths
In light of the critique of Bertrand’s solutions, there is a clear way to sample chords that is purely random. For a circle of radius $r$, the beta-sampling method results in a uniform distribution of chord lengths over the interval $[0, 2r]$. This distribution has maximum entropy for chord lengths. The cumulative distribution is $F(L) = L/(2r)$, and the probability density function is $f(L) = 1/(2r)$ for $L \in [0, 2r]$. The side of an inscribed equilateral triangle has length $\sqrt{3}\,r$. Thus, the probability that the chord has a length less than the side of an inscribed equilateral triangle is $\sqrt{3}/2$. Consequently, the unique answer to Bertrand’s problem for the probability that a random chord has a length that is greater than the length of a side of the equilateral triangle is $1 - \sqrt{3}/2 \approx 0.134$. In general, for a randomly sampled chord that has maximum entropy,
$$P(L \le \ell) = \int_0^{\ell} \frac{1}{2r}\, dL = \frac{\ell}{2r}, \qquad 0 \le \ell \le 2r. \tag{9}$$
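If chords are randomly sampled such that the lengths have a uniform distribution as shown above, then how should the radial distance to the chord midpoint $u$ and the angular separation $\theta$ be sampled? In general, the rule for transforming an integration from an $x$ variate to a $v$ variate is
$$\int_a^b f(x)\, dx = \int_c^d f\big(x(v)\big)\, J\, dv, \tag{10}$$
where $c = v(a)$, $d = v(b)$, and where $J = dx/dv$ is the Jacobian of the transformation from the differential of $x$ to the differential of $v$. Thus, for the transformation from the $L$ differential to the $u$ differential on a circle of radius $r$, we have $L = 2\sqrt{r^2 - u^2}$, $u \in [0, r]$, and $J = dL/du = -2u/\sqrt{r^2 - u^2}$. The negative sign of the Jacobian reflects the fact that $L$ decreases as $u$ increases, and it is absorbed by reversing the limits of integration. Re-expressing Equation (9) in terms of the metric of the radial distance to the chord midpoint yields $P(L \le \ell)$ as
$$P(L \le \ell) = \int_{u_\ell}^{r} \frac{u}{r\sqrt{r^2 - u^2}}\, du, \qquad u_\ell = \sqrt{r^2 - \ell^2/4}. \tag{12}$$
Note that the integrand on the right-hand side of Equation (12) is the $u$-space probability density function that is consistent with the uniform distribution over chord lengths. With the probability density function $f(u) = u/\big(r\sqrt{r^2 - u^2}\big)$, the radial method generates chords such that $P(L > \sqrt{3}\,r) = 1 - \sqrt{3}/2$. Thus, if the appropriate density function is used for sampling $u$ values, then the same answer to Bertrand’s problem is found as for the beta-sampling method, which results in a uniform distribution of chord lengths over the interval $[0, 2r]$.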
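The claim that a suitably weighted radial density reproduces the uniform-length answer can be checked numerically. The following is a minimal Python sketch, not the paper's code. It assumes the $u$-space density $u/(r\sqrt{r^2-u^2})$ on $[0, r]$ and samples from it by inverting its cumulative distribution $F(u) = 1 - \sqrt{r^2-u^2}/r$. The function names and the seed are illustrative choices.

```python
import math
import random

def sample_u_for_uniform_length(r, rng):
    """Draw a midpoint distance u with density u / (r * sqrt(r^2 - u^2))."""
    p = rng.random()  # p ~ U[0, 1)
    # Inverse-CDF step: solve 1 - sqrt(r^2 - u^2) / r = p for u.
    return r * math.sqrt(1.0 - (1.0 - p) ** 2)

def chord_length_from_u(u, r):
    """Length of the chord whose midpoint lies at distance u from the center."""
    return 2.0 * math.sqrt(max(0.0, r * r - u * u))

def estimate_exceedance(n_samples=100_000, r=1.0, seed=1):
    """Monte Carlo estimate of P(L > sqrt(3) * r) under the weighted u density."""
    rng = random.Random(seed)
    threshold = math.sqrt(3.0) * r
    hits = 0
    for _ in range(n_samples):
        u = sample_u_for_uniform_length(r, rng)
        if chord_length_from_u(u, r) > threshold:
            hits += 1
    return hits / n_samples

if __name__ == "__main__":
    print(estimate_exceedance(), 1.0 - math.sqrt(3.0) / 2.0)
```

Under these assumptions the estimate should fall close to $1 - \sqrt{3}/2$, whereas sampling $u$ uniformly on $[0, r]$ would give a value near $1/2$.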
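Let us re-examine the angular-separation method so that it is consistent with the random selection of chords where the density function is $f(L) = 1/(2r)$ for $L \in [0, 2r]$. The Jacobian to transform the differential from $dL$ to $d\theta$ is $J = dL/d\theta = r\cos(\theta/2)$ with $L = 2r\sin(\theta/2)$ and $\theta \in [0, \pi]$. Thus, it follows that $P(L \le \ell)$ is
$$P(L \le \ell) = \int_0^{\theta_\ell} \frac{1}{2}\cos(\theta/2)\, d\theta, \qquad \theta_\ell = 2\sin^{-1}\!\big(\ell/(2r)\big). \tag{13}$$
Simplifying Equation (13) by the linear transformation from $\theta$ to $\phi$ where $\phi = \theta/2$ yields
$$P(L \le \ell) = \int_0^{\phi_\ell} \cos\phi\, d\phi, \qquad \phi_\ell = \sin^{-1}\!\big(\ell/(2r)\big). \tag{14}$$
From Equation (14) it is clear that the probability density in $\phi$ space is $\cos\phi$ for $\phi \in [0, \pi/2]$. With this density function, the angular-separation method yields the answer to Bertrand’s problem as $1 - \sqrt{3}/2$.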
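As a worked check of the angular result, the following lines evaluate the half-angle density explicitly. They assume that the angular separation is the central angle $\theta \in [0, \pi]$ with $L = 2r\sin(\theta/2)$ and $\phi = \theta/2$, so that a chord of length $\sqrt{3}\,r$ corresponds to $\phi = \pi/3$.

```latex
\begin{align*}
\int_0^{\pi/2} \cos\phi \, d\phi &= 1
  && \text{(the density is properly normalized)} \\
P\big(L \le \sqrt{3}\,r\big) &= \int_0^{\pi/3} \cos\phi \, d\phi
  = \sin(\pi/3) = \frac{\sqrt{3}}{2} \\
P\big(L > \sqrt{3}\,r\big) &= 1 - \frac{\sqrt{3}}{2} \approx 0.134
\end{align*}
```

By contrast, a uniform density $2/\pi$ on the same $\phi$ interval gives $P(L > \sqrt{3}\,r) = 1 - (2/\pi)(\pi/3) = 1/3$, which is the classic angular-separation answer.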
3.2. The Importance of a Dominant Metric Representation
Nonlinear transformations of metrics are at the core of Bertrand’s paradox. In general, Equation (10) contains a Jacobian whenever the integration over one variate is transformed to an integration over a different variate. It is only for a linear transformation such as $x = av + b$, which has a constant Jacobian of $a$, that the integrand keeps the same functional shape. Thus, a probability density function for one variate changes shape whenever the transformation is nonlinear. We have seen that there are three important variates in the analyses in Section 3.1: the chord length $L$, the radial distance to the chord midpoint $u$, and the angular separation between the endpoints of the chord $\theta$. In one of Bertrand’s analyses, he used a uniform distribution over $u$, which yields a distribution over chord lengths $L$ that is not uniform. In fact, longer chords are strongly favored. Alternatively, he used a uniform distribution over $\theta$ to obtain a different informative distribution for chord lengths, which also had a preference for longer chords. Had Bertrand used a uniform distribution over $u$ and examined the resulting distribution for the angular separation $\theta$, then the density function for $\theta$ would be proportional to $\sin(\theta/2)$, which is not uniform. Consequently, a uniform distribution on any one of the three variates rules out a uniform distribution on either of the other two. Note that this mathematical fact is due to properties of integration, so it is more general than a special problem with probability. This mathematical fact is also well known in statistics.
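The effect of the Jacobian on the three variates can be made concrete with closed-form exceedance probabilities. The sketch below is illustrative rather than taken from the paper. It assumes a circle of radius $r$, the relations $L = 2\sqrt{r^2-u^2}$ and $L = 2r\sin(\theta/2)$, and a uniform distribution on whichever variate the `method` argument names. The function name and the method labels are hypothetical.

```python
import math

def exceedance_probability(method, ell, r=1.0):
    """P(L > ell) when the named variate is sampled uniformly.

    method = "radial":  u uniform on [0, r]
    method = "angular": theta (central angle) uniform on [0, pi]
    method = "length":  L uniform on [0, 2r]
    """
    x = ell / (2.0 * r)  # chord length as a fraction of the diameter
    if not 0.0 <= x <= 1.0:
        raise ValueError("ell must lie in [0, 2r]")
    if method == "radial":
        # L > ell  <=>  u < sqrt(r^2 - ell^2 / 4)
        return math.sqrt(1.0 - x * x)
    if method == "angular":
        # L > ell  <=>  theta > 2 * asin(ell / (2r))
        return 1.0 - (2.0 / math.pi) * math.asin(x)
    if method == "length":
        return 1.0 - x
    raise ValueError(f"unknown method: {method}")

if __name__ == "__main__":
    side = math.sqrt(3.0)  # side of the inscribed triangle for r = 1
    for name in ("radial", "angular", "length"):
        print(name, exceedance_probability(name, side))
```

The same threshold yields three different probabilities because each uniform density is carried into $L$ space through a different nonlinear map.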
Given the nonlinear relationship between any pair of the three variates, the question arises: why choose any one of the variates as the one for a maximum entropy representation? It is argued here that the problem statement dictates the dominant framework. Bertrand’s problem is about the length of the chords. The problem deals with the comparison between a random chord length and the length of the side of an equilateral triangle. The problem is not about a random angular separation of a chord in comparison to the angle of an equilateral triangle. If the problem were to be changed to be one of comparing random angles, then there would be a different probability answer. Thus, the chord length is the appropriate basis for sampling a random chord due to the problem statement. The problem statement dictates a preferential or dominant variable to use for the maximum entropy distribution. Moreover, the mathematical measure of a chord is its length, so it is not surprising that Bertrand framed the question in terms of the length of a random chord and the length of a side of the inscribed triangle.
3.3. Some Simulation Examples
Further insights about Bertrand’s problem can be obtained from some simulations. The first simulation illustrates the density displays for chord length for three methods. The stochastic processes are (1) uniform sampling of the radial length to the chord midpoint, (2) uniform sampling of the angular separation between the chord endpoints, and (3) uniform sampling of the chord length itself. The code for an R function that implements each of these three methods is provided in
Appendix B.1. For each sampling method, 100,000 samples for chord length were obtained. See
Figure 4 for the resulting probability density estimates on chord length for these three methods. These plots clearly show that the new method advanced in this paper results in a flat distribution over chord lengths whereas the two Bertrand methods do not, as was pointed out earlier in
Section 2.2.
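The three sampling schemes described above can be sketched as follows. This is a Python illustration written for this passage, not the R function of Appendix B.1. The function names, the seed, and the ten-bin histogram are assumptions made only for the example, and the angular separation is taken to be the central angle on $[0, \pi]$.

```python
import math
import random

def sample_chord_lengths(method, n, r=1.0, seed=0):
    """Return n chord lengths from one of the three sampling schemes."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n):
        if method == "radial":        # uniform distance to the chord midpoint
            u = rng.uniform(0.0, r)
            lengths.append(2.0 * math.sqrt(max(0.0, r * r - u * u)))
        elif method == "angular":     # uniform central angle between endpoints
            theta = rng.uniform(0.0, math.pi)
            lengths.append(2.0 * r * math.sin(theta / 2.0))
        elif method == "length":      # uniform chord length
            lengths.append(rng.uniform(0.0, 2.0 * r))
        else:
            raise ValueError(f"unknown method: {method}")
    return lengths

def coarse_density(lengths, r=1.0, bins=10):
    """Histogram-based density estimate on [0, 2r] with equal-width bins."""
    width = 2.0 * r / bins
    counts = [0] * bins
    for length in lengths:
        counts[min(int(length / width), bins - 1)] += 1
    return [c / (len(lengths) * width) for c in counts]

if __name__ == "__main__":
    for name in ("radial", "angular", "length"):
        sample = sample_chord_lengths(name, 100_000)
        print(name, [round(d, 2) for d in coarse_density(sample)])
```

With these assumptions, only the length method should give a density estimate that is roughly flat across the bins. The other two should rise toward the diameter.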
The next set of simulations is designed to illustrate the outcome of drawing random chords for the new method. The initial point $(x_1, y_1)$ for each chord is placed at an angle $\theta_1$ that is sampled from a uniform distribution on $[0, 2\pi)$. The corresponding endpoint for each chord is one of the two intersection points between the original circle and the circle that has a center at $(x_1, y_1)$ and has a radius of $L$. The value of $L$ is randomly sampled from a uniform distribution over $[0, 2r]$. In general, $x_1 = r\cos\theta_1$ and $y_1 = r\sin\theta_1$, where $r$ is the radius of the original circle. The other endpoint for the chord of length $L$ is the point $(x_2, y_2)$ where $x_2 = r\cos\theta_2$ and $y_2 = r\sin\theta_2$. Since $L^2 = (x_2 - x_1)^2 + (y_2 - y_1)^2 = 2r^2\big[1 - \cos(\theta_2 - \theta_1)\big]$, it follows after some algebra that
$$\theta_2 = \theta_1 \pm \cos^{-1}\!\left(1 - \frac{L^2}{2r^2}\right).$$
The two possible values for the angle $\theta_2$ correspond to the two possible intersection points between the two circles as illustrated in Figure 3. In the simulation, one of these two values is independently selected by a virtual flip of a fair coin. The software for simulating $n$ random chords with this procedure is provided in Appendix B.2.
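The construction just described can be sketched in a few lines. The following Python function is an illustration under stated assumptions, not the software of Appendix B.2. It assumes the starting angle is uniform on $[0, 2\pi)$, the chord length is uniform on $[0, 2r]$, and the second endpoint sits at an angular offset of $\pm\cos^{-1}(1 - L^2/(2r^2))$ from the first. The function name and return format are hypothetical.

```python
import math
import random

def random_chord(r, rng):
    """Return (p1, p2, length) for one chord with uniformly sampled length."""
    theta1 = rng.uniform(0.0, 2.0 * math.pi)   # angle of the initial point
    length = rng.uniform(0.0, 2.0 * r)         # target chord length
    # Law of cosines on the circle: L^2 = 2 r^2 (1 - cos(delta)).
    cos_delta = 1.0 - length ** 2 / (2.0 * r ** 2)
    cos_delta = max(-1.0, min(1.0, cos_delta)) # guard against rounding
    delta = math.acos(cos_delta)
    sign = 1.0 if rng.random() < 0.5 else -1.0 # fair coin picks the intersection
    theta2 = theta1 + sign * delta
    p1 = (r * math.cos(theta1), r * math.sin(theta1))
    p2 = (r * math.cos(theta2), r * math.sin(theta2))
    return p1, p2, length

if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        p1, p2, length = random_chord(1.0, rng)
        realized = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
        print(round(length, 4), round(realized, 4))
```

The usage lines compare the sampled length with the Euclidean distance between the two generated endpoints, which should agree up to floating-point error.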
Figure 5 provides the results from six simulations where the circle has radius
, and the chord length
L is uniformly distributed on
. The number of chords for the six panels are:
where
. The plots show that the region near the origin of the disk has relatively fewer chord segments compared to the outer disk region. There are two reasons for this inequality in the density for filling the disk with random chords. First, chords begin and end on the circle boundary, so the outer region must on average have a higher density than the inner region. Second, the inner region can only be traversed if the chord length is large. The method for uniform sampling of length, unlike the other two alternative sampling methods shown in
Figure 4, does not overly sample for longer chords. Yet the issue raised by Bertrand’s problem is not about the density of chord segments in various regions of the disk, so this salient visual pattern is not relevant.
Bertrand’s problem has also been simulated with Poisson-line stochastic processes (e.g., [
20]). For such applications, there is a Poisson random process that generates the number of lines. If the number of lines from the Poisson process yields the value of
n, then
n lines are generated via any particular method for randomly sampling chords. In the software cited above, the radial method is used for producing random chords. However, it should be noted that Bertrand’s problem is not about the number of chords. The number of chords can be simply one. Bertrand’s question remains, regardless of the number of attempts to draw random chords. It is known for the simulations used for
Figure 5 that all the chords were sampled via a stochastic process where the probability that the length of a sampled chord exceeds the side of the inscribed equilateral triangle is equal to $1 - \sqrt{3}/2 \approx 0.134$. We do not need to generate a large sample of chords to ascertain that result. However, if one were to ask a different question about the random chords, then Monte Carlo simulations might be needed. For example, suppose we were interested in the mean and variance of the largest chord length for a set of
n random chords produced by a particular chord generation method. This question is a function of
n, and the rate of convergence to an asymptotic distribution is not generally known. This type of question has attracted some interest (e.g., [
21]), but it is a very different problem from the one raised by Bertrand.
In light of the discussion of the number of chords produced by the uniform length-sampling method, we can pose a different Bertrand-type question. Namely, what is the fewest number of random chords sampled on a unit circle such that the probability is or greater that at least one chord is longer than $\sqrt{3}$? Given that each of the $n$ chords has a probability of $\sqrt{3}/2$ for having a length less than $\sqrt{3}$, it then follows that the probability that at least one chord exceeds $\sqrt{3}$ is $1 - (\sqrt{3}/2)^n$. The resulting answer to the question is because when the probability is that at least one chord is longer than $\sqrt{3}$; whereas when the probability is . To confirm this answer, a Monte Carlo study of two million trials was examined with and another two million trials with . The proportion of the Monte Carlo samples where the largest chord in the set exceeded $\sqrt{3}$ was for , but it was for .
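The at-least-one-long-chord calculation can be sketched as a short function. This is an illustration under stated assumptions: chords on a unit circle are independent, each has length below $\sqrt{3}$ with probability $\sqrt{3}/2$, and the target probability is passed as a parameter. The target of 0.5 in the usage lines is an illustrative choice, not a value taken from the text, and the function names are hypothetical.

```python
import math

def prob_at_least_one_long(n):
    """P(at least one of n independent chords is longer than sqrt(3))."""
    return 1.0 - (math.sqrt(3.0) / 2.0) ** n

def fewest_chords(target):
    """Smallest n with prob_at_least_one_long(n) >= target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    n = 1
    while prob_at_least_one_long(n) < target:
        n += 1
    return n

if __name__ == "__main__":
    target = 0.5  # hypothetical target, chosen only for illustration
    n_min = fewest_chords(target)
    print(n_min, prob_at_least_one_long(n_min - 1), prob_at_least_one_long(n_min))
```

Printing the probability at the returned $n$ and at $n - 1$ shows the two values that bracket the chosen target.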
3.4. Bertrand’s Problem in a Historical Context
As noted previously, Bertrand had an agenda for his problem. He was a critic of the use of a probability distribution for representing an unknown parameter in a Bayesian analysis when there is an uncountably infinite number of possible outcome states [
3,
4]. Bertrand did not argue directly against the Bayesian approach when there were a finite number of states such as computing the probability as to which of two possible bags of colored marbles was chosen
at random. In this case, assigning a prior probability of $1/2$ to each bag based on the principle of insufficient reason seemed rational. But he felt that the generalization of that principle was problematic for cases, such as the estimation of the bias of a coin, where there is an uncountably infinite number of possible values for the unknown rate parameter [
3]. As noted by Jaynes [
22],
Since Bertrand proposed it in 1889, this problem has been cited to generations of students to demonstrate that Laplace’s “principle of indifference” contains logical inconsistencies (p. 478).
In the time since Bertrand’s analysis, it is surprising that the simple answer to Bertrand’s problem, which is based on the maximum entropy principle as applied to the mathematical measure of chords, has somehow eluded discovery heretofore. Many scholars instead debated the relative merits of the three solutions provided by Bertrand (e.g., [
2,
7,
15,
22,
23,
24,
25,
26,
27]). Opinions about Bertrand’s three solutions varied among these theorists. Von Mises [
7] agreed with Bertrand, and used the paradox as an argument against the Bayes/Laplace use of probability for a population parameter. Other writers saw the problem as being ambiguous about the stochastic process of selecting chords [
2,
25]. Once a method is selected, there is a definite probability that the chord is longer than the side of the inscribed triangle. In essence, these writers agreed with Bertrand without drawing conclusions about the principle of indifference or maximum entropy. Gyenis and Rédei [
27] did not challenge any of Bertrand’s solutions, but they questioned whether Bertrand’s paradox met their standard for a philosophical paradox. As they stated in a philosophical journal,
The interpretation proposed here should make clear that Bertrand’s Paradox cannot be “resolved” — not because it is an unresolvable, genuine paradox but because there is nothing to be resolved: the “paradox” simply states a provable, non-trivial mathematical fact, a fact which is perfectly in line both with the correct intuition about how probability theory should be used to model phenomena and how probability theory is in fact applied in the sciences. (p. 350).
Regardless of the definition of a paradox, the problem that Bertrand identified calls for clarification. What is a random chord, and what is the probability that a random chord of the unit circle has a length greater than $\sqrt{3}$? These are fair questions, and it would be troubling if these questions did not yield a single answer.
Other writers have argued based on symmetry and invariance principles that Bertrand’s radial method is the correct solution [
22,
23,
24,
26,
28]. For example, Jaynes treated the Bertrand problem in the context of another problem in geometric probability in which the plane is superimposed with a random set of lines. If a line does not intersect the circle, then it does not count. But if a line does intersect the circle, then the chord is the segment between the two intersection points. Jaynes argued that the distribution of chord lengths should be invariant under translation of the circle in the plane, i.e., if the circle is moved in the background field of lines, then the answer to Bertrand’s problem should not change. However, philosopher Louis Marinoff correctly pointed out that imposing the translational invariance requirement changes the original Bertrand problem [
29]. Translational symmetry does not mean that the distribution of chords meets the randomness requirement. Translational invariance just means that the preference for producing long chords is stable in regard to the movement of the circle in a field of lines.
Jaynes, who was a theoretical physicist, also argued that the radial method is correct because of an empirical experiment of tossing straws [
22]. Jaynes argued that if the straw missed the circle, then it was disregarded, but if it overlapped the circle, then it determined a chord. He claimed the distribution of chord lengths was consistent with the radial method distribution function. Based on tossing 128 straws, Jaynes reported that a chi-squared goodness-of-fit test was consistent with the distribution shown in Equation (2). This experiment is not convincing for several reasons. First, the goodness-of-fit test assumed the radial method distribution as the null hypothesis, and no frequentist statistical test can prove the hypothesis that was assumed in the first place. Second, the analysis by Porto and associates [
30] demonstrated that the length of straws relative to the radius of the circle dramatically influences the answer to Bertrand’s question. Furthermore, other physicists [
31] argued that there can be many different stochastic processes for tossing straws that have a different answer to Bertrand’s problem. These physicists further suggest that the solution to Bertrand’s problem is to compute the arithmetic mean of the distinctly different answers to Bertrand’s problem. However, it is more reasonable to reject the whole idea of trying to answer Bertrand’s problem with any experiment where the outcomes can be widely variable. Bertrand’s problem is about a mathematical operation of constructing chords. It is not about an actual stochastic process that is occurring in nature. Real stochastic processes do occur in nature, but these processes do not necessarily reflect pure randomness.
Holbrook and Kim [
32] described a physics thought experiment that does not require actual data to obtain a probability. Their thought experiment consisted of arranging a circular cloud-chamber detector perpendicular to the direction of cosmic rays. The chord is the path of the ray through the detector. For an ideal detector of radius 1, the chord path is greater than $\sqrt{3}$ when it is within $1/2$ of the center. This thought experiment implements the same operations considered when the radial method was discussed previously, and it arrives at the same answer of $1/2$ as was found for the radial method. Consequently, the features of cosmic rays and the cloud chamber are not needed to arrive at the probability for Bertrand’s question. Thus, the Holbrook and Kim thought experiment is not convincing. Moreover, Ardakani and Wulff [33] described a different novel method for generating chords, and this method resulted in the answer of $1/3$ to Bertrand’s problem, a value that is consistent with Bertrand’s angular-separation method. Consequently, neither the Holbrook-Kim method nor the Ardakani-Wulff method resolves Bertrand’s paradox. Bertrand already showed that different stochastic processes result in different answers to his problem.
Kaushik [
34] developed a stochastic process that resulted in the conclusion that $P(L > \sqrt{3}) = 1/4$. Recall that $1/4$ was the answer for Bertrand’s within-disk method, which was previously rejected for reasons discussed in Section 2.2, and it was also rejected as a valid stochastic process by Shackel [
15]. However, the arguments raised against the within-disk method do not apply to the process proposed by Kaushik, so this stochastic procedure should be examined more carefully. With the Kaushik procedure, a point along the diameter of the unit circle is uniformly sampled. For simplicity, let the sampled point be a distance $t$ from one end of the diameter, measured along the diameter toward the other end. At the sampled point, a perpendicular line is drawn to the circle. The chord endpoints are that end of the diameter and a point where the perpendicular meets the circle. Kaushik points out that for each value of $t$, there is a corresponding chord of length $L = \sqrt{2t}$. Conversely, for each possible chord of length $L$, there is a corresponding distance $t$. But the problem with the Kaushik solution is that $t$ is sampled from a uniform distribution on the $[0, 2]$ interval, which results in an informative distribution for chord length. This fact is demonstrated by examining the odds ratio between the hypotheses of $L \le 1$ and $L > 1$, which is 1 to 3 instead of 1 to 1. To obtain the flat chord length density function $f(L) = 1/2$ for $L \in [0, 2]$, and given the general transformation formula from Equation (10), we should set $x = L$ and $v = t$ with $L = \sqrt{2t}$. Thus, the Jacobian is $J = dL/dt = 1/\sqrt{2t}$, and it follows from Equation (10) that
$$P(L \le \ell) = \int_0^{\ell} \frac{1}{2}\, dL = \int_0^{\ell^2/2} \frac{1}{2\sqrt{2t}}\, dt. \tag{17}$$
Thus, $P(L \le \sqrt{3}) = \sqrt{3}/2$, so $P(L > \sqrt{3}) = 1 - \sqrt{3}/2$, which is the correct answer advanced in this paper to Bertrand’s problem. From Equation (17) the effective density function for $t$ should be $f(t) = 1/(2\sqrt{2t})$. However, if $f(t) = 1/2$, then $P(L \le \sqrt{3}) = P(t \le 3/2) = 3/4$, which results in the incorrect answer of $1/4$ to Bertrand’s problem. The Kaushik analysis, like the other Bertrand methods, fails to employ the proper density function that is consistent with the uniform distribution for chord length. Instead, Kaushik sampled from a uniform distribution on a secondary geometric feature that is nonlinearly linked to chord length.
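The contrast between a uniform $t$ and a Jacobian-weighted $t$ can be simulated directly. The sketch below is illustrative only. It assumes a unit circle, the relation $L = \sqrt{2t}$ for $t \in [0, 2]$, and a weighted density $1/(2\sqrt{2t})$ whose cumulative distribution $\sqrt{t/2}$ is inverted as $t = 2p^2$. The function names and seeds are hypothetical.

```python
import math
import random

def chord_length_from_t(t):
    """Chord length for a point at distance t along the diameter (unit circle)."""
    return math.sqrt(2.0 * t)

def exceedance_uniform_t(n, seed=0):
    """Estimate P(L > sqrt(3)) when t is uniform on [0, 2]."""
    rng = random.Random(seed)
    side = math.sqrt(3.0)
    hits = sum(1 for _ in range(n)
               if chord_length_from_t(rng.uniform(0.0, 2.0)) > side)
    return hits / n

def exceedance_weighted_t(n, seed=0):
    """Estimate P(L > sqrt(3)) when t has density 1 / (2 * sqrt(2 t))."""
    rng = random.Random(seed)
    side = math.sqrt(3.0)
    hits = 0
    for _ in range(n):
        p = rng.random()
        t = 2.0 * p * p  # inverse CDF of sqrt(t / 2)
        if chord_length_from_t(t) > side:
            hits += 1
    return hits / n

if __name__ == "__main__":
    print(exceedance_uniform_t(100_000), exceedance_weighted_t(100_000))
```

Under these assumptions the first estimate should sit near $1/4$ and the second near $1 - \sqrt{3}/2$, since the weighted draw makes $L = 2p$ uniform on $[0, 2]$.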
While most papers dealt with the classic solutions discussed by Bertrand, several investigators proposed alternative stochastic processes for generating chords that result in different answers to Bertrand’s problem [
29,
35,
36]. Chiu and Larson [
35] discussed five alternative answers to Bertrand’s problem, but none of these alternatives had a density function that was uniform for chord length. Jevremovic and Obradovic [
36] used Monte Carlo simulations to evaluate three alternative methods to ascertain the probability that random chords exceeded $\sqrt{3}$. However, none of these three methods came remotely close to the value of $1 - \sqrt{3}/2 \approx 0.134$, which occurs when the density function on chord length is uniform. The smallest value found by these investigators from their Monte Carlo simulations was about
, so these procedures also had a strong preference towards generating long chords. These papers illustrate that there are many more than the original three stochastic processes examined by Bertrand for generating an informative distribution of chords. In each case, the researchers imposed a uniform distribution on a geometric variable that is nonlinearly linked to chord length.
Marinoff [
29] also entertained several solutions to Bertrand’s problem. One of the solutions had the answer of $1 - \sqrt{3}/2$, which is the same value arrived at in this paper. However, Marinoff did not argue that this random process had any special status. Marinoff also did not discuss the uniqueness of this solution in terms of the maximum entropy for chord length, nor did he discuss the role of the Jacobian as the underlying reason why Bertrand’s paradox occurs. Moreover, there has not been a subsequent discussion of this stochastic process in the literature since it was published in 1994.
In light of this review of the literature on Bertrand’s paradox, it is interesting to observe that this problem is deceptive, and it has resisted resolution for a long time. What most theorists appear to have overlooked is the importance of the Jacobian for Bertrand’s problem. As noted above with the Kaushik stochastic process, there is clearly a one-to-one linkage between any possible chord of length $L$ and the corresponding distance $t$ that produced that length. Both $t$ and $L$ span the interval $[0, 2]$. Yet despite the one-to-one correspondence between the variates, which both have an uncountably infinite number of possible values, it is still necessary to account for the inequality of the $dt$ and $dL$ differentials. The error was using a maximum entropy, uniform distribution for the $t$ variable rather than for the chord length $L$. It is fine to generate chords with the Kaushik stochastic process provided that the density function for sampling $t$ is linked via the Jacobian to the uniform distribution of chord length as shown in Equation (17). There is one correct answer to Bertrand’s problem, but it was apparently not clear to either Bertrand or others why a purely random chord generation process must generate chords that have a uniform distribution over the possible lengths for the chord. Yet Bertrand was correct to stress the important difference between sample spaces that have a finite number of elementary outcomes and sample spaces that have an infinite number of elementary outcomes. The differential and the Jacobian are only used for continuous variables. For example, suppose the sample space for a random variable $n$ consists of the integers $1, 2, \ldots, 100$, and we are interested in a nonlinear transformation such as $m = n^2$. The $n$ and $m$ sample spaces are finite, and probabilistic statements do not involve calculus and the Jacobian. A uniform distribution for $n$ is the discrete uniform over the integers $1, 2, \ldots, 100$, and the corresponding distribution for $m$ is the discrete uniform distribution over the 100 squares $1, 4, 9, \ldots, 10000$. The probability of a value less than $k$ in the sample space for $n$ is equal to the probability of a value less than $k^2$ in the sample space for $m$.
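The discrete case can be verified exactly with rational arithmetic. The sketch below is an illustration that assumes the sample space for $n$ is the integers 1 through 100 and that the nonlinear map is $m = n^2$. Both choices are assumptions made for the example, as is the helper name.

```python
from fractions import Fraction

def prob_less_than(values, bound):
    """Exact P(value < bound) under a discrete uniform distribution on values."""
    values = list(values)
    return Fraction(sum(1 for v in values if v < bound), len(values))

# Assumed sample spaces: n in {1, ..., 100} and m = n^2.
n_space = range(1, 101)
m_space = [n * n for n in n_space]

# Compare P(n < k) with P(m < k^2) for every threshold k.
checks = [prob_less_than(n_space, k) == prob_less_than(m_space, k * k)
          for k in range(1, 102)]

if __name__ == "__main__":
    print(all(checks), prob_less_than(n_space, 11), prob_less_than(m_space, 121))
```

Because each integer carries its probability mass with it under the one-to-one map, no Jacobian correction arises, in contrast to the continuous case.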