On Surprise Indices Related to Univariate Discrete and Continuous Distributions: A Survey
Abstract
1. Introduction
- We revisit the computation of the SIs for the binomial, Poisson, and negative binomial distributions and provide the correct expression of the SI for the Poisson distribution using Mathematica.
- We compute surprise indices for the geometric, negative binomial, zero-truncated Poisson, and Hermite distributions, for which closed-form expressions involving special functions and/or infinite series are available. For the generalized Poisson distribution, the associated SI is not available in closed form and must be obtained numerically. All of these derivations are new contributions to this topic.
- In addition, we provide the derivation of SIs for univariate continuous probability models using an analogous expression based on the geometric mean of a random variable.
- Finally, we conduct empirical studies on SIs for several of the discrete distributions with varying parameter choices, and several useful observations are derived accordingly.
2. Surprise Index Derivation: Preliminaries
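- Step 1: From a given probability mass function (p.m.f.) $p_j = P(X = j)$, calculate the generating function of $X$, which is of the form $G(x) = \sum_{j} p_j x^{j}$.
- Step 2: Set $x = e^{i\theta}$, where $i = \sqrt{-1}$, to obtain the quantity $G(e^{i\theta})\,G(e^{-i\theta}) = |G(e^{i\theta})|^{2}$, whose average over $\theta$ gives $\sum_{j} p_j^{2}$, the numerator of Equation (1).
- Step 3: Integrate the simplified quantity on the R.H.S. obtained in Step 2 from 0 to $2\pi$, and divide by $2\pi$.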
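
The three steps can be checked numerically. The sketch below assumes that the numerator of the SI is the sum of squared probabilities, $\sum_j p_j^2$, and that it equals $\frac{1}{2\pi}\int_0^{2\pi} |G(e^{i\theta})|^2\,d\theta$ for the generating function $G$ (Parseval's identity). The binomial example, the function names, and the rectangle-rule integration are illustrative choices, not taken from the paper.

```python
import cmath
import math


def binomial_pmf(n, p):
    """Return [P(X = j) for j = 0..n] for a Binomial(n, p) variable."""
    return [math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]


def pgf(pmf, x):
    """Step 1: generating function G(x) = sum_j p_j x^j."""
    return sum(p_j * x**j for j, p_j in enumerate(pmf))


def numerator_by_integration(pmf, steps=2048):
    """Steps 2-3: average |G(e^{i*theta})|^2 over theta in [0, 2*pi).

    A rectangle rule on a uniform grid is used; for a polynomial G of
    degree n it is exact (up to rounding) once steps > n.
    """
    total = 0.0
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        total += abs(pgf(pmf, cmath.exp(1j * theta))) ** 2
    return total / steps


pmf = binomial_pmf(10, 0.25)
direct = sum(p_j**2 for p_j in pmf)       # sum of squared probabilities
via_pgf = numerator_by_integration(pmf)   # same quantity via Steps 1-3
print(direct, via_pgf)
```

Both printed numbers should agree to rounding error, which is the sense in which the generating-function route replaces the direct sum.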
2.1. Surprise Index for a Binomial Distribution
- For fixed with decreasing, the corresponding SI values increase, which is expected.
- For fixed values of p and as the number of successes increase and with decreasing, the SI values increase.
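
As a rough check on the binomial case, the SI can be evaluated by direct summation. The sketch below assumes SI $= \sum_j p_j^2 / p_m$, where $p_j = P(X = j)$ for a Binomial($n$, $p$) variable and $m$ is the observed number of successes; the function name `binomial_si` is hypothetical.

```python
import math


def binomial_si(n, m, p):
    """Surprise index for observing m successes in n trials.

    Assumes SI = (sum_j p_j^2) / p_m with p_j = P(X = j).
    """
    pmf = [math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]
    return sum(q**2 for q in pmf) / pmf[m]


# Vary m for fixed n and p to see how the SI reacts to rarer outcomes.
for m in (1, 3, 5, 8, 10):
    print(m, round(binomial_si(10, m, 0.25), 2))
```

For n = 10 and p = 0.25, this gives about 1.09 at m = 1 and 0.82 at m = 3, in line with the tabulated binomial values.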
2.2. Surprise Index for a Negative Binomial Distribution
- The SI values are dependent on the magnitude of either or both of p and .
- For fixed as increases, the SI values decrease for varying
- For with q increasing, the SI value increases.
- For with and q decreasing, as m decreases, the SI values increase.
2.3. Surprise Index for a Poisson Distribution
- For a fixed with m increasing and decreasing, the SI values increase.
- For a fixed with increasing, the SI values decrease.
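
For the Poisson case, a short numerical sketch can help when reading these observations. It assumes SI $= \sum_j p_j^2 / p_m$ and uses the standard identity $\sum_{j \ge 0} \lambda^{2j}/(j!)^2 = I_0(2\lambda)$, where $I_0$ is the modified Bessel function of the first kind, so that $\sum_j p_j^2 = e^{-2\lambda} I_0(2\lambda)$. The symbol $\lambda$ (`lam` in the code) for the Poisson parameter and all function names are our own assumptions, and the closed form here is not claimed to be the exact expression derived in the paper.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I_0(x) from its power series."""
    return sum((x / 2) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))


def poisson_si(lam, m):
    """Surprise index for observing m under Poisson(lam).

    Uses sum_j p_j^2 = exp(-2*lam) * I_0(2*lam); `lam` is our name
    for the Poisson parameter.
    """
    p_m = math.exp(-lam) * lam**m / math.factorial(m)
    return math.exp(-2 * lam) * bessel_i0(2 * lam) / p_m


def poisson_si_direct(lam, m, terms=200):
    """Same quantity by summing squared probabilities term by term (m < terms)."""
    p_j = math.exp(-lam)              # P(X = 0)
    total = p_j**2
    p_m = p_j if m == 0 else None
    for j in range(1, terms):
        p_j *= lam / j                # P(X = j) from P(X = j - 1)
        total += p_j**2
        if j == m:
            p_m = p_j
    return total / p_m


for lam, m in [(0.5, 1), (1.0, 3), (2.5, 5)]:
    print(lam, m, round(poisson_si(lam, m), 2), round(poisson_si_direct(lam, m), 2))
```

The two routes should agree, and for these parameter choices the results are close to the tabulated Poisson SI values (about 1.54, 5.03, and 2.75).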
2.4. Surprise Index for a Zero-Truncated Poisson Distribution
- The SI values differ slightly from those of the Poisson distribution, and smaller values of the Poisson parameter produce larger differences between the zero-truncated Poisson and the Poisson SI values.
- For varying choices of m and the Poisson parameter, the pattern of change in the SI values is the same as in the previous case (Poisson distribution), apart from the magnitude.
2.5. Surprise Index for a Geometric Distribution
- For fixed with and with m increasing, the SI values exhibit an increasing pattern.
- For fixed with q decreasing, the SI values increase.
2.6. Surprise Index for a Hermite Distribution
2.7. Surprise Index for a Skellam Distribution
2.8. Surprise Index for a Generalized Poisson Distribution
3. Surprise Index for Continuous Probability Models
- For the uniform distribution on (a, b), as b increases and a decreases, the SI increases.
- For Beta as a increases and b increases, SI decreases. On the other hand, when both a and b increase, the SI increases.
- For Beta (type-II) , when both increase, the SI will increase.
- For Pareto (type-II) distribution, because of the nature of the polygamma function as obtained from Mathematica, for any choices of the parameter regardless of the other permissible choices of the other two parameters, it is divergent and, therefore, it cannot be computed.
- For the Log-normal distribution, as both and increase, the associated SI increases.
- For the Gamma distribution—(i) when is fixed, with increasing, the SI will increase and (ii) with fixed and increasing, the SI will increase.
- For the Weibull distribution, the following can be observed:
  - For a fixed k as and increase, the SI will increase.
  - For a fixed as k and increase, the SI will increase.
  - For any choice of and decreasing with k increasing, for a fixed choice of the corresponding SI will decrease.
- Conjecture 1. The SI, if available, uniquely determines a discrete and/or continuous probability distribution.
- Conjecture 2. The SI for a truncated model differs from that of the non-truncated version of the assumed discrete probability model only by a scalar quantity involving the model parameter(s), and it is larger than the SI computed for the non-truncated version. For example, the authors of [6] have shown that the SI for the truncated Poisson is larger than that for the usual Poisson distribution.
- Conjecture 3. The SI is invariant under all non-singular linear transformations. Equivalently, we can state the following. Let X and Y be two non-degenerate random variables with valid probability distributions that are well-defined on Further, let with and and let and be the surprise indices for the r.v. X and Y, respectively. Then,
  Proof. The result follows immediately by using the invariance property of a generating function. We provide the proof for a discrete r.v.; however, a similar approach can be made to establish the result for a continuous r.v. If and are the probability generating functions of X and Y, respectively, then
  Hence, the proof. □
4. Potential Applications and Challenges/Open Problems
- (i) Ref. [2] states, “for multivariate normal distributions, the distribution of the likelihood density, does not seem to be expressible in elementary terms” (p. 1133);
- (ii) The special functions are difficult to determine for the univariate case, which leads to even more difficulty when more variables are considered;
- (iii) The long runtimes when finding the closed-form expressions for several of such distributions suggest that a multivariate analysis of the SI will require highly efficient computing environments.
5. Concluding Remarks
Appendix A
- Binomial distribution (Equation (3) numerator)
- Poisson distribution (Equation (7) numerator)
- Negative binomial distribution (Equation (6) numerator)
- Geometric distribution (Equation (5) numerator)
- Pareto (type II) distribution (Table 6, row 4)
- For a two parameter beta distribution (Table 6, row 2)
Appendix B
- Observations from Figure A1:
- For as m increases, the value increases, i.e., equivalently, the SI values increase.
- For as m increases, the value decreases, i.e., equivalently, the SI values decrease.
- Observations from Figure A2: For all fixed choices of as m increases, the value increases, i.e., equivalently, the SI values increase.
- Observations from Figure A3: For all fixed choices of as m increases, the value increases, i.e., equivalently, the SI values increase; however, the magnitude of increment decreases as becomes larger.
- Observations from Figure A5:
- When takes a constant value for all choices
- For as m increases, the value increases, i.e., equivalently, the SI values increase.
References
- Weaver, W. Probability, rarity, interest, and surprise. Sci. Mon. 1948, 67, 390–392.
- Good, I.J. The surprise index for the multivariate normal distribution. Ann. Math. Stat. 1956, 27, 1130–1135.
- Redheffer, R.M. A note on the surprise index. Ann. Math. Stat. 1951, 22, 128–130.
- Borja, M.C. Outliers in Long-Tailed Discrete Data. 2012. Available online: https://web-archive.lshtm.ac.uk/csm.lshtm.ac.uk/wp-content/uploads/sites/6/2016/04/Mario-Cortina-Borja-16-11-2012.pdf (accessed on 16 June 2023).
- Scotti, C. Surprise and uncertainty indexes: Real-time aggregation of real-activity macro-surprises. J. Monet. Econ. 2016, 82, 1–19.
- David, F.N.; Johnson, N.L. The truncated Poisson. Biometrics 1952, 8, 275–285.
- Kemp, C.D.; Kemp, A.W. Some properties of the ‘Hermite’ distribution. Biometrika 1965, 52, 381–394.
- Kumar, S.C.; Ramachandran, R. On some aspects of a zero-inflated overdispersed model and its applications. J. Appl. Stat. 2020, 47, 506–523.
- Moriña, D.; Higueras, M.; Puig, P.; Oliveira Pérez, M. Generalized Hermite Distribution Modelling with the R Package Hermite. 2015. Available online: https://journal.r-project.org/archive/2015/RJ-2015-035/index.html (accessed on 22 June 2023).
- Sellers, K.F. A distribution describing differences in count data containing common dispersion levels. Adv. Appl. Stat. Sci. 2012, 7, 35–46.
- Vernic, R. A multivariate generalization of the generalized Poisson distribution. ASTIN Bull. J. IAA 2000, 30, 57–67.

Table 1. SI values for the binomial distribution.

n | m | p | q | P(X = m) | SI |
---|---|---|---|---|---|
10 | 1 | 0.01 | 0.99 | 0.0914 | 9.04 |
10 | 3 | 0.01 | 0.99 | 0.0001 | 7387.44 |
10 | 5 | 0.01 | 0.99 | | 34,478,242.41 |
10 | 8 | 0.01 | 0.99 | | |
10 | 10 | 0.01 | 0.99 | | |
10 | 1 | 0.25 | 0.75 | 0.1877 | 1.09 |
10 | 3 | 0.25 | 0.75 | 0.2503 | 0.82 |
10 | 5 | 0.25 | 0.75 | 0.0584 | 3.52 |
10 | 8 | 0.25 | 0.75 | 0.0004 | 531.61 |
10 | 10 | 0.25 | 0.75 | 0.000001 | 215,301.13 |
10 | 1 | 0.8 | 0.2 | 0.000004 | 54,639.75 |
10 | 3 | 0.8 | 0.2 | 0.0008 | 284.58 |
10 | 5 | 0.8 | 0.2 | 0.0264 | 8.47 |
10 | 8 | 0.8 | 0.2 | 0.30199 | 0.74 |
10 | 10 | 0.8 | 0.2 | 0.1074 | 2.08 |

Table 2. SI values for the negative binomial distribution.

n | m | p | q | P(X = m) | SI |
---|---|---|---|---|---|
1 | 9 | 0.01 | 0.99 | 0.0091 | 0.55 |
3 | 7 | 0.01 | 0.99 | 0.00003 | 5,616,123,374.28 |
5 | 5 | 0.01 | 0.99 | 0.00000001 | |
8 | 2 | 0.01 | 0.99 | | |
10 | 0 | 0.01 | 0.99 | | |
1 | 9 | 0.25 | 0.75 | 0.0188 | 7.61 |
3 | 7 | 0.25 | 0.75 | 0.0751 | 185.21 |
5 | 5 | 0.25 | 0.75 | 0.0292 | 88,714.07 |
8 | 2 | 0.25 | 0.75 | 0.0003 | |
10 | 0 | 0.25 | 0.75 | 0.000001 | |
1 | 9 | 0.5 | 0.5 | 0.0010 | 341.33 |
3 | 7 | 0.5 | 0.5 | 0.0352 | 61.81 |
5 | 5 | 0.5 | 0.5 | 0.1230 | 203.05 |
8 | 2 | 0.5 | 0.5 | 0.0352 | 34,684.81 |
10 | 0 | 0.5 | 0.5 | 0.0010 | 17,668,300.52 |

Table 3. SI values for the Poisson distribution.

Poisson parameter | m | P(X = m) | SI |
---|---|---|---|
0.5 | 1 | 0.3033 | 1.54 |
0.5 | 3 | 0.0126 | 36.86 |
0.5 | 5 | 0.0002 | 2948.77 |
0.5 | 8 | 0.00000006 | 7,926,282.59 |
0.5 | 10 | | 2,853,461,732.66 |
1 | 1 | 0.3679 | 0.84 |
1 | 3 | 0.0613 | 5.03 |
1 | 5 | 0.0031 | 100.63 |
1 | 8 | 0.000009 | 33,812.86 |
1 | 10 | 0.0000001 | 3,043,157.28 |
2.5 | 1 | 0.2052 | 0.89 |
2.5 | 3 | 0.2138 | 0.86 |
2.5 | 5 | 0.0668 | 2.75 |
2.5 | 8 | 0.0031 | 59.08 |
2.5 | 10 | 0.0002 | 850.81 |

Table 4. SI values for the zero-truncated Poisson distribution.

Poisson parameter | m | P(X = m) | SI |
---|---|---|---|
0.5 | 1 | 0.7707 | 0.79 |
0.5 | 3 | 0.0321 | 19.05 |
0.5 | 5 | 0.0004 | 1523.94 |
0.5 | 8 | 0.0000001 | 4,096,347.51 |
0.5 | 10 | | 1,474,685,102.05 |
1 | 1 | 0.5820 | 0.69 |
1 | 3 | 0.0970 | 4.14 |
1 | 5 | 0.0048 | 82.89 |
1 | 8 | 0.00001 | 27,850.21 |
1 | 10 | 0.0000002 | 2,506,518.52 |
2.5 | 1 | 0.2236 | 0.89 |
2.5 | 3 | 0.2329 | 0.85 |
2.5 | 5 | 0.0728 | 2.73 |
2.5 | 8 | 0.0034 | 58.65 |
2.5 | 10 | 0.0002 | 844.61 |

Table 5. SI values for the geometric distribution.

m | p | q | P(X = m) | SI |
---|---|---|---|---|
1 | 0.01 | 0.99 | 0.01 | 0.51 |
5 | 0.01 | 0.99 | 0.0096 | 0.53 |
10 | 0.01 | 0.99 | 0.00914 | 0.56 |
20 | 0.01 | 0.99 | 0.0083 | 0.62 |
50 | 0.01 | 0.99 | 0.0061 | 0.84 |
1 | 0.25 | 0.75 | 0.25 | 1.02 |
5 | 0.25 | 0.75 | 0.0791 | 3.21 |
10 | 0.25 | 0.75 | 0.0188 | 13.53 |
20 | 0.25 | 0.75 | 0.0011 | 240.26 |
50 | 0.25 | 0.75 | 0.0000002 | 1,345,356.92 |
1 | 0.8 | 0.2 | 0.8 | 20.83 |
5 | 0.8 | 0.2 | 0.0013 | 13,020.83 |
10 | 0.8 | 0.2 | 0.0000004 | 40,690,104.16 |
20 | 0.8 | 0.2 | | |
50 | 0.8 | 0.2 | | |

Table 6. SI expressions for continuous distributions.

Distribution | Surprise Index |
---|---|
Uniform | |
Beta | |
Beta (type-II) | |
Pareto (type-II) | |
Gamma | |
Weibull | |
Log-normal | |
Exponentiated-exponential | |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ghosh, I.; Cooper, T.D.H. On Surprise Indices Related to Univariate Discrete and Continuous Distributions: A Survey. Mathematics 2023, 11, 3234. https://doi.org/10.3390/math11143234