Correction

Correction: Vermeesch, P. Exploratory Analysis of Provenance Data Using R and the Provenance Package. Minerals 2019, 9, 193

University College London, Gower Street, London WC1E 6BT, UK
Minerals 2023, 13(3), 375; https://doi.org/10.3390/min13030375
Submission received: 14 February 2023 / Accepted: 21 February 2023 / Published: 8 March 2023
There was an error in the original article [1]. Section 8 of the paper uses a simple toy example to illustrate Principal Component Analysis (PCA) of compositional data, using the additive (alr) and centred (clr) logratio transformations. It explains that the clr transformation is preferred because it produces more easily interpretable biplots. However, it failed to mention that, even though the alr- and clr-based PCA configurations of the toy example look similar, they are not the same. This is because the alr transformation is not isometric. Distances in alr-space depend on the choice of denominator, which spells trouble for PCA. As explained in Section 7 of the paper, PCA is a special case of Multidimensional Scaling (MDS). MDS is based on dissimilarity matrices, so if distances are not well defined, then neither are the MDS configuration and, hence, the principal components. The clr transformation fixes this issue.
A correction has been made to Section 8 and Figure 5 to clarify this point. It replaces the text from “Consider the following” to “in this context.” with:
Consider the following trivariate ($a$, $b$ and $c$) dataset of three (1, 2 and 3) compositions that are constrained to a constant sum ($a_i + b_i + c_i = 100\%$ for $1 \leq i \leq 3$, Figure 5):
$$
X = \begin{array}{c|ccc}
  & a & b & c \\ \hline
1 & 0.034 & 99.88 & 0.091 \\
2 & 69.45 & 25.55 & 5.01 \\
3 & 72.44 & 26.65 & 0.92
\end{array} \tag{12}
$$
It would be wrong to apply conventional PCA to this dataset, because this would ignore the constant sum constraint. As was discussed in Section 6, PCA begins by ‘centering’ the data via the arithmetic mean. Section 3 showed that this yields incorrect results for compositional data. Although the additive logratio transformation (alr) of Equation (1) solves the closure problem, it is not suitable for PCA because it is not isometric. For example, the alr-distance between samples 2 and 3 is 1.74 if b is used as a common denominator, but 2.46 if c is used as a common denominator.
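The denominator dependence quoted above can be checked numerically. The following is a minimal Python sketch using only the standard library; the helper names `alr` and `euclidean` are illustrative and are not functions of the provenance package. It computes the alr-distance between samples 2 and 3 of the toy dataset for both choices of common denominator.

```python
import math

# Samples 2 and 3 of the toy dataset, as (a, b, c) percentages.
sample2 = (69.45, 25.55, 5.01)
sample3 = (72.44, 26.65, 0.92)

def alr(composition, denominator):
    """Additive logratio coordinates, dividing by the component at index `denominator`."""
    d = composition[denominator]
    return [math.log(v / d) for i, v in enumerate(composition) if i != denominator]

def euclidean(u, v):
    """Ordinary Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

d_b = euclidean(alr(sample2, 1), alr(sample3, 1))  # b as common denominator
d_c = euclidean(alr(sample2, 2), alr(sample3, 2))  # c as common denominator
print(f"{d_b:.2f} {d_c:.2f}")  # prints "1.74 2.46"
```

The two distances differ by a factor of about $\sqrt{2}$ here, because samples 2 and 3 have nearly the same a/b ratio and differ almost entirely in c.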
The fact that distances are not unequivocally defined in alr-space spells trouble for PCA. Recall the equivalence of PCA and classical MDS, which was discussed in Section 7. MDS is based on dissimilarity matrices, so if distances are not well defined then neither are the MDS configuration and, hence, the principal components. This issue can be solved by the centred logratio transformation (clr):
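$$
u_i = \ln\left[\frac{x_i}{g_i}\right], \quad v_i = \ln\left[\frac{y_i}{g_i}\right], \quad \text{and} \quad w_i = \ln\left[\frac{z_i}{g_i}\right]
$$
where $g_i$ is the geometric mean of the $i$th sample:
$$
g_i = \exp\left[\frac{\ln[x_i] + \ln[y_i] + \ln[z_i]}{3}\right]
$$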
Applying the clr-transformation to the data of Equation (12) yields a new trivariate dataset:
$$
X_c = \begin{array}{c|ccc}
  & \ln(a/g) & \ln(b/g) & \ln(c/g) \\ \hline
1 & -3 & 5 & -2 \\
2 & 1.21 & 0.21 & -1.42 \\
3 & 1.79 & 0.79 & -2.58
\end{array} \tag{14}
$$
where $g$ stands for the geometric mean of each row. Note that each of the rows of $X_c$ adds up to zero. Thanks to the symmetry of the clr-coordinates, the distances between the rows (which are also known as Aitchison distances) are well defined. Subjecting Equation (14) to the same matrix decomposition as Equation (8) yields:
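The clr transformation and the zero row sums can be reproduced with a few lines of code. This is a minimal Python sketch using only the standard library; the function names `clr` and `aitchison_distance` are illustrative, not part of the provenance package. Because the published percentages are rounded, the computed clr values should agree with the tabulated ones only to within about 0.01.

```python
import math

# Toy dataset: one row per sample, columns are (a, b, c) in percent.
X = [
    (0.034, 99.88, 0.091),
    (69.45, 25.55, 5.01),
    (72.44, 26.65, 0.92),
]

def clr(composition):
    """Centred logratio: log of each component minus the log of the geometric mean."""
    logs = [math.log(v) for v in composition]
    log_g = sum(logs) / len(logs)  # log of the sample's geometric mean
    return [x - log_g for x in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr coordinates of two compositions."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(clr(x), clr(y))))

Xc = [clr(row) for row in X]
for row in Xc:
    # Each row of clr coordinates sums to zero (up to floating-point error).
    print([round(v, 2) for v in row], abs(sum(row)) < 1e-12)

d23 = aitchison_distance(X[1], X[2])
print(round(d23, 2))
```

Unlike the alr-distance, `aitchison_distance` involves no choice of denominator, so it returns a single value for each pair of samples.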
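$$
X_c =
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
\begin{bmatrix} 0 & 2 & -2 \end{bmatrix}
+
\begin{bmatrix} -1.15 & 0 & 0 \\ 0.58 & -1 & 0 \\ 0.58 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 3.67 & 0 & 0 \\ 0 & 0.71 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 0.71 & -0.71 & 0 \\ 0.41 & 0.41 & -0.82 \\ 0.58 & 0.58 & 0.58 \end{bmatrix}
$$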
so that
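$$
P = \begin{bmatrix} -4.24 & 0 & 0 \\ 2.12 & -0.71 & 0 \\ 2.12 & 0.71 & 0 \end{bmatrix}
\quad \text{and} \quad
L = \begin{bmatrix} 2.59 & -2.59 & 0 \\ 0.29 & 0.29 & -0.58 \\ 0 & 0 & 0 \end{bmatrix}
$$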
Note that, even though this yields three principal components instead of two, the variance of the third component in matrix V is zero. Therefore, all the information is contained in the first two components. Furthermore, note that the first two principal components of the compositional dataset are identical to those of the PCA example shown in Section 6 (Equation (9)). This is, of course, intentional.
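The claim that the third component carries no variance can be checked independently. The sketch below is an assumption-laden illustration, not the paper's method: it replaces the matrix decomposition of Equation (8) with power iteration and deflation on the covariance matrix of the clr-transformed data, so that it runs with the Python standard library alone. All function names are hypothetical.

```python
import math

# Toy dataset: one row per sample, columns are (a, b, c) in percent.
X = [
    (0.034, 99.88, 0.091),
    (69.45, 25.55, 5.01),
    (72.44, 26.65, 0.92),
]

def clr(composition):
    """Centred logratio transformation of one composition."""
    logs = [math.log(v) for v in composition]
    log_g = sum(logs) / len(logs)
    return [x - log_g for x in logs]

Xc = [clr(row) for row in X]
n, p = len(Xc), len(Xc[0])

# Centre each column on its arithmetic mean, as conventional PCA does.
means = [sum(row[j] for row in Xc) / n for j in range(p)]
C = [[row[j] - means[j] for j in range(p)] for row in Xc]

# Sample covariance matrix (p x p), with n - 1 in the denominator.
cov = [[sum(C[i][j] * C[i][k] for i in range(n)) / (n - 1) for k in range(p)]
       for j in range(p)]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def leading_eigenpair(A, iterations=500):
    """Power iteration: largest eigenvalue and eigenvector of a symmetric matrix."""
    v = [1.0, 0.5, 0.25]  # arbitrary start, not orthogonal to the sought vectors
    for _ in range(iterations):
        w = matvec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, matvec(A, v)))
    return eigenvalue, v

var1, v1 = leading_eigenpair(cov)
# Deflation: subtract the first component before extracting the second.
deflated = [[cov[j][k] - var1 * v1[j] * v1[k] for k in range(p)] for j in range(p)]
var2, v2 = leading_eigenpair(deflated)
# The component variances sum to the trace, so the third follows by subtraction.
var3 = sum(cov[j][j] for j in range(p)) - var1 - var2

# Standard deviations of components 1 and 2, and the variance of component 3.
print(f"{math.sqrt(var1):.2f} {math.sqrt(var2):.2f} {abs(var3):.2f}")  # prints "3.67 0.71 0.00"
```

The third variance vanishes because every row of clr coordinates sums to zero, which confines the centred data to a two-dimensional plane.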
A second correction was made to page 6, where “geometric mean” should be replaced with “logratio mean”. The corrected paragraph appears below:
“5. Compute the logratio mean composition and add it to the existing ternary diagram as a red square:”
“Figure 1. Graphical output of Section 3. Black circles mark 20 synthetic Al2O3, (CaO + Na2O) and K2O compositions, drawn from a logistic normal distribution. The blue square marks the arithmetic mean, which falls outside the data cloud. The blue polygon marks a 2σ confidence polygon, which plots outside the ternary diagram, in physically impossible negative space. The red square represents the logratio mean, which firmly plots inside the data cloud. The red confidence envelope marks a 95% confidence region calculated using Aitchison’s logratio approach. This confidence envelope neatly fits inside the ternary diagram and tightly hugs the data.”
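To illustrate the difference between the arithmetic mean and the logratio mean, the following minimal Python sketch uses the three-sample toy dataset of the corrected Section 8, because the 20 synthetic compositions of Figure 1 are not reproduced in this correction. It assumes that the logratio mean is obtained by averaging the alr coordinates and mapping the result back to a composition that sums to 100%. The function names are illustrative and are not taken from the provenance package.

```python
import math

# Toy dataset: one row per sample, columns are (a, b, c) in percent.
X = [
    (0.034, 99.88, 0.091),
    (69.45, 25.55, 5.01),
    (72.44, 26.65, 0.92),
]

def closure(values, total=100.0):
    """Rescale positive values so that they sum to `total`."""
    s = sum(values)
    return [total * v / s for v in values]

def arithmetic_mean(data):
    """Component-wise arithmetic mean of the raw compositions."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]

def logratio_mean(data, denominator):
    """Average the alr coordinates, then transform back to a composition."""
    n, p = len(data), len(data[0])
    # The denominator's own logratio is log(1) = 0 for every sample.
    mean_alr = [sum(math.log(row[j] / row[denominator]) for row in data) / n
                for j in range(p)]
    return closure([math.exp(m) for m in mean_alr])

m_b = logratio_mean(X, 1)  # b as denominator
m_c = logratio_mean(X, 2)  # c as denominator
print([round(v, 2) for v in arithmetic_mean(X)])
print([round(v, 2) for v in m_c])
```

In this sketch the logratio mean is the same for either denominator, in contrast with alr-distances, which depend on that choice.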
The author states that the scientific conclusions of the remainder of the paper are unaffected. This correction was approved by the Academic Editor. The original publication has also been updated.

Reference

  1. Vermeesch, P. Exploratory Analysis of Provenance Data Using R and the Provenance Package. Minerals 2019, 9, 193.
Figure 5. (i)—the compositional dataset of Equation (14) shown on a ternary diagram; (ii)—principal component biplot of the same data after centred logratio (clr) transformation.
