There was an error in the original article [1]. Section 8 of the paper uses a simple toy example to illustrate Principal Component Analysis (PCA) of compositional data, using the additive (alr) and centred (clr) logratio transformations. It explains that the clr transformation is preferred because it produces more easily interpretable biplots. However, it failed to mention that, even though the alr- and clr-based PCA configurations of the toy example look similar, they are not the same. This is because the alr transformation is not isometric. Distances in alr-space depend on the choice of denominator, which spells trouble for PCA. As explained in Section 7 of the paper, PCA is a special case of Multidimensional Scaling (MDS). MDS is based on dissimilarity matrices, so if distances are not well defined, then neither are the MDS configuration and, hence, the principal components. The clr transformation fixes this issue.
A correction has been made to Section 8 and Figure 5 to clarify this point. It replaces the text from “Consider the following” to “in this context.” with:
Consider the following trivariate (a, b and c) dataset of three (1, 2 and 3) compositions that are constrained to a constant sum (a_i + b_i + c_i = 100% for 1 ≤ i ≤ 3, Figure 5):
It would be wrong to apply conventional PCA to this dataset, because this would ignore the constant sum constraint. As was discussed in Section 6, PCA begins by ‘centering’ the data via the arithmetic mean. Section 3 showed that this yields incorrect results for compositional data. Although the additive logratio transformation (alr) of Equation (1) solves the closure problem, it is not suitable for PCA because it is not isometric. For example, the alr-distance between samples 2 and 3 is 1.74 if b is used as a common denominator, but 2.46 if c is used as a common denominator.
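The dependence of alr-distances on the denominator can be checked numerically. The following is a minimal sketch, not the paper's own code: the two compositions are hypothetical stand-ins (they are not the values of the toy dataset, so the distances differ from the 1.74 and 2.46 quoted above), and the helper names `alr` and `euclid` are ours.

```python
import math

def alr(comp, denom):
    """Additive logratio: log of every other part over the part at index `denom`."""
    return [math.log(x / comp[denom]) for k, x in enumerate(comp) if k != denom]

def euclid(u, v):
    """Ordinary Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

# Hypothetical (a, b, c) compositions in percent, each summing to 100.
# These are illustrative values, NOT the data of the paper's toy example.
s2 = (70.0, 25.0, 5.0)
s3 = (72.0, 26.5, 1.5)

d_b = euclid(alr(s2, 1), alr(s3, 1))  # b as common denominator
d_c = euclid(alr(s2, 2), alr(s3, 2))  # c as common denominator

print(f"alr distance with b as denominator: {d_b:.3f}")
print(f"alr distance with c as denominator: {d_c:.3f}")
```

The same pair of samples ends up at two different distances, which is what "not isometric" means in practice.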
The fact that distances are not unequivocally defined in alr-space spells trouble for PCA. Recall the equivalence of PCA and classical MDS, which was discussed in Section 7. MDS is based on dissimilarity matrices, so if distances are not well defined then neither are the MDS configuration and, hence, the principal components. This issue can be solved by the centred logratio transformation (clr):
where g_i is the geometric mean of the ith sample:
Applying the clr-transformation to the data of Equation (12) yields a new trivariate dataset:
where g stands for the geometric mean of each row. Note that each of the rows of X_c adds up to zero. Thanks to the symmetry of the clr-coordinates, the distances between the rows (which are also known as Aitchison distances) are well defined. Subjecting Equation (14) to the same matrix decomposition as Equation (8) yields:
so that
Note that, even though this yields three principal components instead of two, the variance of the third component in matrix V is zero. Therefore, all the information is contained in the first two components. Furthermore, note that the first two principal components of the compositional dataset are identical to those of the PCA example shown in Section 6 (Equation (9)). This is, of course, intentional.
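The properties of the clr transformation described above can be illustrated with a short sketch. It assumes the standard definition of clr (the logarithm of each part divided by the geometric mean of its row) and uses hypothetical compositions rather than the toy dataset of the paper; the helper names are ours. It checks that each clr row sums to zero, that the distance between two samples does not depend on how the parts are ordered, and that the centred clr data have no spread along the direction (1, 1, 1), which is why one principal component has zero variance.

```python
import math

def clr(comp):
    """Centred logratio: log of each part over the geometric mean of the row."""
    logs = [math.log(x) for x in comp]
    log_g = sum(logs) / len(logs)  # logarithm of the geometric mean
    return [value - log_g for value in logs]

def euclid(u, v):
    """Ordinary Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

# Hypothetical (a, b, c) compositions in percent, NOT the paper's toy data.
data = [(0.5, 99.0, 0.5), (70.0, 25.0, 5.0), (72.0, 26.5, 1.5)]
Xc = [clr(row) for row in data]

# 1. Every row of the clr-transformed data adds up to zero.
row_sums = [sum(row) for row in Xc]

# 2. The Aitchison distance between samples 2 and 3 is unchanged when the
#    parts are reordered from (a, b, c) to (c, a, b).
d23 = euclid(Xc[1], Xc[2])
d23_perm = euclid(clr((5.0, 70.0, 25.0)), clr((1.5, 72.0, 26.5)))

# 3. After column-centring, projecting each row onto the unit vector
#    (1, 1, 1) / sqrt(3) gives zero, so that direction carries no variance.
col_means = [sum(col) / len(Xc) for col in zip(*Xc)]
proj = [sum(x - m for x, m in zip(row, col_means)) / math.sqrt(3) for row in Xc]

print("row sums:", row_sums)
print("distance 2-3:", d23, "after reordering parts:", d23_perm)
print("projections onto (1, 1, 1):", proj)
```

Because no part is singled out as a denominator, there is no arbitrary choice left that could change the distances, in contrast to the alr case.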
A second correction was made to page 6, where “geometric mean” should be replaced with “logratio mean”. The corrected paragraph appears below:
“5. Compute the logratio mean composition and add it to the existing ternary diagram as a red square:”
“Figure 1. Graphical output of Section 3. Black circles mark 20 synthetic Al2O3, (CaO + Na2O) and K2O compositions, drawn from a logistic normal distribution. The blue square marks the arithmetic mean, which falls outside the data cloud. The blue polygon marks a 2-σ confidence polygon, which plots outside the ternary diagram, in physically impossible negative space. The red square represents the logratio mean, which firmly plots inside the data cloud. The red confidence envelope marks a 95% confidence region calculated using Aitchison’s logratio approach. This confidence envelope neatly fits inside the ternary diagram and tightly hugs the data.”
The author states that the scientific conclusions of the remainder of the paper are unaffected. This correction was approved by the Academic Editor. The original publication has also been updated.