2.1. Image Analysis Using PCA
Figure 3 displays the NIR spectra of the leaf (a) and root (b) powders of the three Echinacea species. The spectral patterns show no distinctive features that could be used to differentiate the species; hence the need for PCA, which provides visual plots such as score images and scatter plots in which differences are easier to observe.
Figure 3.
NIR imaging spectra of (a) Echinacea leaves and (b) roots showing similar spectral patterns of the three species.
The interactive score image and scatter plots were coloured according to the score values of the PCs. The score image is an amplitude plot in which similar colours represent similar score values and vice versa. Principal component analysis of the root and leaf image showed a clear separation of E. pallida from the other two species based on the colour amplitude, as observed in the PCA score image of the first PC (t1) after mean centering and standard normal variate (SNV) correction were applied to the data (Figure 4a). Replicate E. pallida root samples (EPaR) showed a consistently distinct high colour amplitude (yellow-red) compared to the other species, evidence that their chemical profile differed significantly from that of the other species. As observed for the root powders, E. pallida leaf (EPaL) also showed a higher colour amplitude (light blue) than the leaves of E. angustifolia (EAL) and E. purpurea (EPL), which had a low amplitude of -5 (dark blue) (Figure 4a).
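The preprocessing and PCA steps described above can be sketched as follows, assuming the hyperspectral image is available as a NumPy array of shape (rows, columns, wavelengths). The function names, the synthetic cube and the use of an SVD are illustrative assumptions; they are not the software or settings used in this study.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def pca_score_image(cube, n_components=3):
    """Return score images (rows, cols, n_components) and explained-variance fractions."""
    rows, cols, bands = cube.shape
    X = snv(cube.reshape(-1, bands).astype(float))  # unfold: one spectrum per pixel
    X -= X.mean(axis=0)                              # mean centring per wavelength
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # score vectors t1, t2, ...
    explained = S**2 / np.sum(S**2)                  # fraction of variance per PC
    return scores.reshape(rows, cols, n_components), explained[:n_components]

# Synthetic stand-in for a hypercube (hypothetical data, not the Echinacea images).
rng = np.random.default_rng(0)
cube = rng.random((20, 30, 50)) + 0.5

score_img, r2x = pca_score_image(cube)
t1_image = score_img[:, :, 0]   # amplitude image of the first PC
r2x_cum = np.cumsum(r2x)        # cumulative explained variance
```

Displaying `t1_image` with a colour map gives the kind of amplitude plot referred to as a score image, and plotting the first two score columns against each other gives the pixel scatter plot.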
Figure 4.
PCA score image (t1) of Echinacea root and leaf powders showing distinction of the three species based on colour amplitudes (a). The corresponding score plot (PC1 vs. PC2) shows distinct pixel clusters coloured according to score values that correspond to the images (b). (EAL—E. angustifolia leaf, EPL—E. purpurea leaf, EPaL—E. pallida leaf, EAR—E. angustifolia root, EPR—E. purpurea root, EPaR—E. pallida root).
The corresponding scatter plot shows pixels coloured according to score values, which correlate with the amplitude scale in the score image. Using PCA, chemical variation was observed within the HSI data, which revealed six pixel clusters, each representing a plant part of a different species (Figure 4b). The cumulative chemical variation modeled using three principal components (PCs) was 97.2% (R²X(cum) = 0.972). The variation of 90.5% along PC1 was mainly responsible for separating the root powders, where E. pallida (EPaR) was the most distinct, with its pixel cluster (highest score values, red-yellow) falling on the far positive side of PC1. The E. angustifolia root cluster (EAR) showed pixels with lower score values spanning the Y = 0 region, while E. purpurea root pixels (EPR) occupied negative PC1 (Figure 4b). This observation supports the results of the score image, where the maximum variance modeled along PC1 was shown to differentiate mainly the root samples. Distinction of the leaf samples was observed along PC2 in the scatter plot; however, the variation appears minimal (4.47%) since the presence of root samples influenced variation within the model.
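As a quick consistency check on these variance figures, the share left for PC3 can be obtained by subtraction. This assumes that the cumulative value is the plain sum of the per-component explained-variance fractions (as is standard for PCA) and that the 4.47% quoted for PC2 belongs to the same three-component model; the PC3 value is inferred here and is not reported in the text.

```latex
R^2X_{\mathrm{cum}} = \sum_{a=1}^{3} R^2X_a
\quad\Rightarrow\quad
R^2X_3 = 0.972 - 0.905 - 0.0447 = 0.0223 \approx 2.2\%
```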
To obtain a clearer picture of the chemical distinction of the root samples, it was necessary to model them separately from the leaves to eliminate the influence of the leaf chemistry on the model. Figure 5a is a score image of PC1 showing three distinct colour amplitudes, each representing a different species. As observed in Figure 4a, E. pallida displayed the highest colour amplitude (orange-yellow) while E. angustifolia (light blue) and E. purpurea (dark blue) showed lower amplitudes that differed slightly. The corresponding 2D density scatter plot demonstrated a clear separation of the three pixel clusters along PC1, with 94.5% of the chemical variation in the data cube attributed to the distinction of the three species (Figure 5b).
Figure 5.
PCA score image (t1) of Echinacea root powders showing distinction of the three species based on colour amplitudes (a). The corresponding score plot (PC1 vs. PC2) shows distinct pixel clusters coloured according to score values that correspond to the images (b). (EAR—E. angustifolia root, EPR—E. purpurea root, EPaR—E. pallida root).
The distribution of the pixel clusters along PC1 was consistent with Figure 4b, only clearer. The results for the root samples demonstrate that the three Echinacea species have distinct chemical profiles, which were easily identified using NIR chemical imaging.
To analyse the leaf data clearly, the leaf samples were modeled separately to exclude the influence of the roots. Figure 6a shows the score image, where E. pallida displayed colour distinction (light blue-yellow) while similarities between E. angustifolia and E. purpurea leaves (dark blue) were evident. The corresponding 2D scatter plot of pixels demonstrates separation of the E. pallida cluster from the other two species along PC1. The majority of the variance was captured in the first two PCs, with PC1 accounting for 82.6% of the variation in the data and PC2 for 3.94% (Figure 6b). It is clear from the plots that the leaf chemistry of the three species is less varied, as demonstrated by the variance parameters. However, E. pallida demonstrated chemical distinction compared to the other species, as identified by HSI and multivariate data analysis techniques.
To investigate the differences observed in the score images of the leaves and roots, loadings line plots of the first vector (P1) were constructed for the leaf model (Figure 7a) and the root model (Figure 7b). The loadings plots show the region between 1937 and 2400 nm as carrying discriminating information on the three species based on both the leaf and the root chemistry. In both cases, positive (peaks) and negative (troughs) loadings were recorded in this region, which could be further investigated for molecular signals that can be assigned to specific plant species.
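To illustrate how a loadings vector points to the discriminating wavelengths, the sketch below extracts the first loading vector from mean-centred spectra and lists the wavelengths with the largest absolute loadings. The wavelength axis (1000 to 2500 nm in 10 nm steps), the synthetic spectra and the function name are hypothetical and serve only to demonstrate the idea.

```python
import numpy as np

def top_loading_wavelengths(X, wavelengths, n_top=5):
    """Return the first loading vector p1 and the wavelengths with the largest |p1|."""
    Xc = X - X.mean(axis=0)                      # mean centring per wavelength
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    p1 = Vt[0]                                   # first loading vector (unit length)
    order = np.argsort(np.abs(p1))[::-1][:n_top]
    return p1, wavelengths[order]

# Synthetic example: two groups of spectra that differ only between 1937 and 2400 nm.
rng = np.random.default_rng(1)
wavelengths = np.linspace(1000, 2500, 151)       # assumed axis, 10 nm steps
X = rng.normal(0.0, 0.01, (200, wavelengths.size))
band = (wavelengths >= 1937) & (wavelengths <= 2400)
X[:100, band] += 1.0                             # group difference confined to the band

p1, top = top_loading_wavelengths(X, wavelengths)
```

In this toy case every wavelength in `top` falls inside the band where the two groups differ, which is the same reading applied to the peaks and troughs of a loadings line plot.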
Figure 6.
PCA score image (t1) of Echinacea leaf powders based on colour amplitudes (a). The corresponding score plot (PC1 vs. PC3) shows minimal separation of the pixel clusters (b). (EAL—E. angustifolia leaf, EPL—E. purpurea leaf, EPaL—E. pallida leaf).
Figure 7.
Loadings line plot of vector P1 for the leaf score image (a) and the root score image (b) showing variables responsible for separation of the three species.
Based on these observations, it is evident that NIR chemical imaging presents a promising visual technique for the qualitative differentiation of Echinacea species in the quality control of the raw material. The root chemistry of the species presents a better choice for analysis; however, the minor chemical differences between leaf samples can also be detected using the imaging tool. Previous studies on the chemical profiling of Echinacea species have reported marked differences between E. angustifolia and E. pallida, which were regarded as varieties of the same species until they were taxonomically revised in 1968 [12]. In this study, E. pallida demonstrated a distinct chemical profile that could be observed in both root and leaf samples.
Figure 8.
PLS-DA Y-image showing calibration samples used to build the model and test samples that were excluded for model validation (a). The predictions show test samples assigned to classes that correspond to the species imaged (b) (EAL—E. angustifolia leaf, EPL—E. purpurea leaf, EPaL—E. pallida leaf, EAR—E. angustifolia root, EPR—E. purpurea root, EPaR—E. pallida root).
Having established that HSI can distinguish between the three Echinacea species, the next objective was to develop a PLS-DA model to predict the identity of commercial Echinacea products introduced into the model as an external dataset. Since many Echinacea products are prepared using both leaf and root materials, a model that included both the leaves and the roots was constructed. The model predictions would therefore identify the species present but not specify whether the material used was the root, the leaf or both. Figure 8a is a PLS 1 Y-image of the PLS-DA model, in which each species (class) in the calibration set is assigned a different colour, while the background and some unknown areas were classified as “no class”, implying that the sample or region within the image could not be correlated to the modeled data. The test set samples (grey-scale), whose correct identities were known, were excluded from calibration for external validation of the model. Introducing the external test set into the PLS-DA calibration model provided the results shown in Figure 8b. Based on the colour coding, the predicted classes of the test samples agreed with the known identities of the samples. There were, however, misclassified regions within the samples, which can be attributed to chemical similarities between the species and hence overlap in the pixel data. The PLS-DA model nevertheless exhibited good model statistics, with an R²X(cum) of 0.980 and a cumulative predictable variation of Y (Q²Y(cum)) of 0.779 using three components, and was subsequently used for class prediction of external samples.
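The calibration and prediction workflow can be sketched with a small NumPy implementation of PLS regression on a one-hot class matrix, which is the usual way PLS-DA is set up. Everything specific here is an assumption made for illustration: the synthetic spectra, the three-component setting applied to toy data, and in particular the 0.5 threshold used to assign "no class", since the rule used by the imaging software is not stated in the text. Cross-validated Q²Y is not computed in this sketch.

```python
import numpy as np

def pls_fit(X, Y, n_components=3):
    """Fit PLS2 by successive deflation; return coefficients and the centring vectors."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        # The dominant left singular vector of E'F is the weight vector w.
        w = np.linalg.svd(E.T @ F, full_matrices=False)[0][:, 0]
        t = E @ w                                   # X scores
        p = E.T @ t / (t @ t)                       # X loadings
        q = F.T @ t / (t @ t)                       # Y loadings
        E, F = E - np.outer(t, p), F - np.outer(t, q)
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    B = W @ np.linalg.inv(P.T @ W) @ Q.T            # regression coefficients
    return B, x_mean, y_mean

def pls_predict(X, B, x_mean, y_mean):
    return (X - x_mean) @ B + y_mean

def classify(Y_pred, threshold=0.5):
    """Class = largest predicted dummy value; -1 ("no class") if below the threshold."""
    labels = Y_pred.argmax(axis=1)
    labels[Y_pred.max(axis=1) < threshold] = -1     # assumed rule, see lead-in
    return labels

# Synthetic pixel spectra for three classes (hypothetical data).
rng = np.random.default_rng(2)
class_means = rng.normal(0.0, 1.0, (3, 40))

def make_pixels(n_per_class):
    y = np.repeat(np.arange(3), n_per_class)
    return class_means[y] + rng.normal(0.0, 0.05, (y.size, 40)), y

X_cal, y_cal = make_pixels(60)      # calibration set
X_test, y_test = make_pixels(20)    # external test set with known identities

B, x_mean, y_mean = pls_fit(X_cal, np.eye(3)[y_cal], n_components=3)
test_labels = classify(pls_predict(X_test, B, x_mean, y_mean))
accuracy = np.mean(test_labels == y_test)

# A spectrum equal to the calibration mean predicts 1/3 for every class: "no class".
unknown_label = classify(pls_predict(x_mean[None, :], B, x_mean, y_mean))[0]
```

Applying `classify` to every pixel of an unfolded image and folding the labels back to image shape would give a class map comparable to a prediction image.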
2.2. Class Prediction of Commercial Echinacea Samples
Twenty commercial Echinacea samples and four authentic Echinacea raw material control samples, captured as a single image, were introduced into the PLS-DA model for class prediction. Figure 9 shows the PLS-DA prediction image after matching the chemical profiles of the products to the authentic Echinacea samples.
Figure 9.
PLS prediction images (t1) of Echinacea commercial products and authentic raw material showing variation in the chemical composition of Echinacea commercial products. The enlarged insert demonstrates the predicted levels of each species in product 16.
The predictions are represented by colour, with E. angustifolia shown in red, E. pallida in blue and E. purpurea in green. The results indicate that 12 out of 20 products were correctly classified (indicated with a in Table 1), with the HSI class prediction matching the product label. The remaining eight products were misclassified (indicated with b in Table 1), as the HSI prediction did not match the product label. Seven of the 20 products contained high levels of E. purpurea (1, 9, 12, 14, 17, 18 and 19) while five products contained high levels of E. angustifolia (3, 6, 10, 11 and 16). Five of the 20 products (2, 4, 7, 8 and 15) presented a completely different profile to the powdered authentic Echinacea species and were therefore identified as no class (yellow). Upon investigation, it was discovered that most of the samples predicted as ‘no class’ were labeled as extracts or concentrates of Echinacea and not as unprocessed raw material. As chemical processing alters the chemistry of the products, it introduces chemical variation between the modeled and predicted samples, and hence these products were not classified as Echinacea. In some images (5, 13 and 20) the ‘no class’ prediction was more prominent, while trace amounts of one or two Echinacea species were detected. This observation can be explained by the presence of excipients such as magnesium stearate in the formulations in greater proportions than the Echinacea raw material.
Multi-ingredient formulations containing other herbs such as garlic, parsley and goldenseal were also predicted as containing trace amounts of the Echinacea species while the majority of the constituents could not be classified (Product 20). The last sample row in the image containing authentic Echinacea raw material as a control demonstrates the accuracy of the model in predicting Echinacea raw material. The four samples (21, 22, 23 and 24) were correctly identified where the major species predicted to be dominant in the sample matched the species imaged (root and/or leaf samples). The potential of HSI in determining the presence or absence of Echinacea raw material in both commercial products and raw material samples has been demonstrated.
Table 1.
Results of the HSI classification analysis using the (PLS-DA) model in comparison to the product label.
Product | Product Label Claim | Predicted E. angustifolia (%) | Predicted E. purpurea (%) | Predicted E. pallida (%) | Predicted No Class (%) |
---|---|---|---|---|---|
1 | E. purpurea (herb, root) & E. angustifolia herb | 1.9 | 89.8 a | 0 | 8.3 |
2 | E. angustifolia root extract | 0 | 0 | 0.2 | 99.8 b |
3 | E. purpurea root & aerial parts | 89.3 a | 2.6 | 0.2 | 7.9 |
4 | E. purpurea Herba & Radix dry concentrate | 0 | 0 | 0 | 100 b |
5 | E. purpurea & E. angustifolia | 14.3 | 7.7 | 1.2 | 76.8 b |
6 | E. purpurea root & E. angustifolia root | 94.6 a | 5.4 | 0 | 0.1 |
7 | Echinacea purpurea extract | 0 | 0 | 0 | 100 b |
8 | E. purpurea Herba & E. purpurea Radix | 0 | 0 | 0 | 100 b |
9 | E. pallida | 0 | 72.2 a | 7.4 | 20.4 |
10 | E. purpurea root & E. angustifolia root | 86.7 a | 12.6 | 0 | 0.7 |
11 | E. angustifolia root | 99.5 a | 0 | 0.1 | 0.4 |
12 | E. purpurea herb | 0.2 | 73.2 a | 0.2 | 26.4 |
13 | E. pallida (outer label) & E. purpurea (inner label) | 1.3 | 22.9 | 2.4 | 73.3 b |
14 | Echinacea aerial powder & root powder | 9.2 | 90.1 a | 0 | 0.7 |
15 | E. angustifolia root & rhizome | 0 | 0 | 0 | 100 b |
16 | E. purpurea stem, leaf, flower | 61.8 a | 12.1 | 0 | 26.1 |
17 | E. purpurea root | 0.2 | 95.4 a | 0 | 4.4 |
18 | Echinacea blend (angustifolia, pallida, purpurea) | 0.1 | 92.4 a | 2 | 5.6 |
19 | Echinacea leaf powder & standardised extract | 4.9 | 88.5 a | 0 | 6.6 |
20 | Echinacea, Goldenseal, Elderberry, Garlic & Parsley | 16.3 | 8.1 | 9.4 | 66.3 b |
21 | E. purpurea leaf (authentic raw material) | 0.1 | 90.2 a | 6.5 | 3.3 |
22 | E. angustifolia root (authentic raw material) | 89.1 a | 3.5 | 0.4 | 7 |
23 | E. purpurea root (authentic raw material) | 0 | 97.5 a | 0 | 2.5 |
24 | E. angustifolia leaf (authentic raw material) | 64.2 a | 6.4 | 1.8 | 27.5 |
In addition to a qualitative assessment of the product composition, it was also possible to predict the percentage composition of the constituent species within a product by highlighting the species in the image and obtaining the corresponding prediction table, as demonstrated in Figure 9 (insert). The quantitative predictions are reported in Table 1 together with a comparison to the product label. Table 1 shows that most of the commercial products do contain Echinacea raw material at high levels and that most of the HSI results agree with the label claim. However, HSI analysis detected the presence of more than one species in many of the products. According to Table 1, E. purpurea and E. angustifolia seem to be the raw materials of choice in many products, while E. pallida is rarely used. Only two products were labeled as containing E. pallida; however, according to the predictions, E. purpurea occurred in higher proportions in both cases, while the presence of E. pallida was almost negligible (<10%).
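One plausible way to arrive at per-product percentages such as those in Table 1 is to count the pixels assigned to each class within the image region of a product. The sketch below assumes this pixel-fraction interpretation, which is not stated explicitly in the text; the class codes, the mask and the toy class map are hypothetical.

```python
import numpy as np

# Hypothetical integer codes for the predicted classes; -1 marks "no class".
CLASS_NAMES = {0: "E. angustifolia", 1: "E. purpurea", 2: "E. pallida", -1: "No class"}

def composition_percent(class_map, sample_mask):
    """Percentage of the masked sample pixels assigned to each class."""
    labels = class_map[sample_mask]
    return {name: 100.0 * float(np.mean(labels == code))
            for code, name in CLASS_NAMES.items()}

# Toy 10 x 10 product region: 60 pixels of class 0, 10 of class 1, 30 unclassified.
class_map = np.full((10, 10), -1)
class_map[:6, :] = 0
class_map[6:7, :] = 1
sample_mask = np.ones((10, 10), dtype=bool)   # every pixel belongs to the product

result = composition_percent(class_map, sample_mask)
```

For this toy region the function returns 60% E. angustifolia, 10% E. purpurea, 0% E. pallida and 30% no class, mirroring the row layout of Table 1.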