Ongoing Multivariate Chemometric Approaches in Bioactive Compounds and Functional Properties of Foods—A Systematic Review

Karadžić Banjac, Milica; Kovačević, Strahinja; Podunavac-Kuzmanović, Sanja

doi:10.3390/pr12030583

by Milica Karadžić Banjac, Strahinja Kovačević * and Sanja Podunavac-Kuzmanović

Department of Applied and Engineering Chemistry, Faculty of Technology Novi Sad, University of Novi Sad, Bulevar cara Lazara 1, 21000 Novi Sad, Serbia

* Author to whom correspondence should be addressed.

Processes 2024, 12(3), 583; https://doi.org/10.3390/pr12030583

Submission received: 3 January 2024 / Revised: 8 March 2024 / Accepted: 13 March 2024 / Published: 14 March 2024

(This article belongs to the Special Issue Chemometrics in Food Quality Control: New Challenges and Cutting-Edge Approaches)


Abstract: In this review, papers from the chemometrics field were selected for a systematic review of their use in food science and technology, specifically in the domain of bioactive compounds and the functional properties of foods. More than 50 papers covering different food samples, experimental techniques and chemometric techniques were selected and presented, with a focus on the chemometric methods used and their outcomes. This study offers one overview of the current publications on this subject. The application of the multivariate chemometrics approach to the study of bioactive compounds and the functional properties of foods is expected to expand in the coming years, since this is a fast-growing and highly competitive research area.

Keywords: food; chemometrics; regression; classification; non-parametric methods

1. Introduction

The expression “chemometrics” first appeared in the 1970s, when it was proposed by the Swedish professor Svante Wold [1]. He defined chemometrics as “the art of extracting chemically relevant information from data produced in chemical experiments” [2]. Chemometrics is a chemical discipline that uses statistical and mathematical methods to perform objective data evaluation and extract meaningful information. The chemical data sets from which this information is extracted contain both related and unrelated data.

Since food science and technology is strongly linked with chemistry, and analytical chemistry and analytical technologies naturally lead to the application of chemometrics, this study is relevant and of interest to researchers. Nowadays, scientists all over the world deal with massive amounts of experimental data from the different measurements and devices they employ, and in many scientific branches they have turned to mathematical and statistical procedures. The chemometric approach is an ongoing trend in various research fields such as medicine, pharmacy, agronomy, agriculture, biotechnology and biology, since it can contribute to a better understanding of the data through their interpretation, presentation and visualization. The high attractiveness of chemometrics is reflected in research studies that summarize the existing collaboration patterns in the field of chemometrics [3] and in bibliometric studies covering the use of chemometrics in food science and technology [4]. Authors have reported that, in food science and technology, the chemometric tools of choice are principal component analysis, partial least squares and discriminant analysis [4].

In this review, the Web of Science and ScienceDirect databases were used to collect bibliometric data. The main keywords used in the search strategy were “chemometrics + food”. Food science and technology was selected as the subject category, and the search covered research articles and review articles published from 2015 to 2024, together with some earlier studies covering chemometric tools that are used less often. Additionally, a rising number of articles is being observed from authors who are not pure chemometricians but who use chemometric tools. Since the study of the bioactive compounds and functional properties of foods is a fast-growing and highly competitive research area, we expect that the application of the multivariate chemometrics approach will draw more attention in the coming years.

2. The Advantages of Multivariate Chemometric Approaches

The benefits of a multivariate chemometrics approach to the study of the bioactive compounds and functional properties of foods are numerous. Regression, classification and non-parametric methods have all found their use in the domain of food science and technology. Through the use of multivariate chemometric approaches, performance attributes such as accuracy, precision, robustness and reproducibility can be improved. The bioactive compounds and functional properties of foods can be estimated using data obtained from cheap and fast analytical methods (such as spectroscopy) rather than relying on expensive and time-consuming analytical methods (such as high-performance liquid chromatography, HPLC). Using chemometrics, the correlations among the variables of interest can be exploited. Different chemometric approaches provide a wide angle for the observation and interpretation of experimentally observed data. Different chemometric methods can also be used to find correlations between, and to predict, the bioactive compound contents of different foods and the experimental data on various functional food properties. Along with tables and equations, different graphical representations of experimentally observed data can be used to better convey the most important aspects of the research. If the current trend in research and publishing continues, there is plenty of room for chemometrics in the interpretation and presentation of the results from studies on food science and technology. In this paper, only a small part of the various chemometrics approaches is presented, focusing on:

Regression methods: (1) Linear modeling (multiple linear regression, MLR; partial least squares regression, PLSR; linear discriminant analysis, LDA) and (2) Non-linear modeling (artificial neural networks, ANN);
Classification methods (principal component analysis, PCA; hierarchical cluster analysis, HCA);
And non-parametric methods (sum of ranking differences, SRD; generalized pair correlation method, GPCM).

3. Regression: Linear Modeling in Foods

In this section, the use of MLR, PLSR and LDA in the application of chemometrics to food science and technology will be reviewed. In the domain of linear modeling, MLR stands out as one of the most widely used techniques. Researchers use MLR to correlate and predict the bioactive compound contents of different foods and data on various functional food properties. MLR quantifies the relationship between more than one independent variable and one dependent variable. The most important criterion that has to be respected is the absence of multicollinearity, which is checked via the variance inflation factor (VIF) [5]. Together with MLR, PLSR and LDA are extensively used in research papers for the analysis and presentation of experimental data. PLSR is used for the construction of predictive models when many highly correlated variables are present. This type of regression is related to regression techniques such as MLR and PCR (principal component regression). LDA aims to find the linear discriminant function (LDF) that takes into account the original variables. The LDF takes into consideration the original measurements for each object, reducing the data to one dimension, which reveals the differences between the groups [6]. An overview of the different types of foods and the linear modeling methods applied to them is given in Table 1.
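As a minimal sketch of the MLR workflow described above, the following Python example fits a model by least squares and computes the VIF of each predictor as 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The data set and all function names (solve, fit_mlr, predict, r_squared, vif) are hypothetical and chosen only for illustration; they are not taken from any of the reviewed papers. Only the standard library is used, so the linear system is solved with a small Gaussian elimination routine.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations."""
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    k = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(ZtZ, Zty)


def predict(coef, X):
    """Evaluate the fitted model; coef[0] is the intercept."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def vif(X):
    """VIF of each predictor from regressing it on all other predictors."""
    out = []
    for j in range(len(X[0])):
        target = [row[j] for row in X]
        others = [[v for i, v in enumerate(row) if i != j] for row in X]
        r2 = r_squared(target, predict(fit_mlr(others, target), others))
        out.append(1.0 / (1.0 - r2))  # fails if predictors are exactly collinear
    return out


# Hypothetical data: two correlated predictors and an exactly linear response.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [1 + 2 * a + 0.5 * b for a, b in X]
coef = fit_mlr(X, y)
vifs = vif(X)
print("coefficients:", coef)
print("VIF per predictor:", vifs)
```

A VIF close to 1 indicates that a predictor is nearly uncorrelated with the others. Commonly cited rules of thumb treat values above roughly 5 or 10 as a sign of problematic multicollinearity, although the threshold is a matter of convention.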

Tea quality was estimated based on different experimentally gathered information from leaves and soil [7]. MLR was applied to predict the main components of the tea’s quality. Five optimal elements from tender and mature leaves were taken into account in order to construct the quality parameter estimation models. For the prediction of the soluble sugars, MLR performed well giving R² values from 0.5400 to 0.8400 [7].

MLR was used in studies that dealt with soft-bodied raw ewe’s milk cheese samples from six different dairy industries in two different seasons [8]. A low-frequency ultrasound was used for the quality control of analyzed samples. Stepwise regression modeling was used to explore the capacity of the ultrasonic parameters to predict the studied variables. The authors used MLR to predict the microbial, physicochemical, textural and sensory parameters of the observed cheeses [8].

Near-infrared spectroscopy (NIR) was coupled with chemometrics to assess the authentication and traceability of intact lemon fruits [9]. A total of 119 lemon samples from two production years were collected from Italy and analyzed. MLR was used to quantify the relationship between the NIR spectra and lemon quality properties such as peel chromaticity, thickness of pericarp, peel water, juice yield, soluble solid content and titratable acidity. The observed R² values ranged from 0.159 to 0.985, both for the individual years and for the merged data. The authors outlined that many significant correlations with lemon properties were observed, which suggests the applicability of NIR to predicting the quality of lemons and lemon juices [9].

Gray relation analysis (GRA), correlation analysis (CA) and MLR were used to evaluate the relationships between the component contents and the mechanical properties of maize kernels [10]. The interpretation of mechanical properties was carried out using a scanning electron microscope (SEM). Ten maize varieties were collected and underwent mechanical and component content testing. MLR was used to combine the effects of moisture, protein and starch contents on the obtained mechanical properties. All of the R² values were above 0.7 and the standard errors were less than the standard deviations. The results were validated, and they indicated that the moisture and starch contents had negative correlations with most of the mechanical properties, while the protein content had positive correlations with most of the mechanical properties. The authors proposed optimal MLR models for the prediction of hardness, rupture force, rupture energy, apparent elastic modulus and viscoelastic parameters [10].

PLSR and MLR were used to predict the age of 53 samples of commercial bottled dry red wine according to color parameters and pigments that were experimentally observed [11]. MLR models were constructed based on calibration sets comprising 35 wine samples and no more than four variables were included in each model. Both the MLR (with three variables) and the PLSR (with six variables) models presented were suitable for the accurate prediction of the age of the wine samples. The constructed models possessed high coefficients of determination and low enough root mean square errors of cross-validation. The prediction results from the presented MLR model were as accurate as those obtained using the PLSR model [11].

Since the adulteration of black pepper and cumin samples is common, there is a need for the development of a method to detect adulteration. MLR and PLSR modeling were used together with near-infrared spectroscopy for the rapid quantitative detection of black pepper and cumin adulterations [12]. The presented MLR and PLSR models possessed a high predictive capacity for the different types of single or complex starch adulterants, with good results for the statistical parameters: correlation coefficients higher than 0.9000 and root mean square errors that ranged from 2.2 to 7.0. The formed models were practically tested using samples of commercially available powdered spices [12].

Twenty-one adulterated sesame oil samples and four pure oil samples (sesame, canola, corn and sunflower) were used in a study that used Fourier transform infrared spectroscopy (FTIR) data to develop a method for the detection of adulteration [13]. All of the processing data collected were compared using PLSR. The authors employed the nonlinear iterative partial least squares algorithm (NIPALS) and kernel PLS, which gave the same results up to four significant digits. Different preprocessing techniques were used: orthogonal signal correction (OSC), standard normal variate transformation (SNV) and extended multiplicative scatter correction (EMSC). The results covering sesame oil adulterated with canola, corn and sunflower oil indicated that all of the R² values for the calibration set were above 0.983 (except when using sunflower oil with three different types of preprocessing). The authors reported that about half of the preprocessing methods did not improve the RMSE of the PLSR model for the prediction of the level of corn oil adulteration [13].
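To illustrate how a NIPALS-type PLS regression proceeds, the following Python sketch implements the single-response variant (often called PLS1) on mean-centered data: each latent variable is obtained from a weight vector proportional to Xᵀy, after which both X and y are deflated. This is a simplified textbook formulation under stated assumptions, not the implementation used in ref. [13]; the function names pls1_fit and pls1_predict and the small data set are hypothetical.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model by NIPALS-style deflation."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residuals
    f = [yi - y_mean for yi in y]                              # y residuals
    weights, loadings, q_coefs = [], [], []
    for _ in range(n_components):
        # Weight vector: direction in X space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain, stop early
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this latent variable.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        loadings.append(p_load)
        q_coefs.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean,
            "weights": weights, "loadings": loadings, "q": q_coefs}


def pls1_predict(model, x):
    """Predict one sample by repeating the deflation steps on its residual."""
    r = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p_load, q in zip(model["weights"], model["loadings"], model["q"]):
        t = sum(a * b for a, b in zip(r, w))
        y_hat += q * t
        r = [a - t * b for a, b in zip(r, p_load)]
    return y_hat


# Hypothetical data: two predictors and an exactly linear response.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [2 * a - b + 3 for a, b in X]
model = pls1_fit(X, y, n_components=2)
prediction = pls1_predict(model, [6, 4])
print("predicted response for [6, 4]:", prediction)
```

When the number of latent variables equals the rank of the centered X matrix, as in this toy example, the PLS fit coincides with ordinary least squares. In practice, far fewer latent variables than original variables are retained, and their number is usually chosen by cross-validation.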

One study reported the performance of 11 strains of non-Saccharomyces yeasts that produced polyphenol-enriched and fragrant kiwi wines [14]. In the 14 kiwi wines, a total of 130 volatiles were detected, and it was concluded that some yeasts produce a higher concentration of volatile compounds than others. PLSR was applied to 15 aroma notes in order to expose the complex relationship between the aroma characteristics and the overall acceptability of the examined kiwi wines. The accepted PLSR models had calibration R² values higher than 0.9500 and validation R² values higher than 0.8000. The results indicated that all the aroma descriptors used were closely related to the scores [14].

PLSR modeling was successfully applied for the prediction of intramuscular fat in lamb M. longissimus lumborum [15]. The intramuscular fat content of lamb meat is the most important factor in consumer acceptability. Hyperspectral imaging was used for in-line measurements of intramuscular fat in fresh meat and those data were used for PLSR modeling. Since fifteen trials consisting of eight independent flocks across 5 years were investigated, two models were developed: one comprising data from the first year of the trials and the progressive model (comprising data in chronological order). When the experimental conditions were consistent, the models performed similarly regarding statistical parameters, but under imaging conditions that were diversified, the progressive model was able to account for this variability, resulting in better parameters of statistical performance [15].

The chemometric approach and Raman spectroscopy were used for the classification of the vegetable oil samples [16]. The Raman spectra of 108 vegetable oil samples were recorded and used for PLSR modeling. The Raman spectra of 72 calibration samples modeled with reference values obtained from high-performance liquid chromatography were used and a PLSR model was established for the determination of the alpha-tocopherol content. The data obtained using both methods were highly correlated (R² > 0.9500). The proposed method could be used to distinguish between pure vegetable oil samples and adulterated ones [16].

PLSR was the chemometric method of choice in a research paper that dealt with monitoring the oxidation process of nut oil through Raman technology combined with PLSR and random forest PLSR (RF-PLSR) modeling [17]. Samples were from hazelnuts, cashew nuts, almonds, Hawaiian fruits, sunflower seeds, watermelon seeds, red pine seeds and peanuts. The peroxide index represents one of the most important characteristics of nut oils, since they easily oxidize during storage. This study proposed a novel method for the determination of the peroxide index of nut oils based on PLSR and RF-PLSR. A total of 36 wavenumbers were selected and used for the PLSR modeling. The R² values for the calibration and prediction sets for the PLSR and RF-PLSR models were 0.9552, 0.8672, 0.8048 and 0.7927. The root-mean-square errors of calibration and prediction were 0.067, 0.1100, 0.1514 and 0.1547, respectively [17].
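The two figures of merit quoted throughout this section, the coefficient of determination (R²) and the root-mean-square error (RMSE), can be computed from reference and predicted values as in the following Python sketch. The definitions used here are the usual ones (R² = 1 - SS_res/SS_tot and RMSE as the square root of the mean squared residual); individual papers may use slightly different conventions, for example a squared correlation coefficient. The numbers are hypothetical.

```python
import math


def r_squared(y_ref, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(y_ref) / len(y_ref)
    ss_res = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))
    ss_tot = sum((r - mean_ref) ** 2 for r in y_ref)
    return 1.0 - ss_res / ss_tot


def rmse(y_ref, y_pred):
    """Root-mean-square error between reference and predicted values."""
    n = len(y_ref)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(y_ref, y_pred)) / n)


# Hypothetical reference values (e.g., from a laboratory method) and
# hypothetical model predictions for the same samples.
y_ref = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
r2_value = r_squared(y_ref, y_pred)
rmse_value = rmse(y_ref, y_pred)
print("R2:", r2_value, "RMSE:", rmse_value)
```

Applying the same two functions to the calibration samples, to the left-out samples in cross-validation, or to an independent prediction set gives the calibration, cross-validation and prediction variants of these statistics, respectively.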

The amino acid profiles obtained using HPLC were interpreted using chemometric analysis in order to detect fruit juice adulterations [18]. The authors applied chemometric methods to confirm the authenticity of blood orange juice. The question was whether PLSR could be used to quantify the amount of blood orange juice in the juice samples. PLSR and PLS-DA were conducted, accounting for five latent variables, which resulted in statistically valid models (R² higher than 0.9510, root mean square error of calibration 9.6979 and root mean square error of cross-validation 13.1149). The authors suggested that PLSR could be a suitable approach for this kind of quantification in fruit juices [18].

Another study [19] covering nut oils’ oxidation and PLSR was conducted and a model with a slightly lower R² value than the one in ref. [17] (Wang et al., 2021) was reported. Nut oils were extracted from hazelnut, cashew, almond, macadamia nut, sunflower seed, watermelon seed, pine seed and peanut samples. For the experimental data collection, Fourier transform infrared spectroscopy was used, and based on these data, a PLSR model was established. After including the unknown sample, the prediction coefficients of determination were all above 0.9000. All statistical tests indicated the good predictive ability of the formed model. This indicates that this approach could achieve the rapid detection of oil oxidation indexes [19].

Data comprising the Raman spectra of intact tomatoes with various carotenoid concentrations were used to develop PLSR and PLS-DA models [20]. It was found that the accuracy of the PLSR model was affected by the exposure time (0.7 and 10 s), while the exposure time did not have any impact on the PLS-DA model. When the Raman spectra were acquired with a 10 s exposure, the accuracy of the PLSR model was good (R² = 0.87), but it decreased with decreasing exposure time (R² = 0.69 at 0.7 s). The authors concluded that Raman spectroscopy combined with PLS-DA is very helpful for the analysis of carotenoids in fruits and vegetables [20].

PCA combined with LDA, and PCA combined with a support vector machine (SVM), were used to determine the geographical origin of coconuts from coastal plantations in Indonesia [21]. The examination of coconut endosperms from 13 districts was conducted using a portable sensing device for near-infrared spectroscopy (PSD-NIRS). LDA performed on raw data with a single pre-processing procedure did not reach a sufficient accuracy level. The authors then introduced PCA and PCA-LDA, which showed the maximum accuracy (100%) for the data. The combination of PCA and LDA can accurately predict the sample group [21].

Six samples of high-value Italian chickpeas (Cicer arietinum L.) were characterized, and the content of different elements was determined using inductively coupled plasma optical emission spectrometry (ICP-OES) [22]. The elements determined were as follows: Ca, K, P, Mg, Mo, Cu, Fe, Mn, Zn and Sr. The results were evaluated using ANOVA, LDA and soft independent modelling of class analogies (SIMCA). ANOVA pointed out the significance of the detected elements. The result of LDA modeling, both in calibration and validation, revealed that the proposed LDA model correctly assigned all of the chickpea samples to their geographical origin [22].

The linear discriminant analysis of the nutritional and physicochemical composition of 50 potato genotypes was carried out in [23]. The studied potato samples were from 24 different countries of origin and had four different flesh colors (purple, red, marble and yellow), as well as different cultivation types. The authors carried out ANOVA analysis employing Tukey’s HSD or Tamhane’s T2 test in order to classify the statistical differences between the potato samples. Further, LDA was used to identify the variables that mostly characterized each potato cultivation type or flesh color. The purpose of the LDA was to describe the relationship between a dependent variable (cultivation type or flesh color) and the data set of independent variables (all determined parameters). The stepwise method was used to select the significant independent variables. The first LDA had a classification performance with 100% accuracy for the original grouped cases, while for the cross-validated grouped ones, the accuracy was 99.20%. The second LDA’s performance was 100% for both the original grouped cases and the cross-validated ones [23].
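The linear discriminant function used in the LDA studies of this section can be illustrated with the simplest possible case: two classes described by two variables. The following Python sketch computes Fisher's discriminant direction w = S_w⁻¹(m_A - m_B), where S_w is the pooled within-class scatter matrix and m_A, m_B are the class means, and then assigns new objects according to which side of the midpoint between the projected means they fall on. The restriction to two variables (so that the 2 × 2 inverse can be written out by hand), the data and the function names are assumptions made for illustration only.

```python
def mean_vector(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the scatter matrix of one class."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fit_ldf(class_a, class_b):
    """Fisher discriminant direction and threshold for two 2-D classes."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    axx, axy, ayy = scatter(class_a, ma)
    bxx, bxy, byy = scatter(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy  # pooled within-class scatter
    det = sxx * syy - sxy ** 2
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # w = inverse(S_w) @ (m_A - m_B), using the closed-form 2x2 inverse.
    w = [(syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det]
    # Threshold: projection of the midpoint between the two class means.
    threshold = 0.5 * (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1]))
    return w, threshold


def classify(x, w, threshold):
    """Project x onto w (one dimension) and compare with the threshold."""
    score = w[0] * x[0] + w[1] * x[1]
    return "A" if score > threshold else "B"


# Hypothetical measurements of two variables for two groups of samples.
class_a = [(1, 2), (2, 3), (3, 3), (2, 1)]
class_b = [(6, 5), (7, 8), (8, 6), (7, 7)]
w, threshold = fit_ldf(class_a, class_b)
print("discriminant direction:", w, "threshold:", threshold)
print(classify((2, 2), w, threshold), classify((7, 6), w, threshold))
```

The projection onto w is the reduction to one dimension mentioned earlier in this section. With more than two variables or classes, the same idea applies, but the scatter matrices are inverted numerically and several discriminant functions are obtained.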

4. Regression: Non-Linear Modeling in Foods

An artificial intelligence technique that is widely used in food science and technology research publications is ANN, which is used to model non-linear correlations. An ANN is a mathematical model that imitates the way that the brain processes and stores information [24]. The construction of an ANN involves a learning (training) phase. Every ANN is composed of mutually connected elements called artificial neurons. These connections, or artificial synapses, are described by weights, which are modified during data processing in order to obtain the output layer. The structure of an ANN model is determined by the number of layers and the number of nodes per layer. Several layers of neurons participate in ANN construction: input, hidden and output layers. The outcome of ANN modeling does not include a parametric equation; rather, the network is described by its statistical parameters [25].

Some researchers dealt with shiitake mushrooms from which they extracted a β-glucan called lentinan, using natural deep eutectic solvents (NDES) [26]. Since the empirical and trial-and-error methods used for NDES selection are time-consuming, the researchers employed conductor-like screening models for realistic solvation (COSMO-RS). The extraction conditions were optimized and the effects of their interactions on the lentinan content were analyzed using an artificial neural network coupled with a genetic algorithm (ANN-GA). For the analysis of lentinan extraction, a two-layer feed-forward network with sigmoid hidden neurons and linear output neurons was employed. The input layer consisted of three neurons, the hidden layer of ten neurons and the output layer of one neuron. The ANN model was trained with the Levenberg–Marquardt algorithm, while the fitness function was used to find the optimal value within the limits of the extraction conditions. The authors reported that the combination of COSMO-RS and ANN-GA can be used for solvent screening and the optimization of the extraction process [26].
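The network structure described above (three inputs, ten sigmoid hidden neurons and one linear output neuron) can be written down in a few lines. The following Python sketch shows only the forward pass of such a 3-10-1 feed-forward network with randomly initialized weights. Training, whether with the Levenberg–Marquardt algorithm or otherwise, and the genetic algorithm step are deliberately omitted. The weight ranges, the input values and the function names are hypothetical and do not reproduce the model of ref. [26].

```python
import math
import random


def sigmoid(z):
    """Logistic transfer function used by the hidden neurons."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, seed=0):
    """Random weights for an n_in -> n_hidden -> 1 network (untrained)."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    return W1, b1, W2, b2


def forward(x, W1, b1, W2, b2):
    """Forward pass: sigmoid hidden layer followed by a linear output neuron."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2


# Three hypothetical, already scaled input variables (e.g., extraction conditions).
W1, b1, W2, b2 = init_network(n_in=3, n_hidden=10)
y_hat = forward([0.2, 0.5, 0.8], W1, b1, W2, b2)

# Number of adjustable parameters: input-hidden weights, hidden biases,
# hidden-output weights and the output bias.
n_params = 3 * 10 + 10 + 10 + 1
print("untrained network output:", y_hat, "parameters:", n_params)
```

Training consists of adjusting these 51 weights and biases so that the outputs match the experimental responses. This also illustrates why an ANN is described by its structure and statistical parameters rather than by a compact parametric equation.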

The eating and cooking quality of rice, as well as the texture properties of cooked rice, were predicted using ANN [27]. The authors developed models using stepwise MLR, principal component analysis plus MLR, PLSR, k-nearest neighbor, random forest and gradient boosted decision tree with satisfying statistical parameters. After introducing ANN, the R² values were improved, ranging from 0.675 to 0.979, while the RMSE values ranged from 0.574 to 1.32. When using the textural properties, the ANN model has an R² of 0.921 and an RMSE of 1.06, while combining them with rice components and/or pasting characteristics leads to R² values higher than 0.96 and RMSE values lower than 0.75. The authors concluded that rice textural properties are more suitable for ANN model formation [27].

Some researchers compared three statistical approaches, PLS-DA, classification and regression trees (CART) and ANN, for the authentication of 82 red wine samples based on their anthocyanin profile [28]. Two non-linear, layered, feed-forward networks were generated, multilayer perceptron (MLP) and radial basis function (RBF) neural networks, in order to obtain a statistically valid ANN. The variables that stood out regarding the ANN model for monovarietal wines’ authentication were: malvidin-3-acetylglucoside, petunidin-3-glucoside, malvidin-3-glucoside, peonidin-3-coumaroylglucoside and delphinidin-3-glucoside. The authors suggested that 6 out of 500 MLP artificial neural networks had high test, validation and training set accuracy. Proposed networks were generated using automatic network designer (AND) and the Broyden–Fletcher–Goldfarb–Shanno algorithm [28].

The quality of gamma-irradiated smoked bacon during storage was predicted using a back propagation artificial neural network (BP-ANN) [29]. For the construction of the ANN, the following data were used as input variables: physical and chemical indicators, irradiation dose and storage time. As for the output variables, the total number of colonies and sensory scores were used. The hidden layer consisted of 13 neurons and the transfer functions for the input–hidden layer and the hidden–output layer were ReLU and Sigmoid, respectively. The effects of different neuron counts and numbers of epochs were also considered. The presented results indicate that the proposed model based on physical and chemical indicators, irradiation dose and storage time shows great promise for the prediction of the quality of smoked bacon [29].

A group of researchers investigated the ultrasound-assisted extraction of phytochemicals from green coconut shells, which was optimized using integrated ANN [30]. The aim was to maximize antioxidant and antimicrobial activity while modeling the extraction process. The Tansig transfer function was used together with the feed-forward back-propagation method, and two input and fifty-six output values were used. Modeling resulted in a few statistically valid networks that were evaluated by means of the low mean square error and the high coefficient of determination. The best ANN for the ultrasound-assisted extraction process was the one with three neurons in the input layer, four neurons in the hidden layer and two neurons in the output layer (3-4-2) [30].

A commercial coffee plantation was used to carry out the experiment and ANN modeling was conducted covering a few morphological variables and a few vegetation indexes collected in the upper, medium and lower thirds of the coffee plant [31]. The formed MLP and radial basis function (RBF) networks were applied for the prediction of morphological variables and were evaluated in terms of accuracy (RMSE) and precision (R²). For plant height, the MLP used three and the RBF used five input variables, while for the plant diameter, both models used three input variables. The presented results indicate that, using MLP, it is possible to estimate coffee tree volume with reasonable accuracy [31].

Visible and near-infrared hyperspectral images were paired with unidimensional deep learning convolutional neural networks (CNNs) for the identification of anthracnose in olives [32]. The experimental data set covered a total of 250 olives without external defects. The authors formed CNN models and selected the ResNet101 architecture as being the most suitable and statistically acceptable. A two-step training process was carried out: in the first step, the weights of the previously trained ResNet101 were locked and the weights of the newly added layers were updated for 20 epochs; in the second step, all the weights were updated. The authors reported that the proposed method was successfully tested and could be used for the control of olive anthracnose [32].

Research on the nitrogen content in cucumber plant leaves (Cucumis sativus L.) used hyperspectral imaging data with a neural network [33]. Two artificial intelligence approaches were used: artificial neural networks–particle swarm optimization (ANN-PSO) and CNN. A prediction model was developed for each of the three categories: 30%, 60% and 90% nitrogen excesses. The results showed that, for the test set, the regression coefficients ranged from 0.9370 to 0.9650 for ANN-PSO and from 0.9650 to 0.9850 for CNN. The authors reported that the presented models have an exceptional ability to predict the nitrogen content in cucumber plants using hyperspectral images of leaves. In this study, the authors also conducted PLSR analysis and the results showed slightly better statistical parameters and performance than ANN-PSO and the CNN [33].

A study regarding the modeling of moisture content evolution in convective drying of quince from Greece was carried out [34]. The first group of trials covered single hidden layer neural networks consisting of 10–100 neurons and different transfer functions together with 500 epochs. The second group of trials comprised ANNs containing two hidden layers with different combinations of artificial neuron number and transfer functions in each layer. The top 15 models, according to their statistical parameters, along with R² values higher than 0.9910 and RMSE values around 0.13, were presented. The authors reported a good agreement between the experimental and estimated values and confirmed that ANNs are able to perform predictions for newly obtained experimental data with a reasonable error [34].

Domestic garlic samples (Allium sativum L.) produced in Spain, Croatia and China were purchased together with one sample from a local producer in Slovenia, and ultrasound-assisted extraction of polyphenols was performed [35]. The experimental method was optimized using an ANN approach. The statistical parameters of the proposed ANN were as follows: root mean square error for training (0.0209), validation (3.6819) and test set (1.8341). The good predictive ability of the ANN was evaluated by the correlation coefficient between the experimentally obtained total phenolic content and the total flavonoid content and values for the training, validation and test sets were 0.9998, 0.9733 and 0.9821, respectively [35].

In research on dragon fruits of the variety Hylocereus undatus that were used for microwave vacuum drying (MVD) experiments, the authors used ANN modeling, aiming to model the drying process [36]. The feed-forward back-propagation approach was optimized with GA. The ANN model was developed covering two phases, feed-forward and back propagation, and the efficiency of the proposed ANN was assessed through the mean squared error value as well as the relative deviation values between the experimentally observed and predicted data. The proposed model predicted that the vacuum had the most significant influence on the total phenolic content, microwave power and citric acid concentration. The data predicted by the obtained ANN-GA model were in strong agreement with the experimental data, with a low relative deviation that ranged from 1.557 to 2.936%. The employed ANN-GA model could be practically used for the modeling of the microwave vacuum drying process for dragon fruit [36].

An efficient crop yield prediction was achieved using the machine learning algorithm ANN [37]. In this study, MLR was also performed and a hybrid MLR-ANN model, together with a conventional ANN, was proposed. A feed-forward phase with a back propagation training algorithm was used for the models’ construction. Both models, ANN and MLR-ANN, were compared from the perspective of their statistical validity and predictive power. The computational time for both the hybrid MLR-ANN and the conventional ANN was calculated. The RMSE and R values for the conventional and hybrid models were as follows: 0.098 and 0.051 for the error and 0.9200 and 0.9900 for the correlation coefficient. The results indicate that the hybrid MLR-ANN has a better accuracy than the conventional ANN for the same data set [37].

5. Classification Techniques in Foods

One of the most exploited classification techniques in scientific publications, including food science and technology papers, is PCA. This method reduces the dimensionality of multivariate data sets. It reduces and simplifies the original variables to linear combinations known as principal components (PCs). PCs are characterized by loadings and scores. Scores are the new coordinates of the projected objects and loadings reflect the direction with respect to the original variables [38]. An overview of the food types and classification techniques used in the selected papers is presented in Table 2.
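The relationship between loadings, scores and the percentage of variance explained by each PC can be illustrated with a small Python sketch. It centers the data, builds the covariance matrix and extracts the leading eigenvectors by power iteration with deflation. This is one of several equivalent ways to compute PCA (singular value decomposition is more common in practice) and is chosen here only because it needs nothing beyond the standard library. The data and function names are hypothetical, and the simple power iteration assumes well-separated eigenvalues and a starting vector that is not orthogonal to the leading eigenvector.

```python
import math


def covariance_matrix(X):
    """Return the mean-centered data and its sample covariance matrix."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
         for i in range(p)]
    return Xc, C


def leading_eigenpair(C, n_iter=500):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    p = len(C)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        u = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(a * a for a in u))
        if norm < 1e-12:  # no variance left in this direction
            return 0.0, v
        v = [a / norm for a in u]
    lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v


def pca(X, n_components):
    """Scores, loadings and explained-variance ratios of the first PCs."""
    Xc, C = covariance_matrix(X)
    p = len(C)
    total_var = sum(C[i][i] for i in range(p))
    loadings, ratios = [], []
    for _ in range(n_components):
        lam, v = leading_eigenpair(C)
        loadings.append(v)
        ratios.append(lam / total_var)
        # Deflation: remove the variance already captured by this PC.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    # Scores are the coordinates of the centered objects on the loading vectors.
    scores = [[sum(r[j] * v[j] for j in range(p)) for v in loadings] for r in Xc]
    return scores, loadings, ratios


# Hypothetical data: four objects described by two variables.
X = [[3, 0], [-3, 0], [0, 1], [0, -1]]
scores, loadings, ratios = pca(X, n_components=2)
print("explained variance ratios:", ratios)
print("PC1 loadings:", loadings[0])
print("scores of the first object:", scores[0])
```

In this toy example the first variable carries most of the spread, so PC1 points along it and explains 90% of the total variance, with PC2 accounting for the remaining 10%. Statements such as "the first two PCs described 92.6% of the variance" in the studies below are sums of such ratios.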

Gas chromatography–mass spectrometry (GC-MS), Fourier transform infrared spectroscopy (FTIR) and ultraviolet-visible-near infrared spectroscopy (UV-Vis-NIR) data were used to determine the quality of katsuobushi based on the number of smoking treatments. Katsuobushi is smoked and dried skipjack tuna, and is a traditional Japanese food additive with a specific flavor and taste [39]. A total of forty-six metabolites were identified and five of them were selected as key compounds. GC-MS, FTIR and NIR spectral data were used for PCA, and the results were presented through heatmaps, biplots and VIP scores. PCA of the free amino acids and nucleotide-related compounds resulted in the first two PCs describing 92.6% of the variance, while the katsuobushi samples were distinguished into five groups. In the PCA of the FTIR spectra and the PCA of the NIR spectra, the katsuobushi samples were again divided into five groups based on the number of smoking treatments (zero, three, six, nine and twelve rounds of smoking), which led to compositional changes in the katsuobushi. For the FTIR spectra, PC1 contributed 89.4%, while PC2 contributed 3.9%. For the NIR spectra, the total variance described by the first two PCs was 99.1%. In this case, the non-smoked samples were well separated from the smoked groups along PC1, which indicates a significant difference in the metabolic profiles of non-smoked and smoked samples [39].

The total mercury level distribution in fish and fish products was evaluated, and its relationship to fish type, weight, protein and lipid content was examined using a multivariate approach [40]. The influence of lipid and protein content on Hg accumulation in the fish tissues and the impact of Hg concentration and fish consumption on the estimated weekly intake (EWI) were evaluated using PCA. PCA covering the Hg distribution in fish samples and the moisture ratio resulted in a plot with five clusters. PC1 covered 77.72% and PC2 22.28% of the total variance, so the PCA resulted in a model explained by two PCs covering 100%. When the total Hg distribution in fresh fish samples and the lipid, protein and moisture content were taken into account, PC1 covered 53.62% and PC2 36.86% of the total variance. When the total Hg distribution in whole fresh fish samples, fish average weight, and lipid, protein and moisture content were considered, the total variance covered by the first two PCs was 84.4%. The PCA results revealed that: (a) Hg contamination levels are determined by protein–lipid content; (b) a high lipid content gave lower Hg levels; (c) high Hg levels in fish with a high lipid content corresponded to a polluted environment; (d) EWI was correlated to Hg concentration, except in the case of a low Hg concentration [40].

The fatty acid profiles, pH and color changes of cow milk probiotic yoghurt (CPY) and goat milk probiotic yoghurt (GPY) were studied using gas chromatography (GC) and the chemometric pattern recognition method—PCA [41]. Alterations to the fatty acid profiles of CPY and GPY were presented via a scores plot where PC1 accounted for 88% and PC2 for 4.2% of the total variation. The authors reported that two well-separated clusters can be noticed and that the relative abundance of fatty acids presented in the two clusters was different. Since the CPY and GPY clusters were on the opposite sides of the axis, a negative correlation is indicated for the majority of the fatty acids’ composition [41].

Pan-fried chicken meat patties were studied with respect to the effects of different levels of allspice seed extract (ASE) and Perilla frutescens seed extract (PSE) [42]. The researchers were interested in the impact of ASE and PSE on the formation and migration of heterocyclic amines (HCAs). The chicken meat patties were divided into three groups, together with a control group, for the experiment. PCA was performed in order to reveal the differences in the HCA profiles. The PCA resulted in a scores plot with PC1 covering 28.48% and PC2 covering 54.65% of variance. The PCA results revealed that most of the single and mixed phenolics displayed strong inhibitory effects on HCA formation, but the mitigating effect of a few mixed phenolics on HCA formation was weak [42].

PCA was employed in a study investigating commercial fruit beers regarding their polyphenolic and amino acid profiles [43]. The data set comprised twenty-six fruit beers and three control beers without fruit. On the loadings plot, the PCA revealed five different groups for polyphenols, pigments and AAs related to their chemical structure for both phenolic profile and amino acid profile analysis. For the phenolic profile, the eigenvalues for PC1 and PC2 were 2.50 and 1.17, respectively, while PC1 covered 50.1% and PC2 corresponded to 23.5% of the variation in all data. Beer samples were separated into five groups and their amino acid profiles were subjected to PCA. In this case, the eigenvalues were PC1 = 2.4 and PC2 = 1.5, while PC1 accounted for 48.0% and PC2 for 30.3% of the total data variability [43].
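The eigenvalues and variance percentages quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the PCA in [43] was run on autoscaled data, in which case the eigenvalues sum to the number of variables, and that five variables entered each model. Both points are assumptions inferred from the reported numbers, not statements from the study.

```python
# Values reported in the text for the two PCA models of [43].
reported = {
    "phenolic profile": {"eigenvalues": (2.50, 1.17), "percent": (50.1, 23.5)},
    "amino acid profile": {"eigenvalues": (2.4, 1.5), "percent": (48.0, 30.3)},
}

# Assumption: autoscaled data with five variables, so the eigenvalues sum to 5.
n_variables = 5

implied = {}
for model, values in reported.items():
    implied[model] = tuple(100.0 * eig / n_variables for eig in values["eigenvalues"])
    for pc, (calc, rep) in enumerate(zip(implied[model], values["percent"]), start=1):
        print(f"{model}, PC{pc}: implied {calc:.1f}% vs reported {rep:.1f}%")
```

Under these assumptions the implied shares agree with the reported ones to within a few tenths of a percent, which is consistent with rounding of the published eigenvalues.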

Hawthorn (Crataegus azarolus L.) fruit from Turkey was the subject of a study that investigated the effect of maturity stage on fruit quality characteristics, sensory attributes and volatile compounds [44]. Solid-phase micro-extraction (SPME) and gas chromatography–mass spectrometry (GC–MS) were conducted and experimental data sets were collected. For the PCA, the dominant volatile organic compounds in fruit from different maturity stages were used as variables. According to the eigenvalues, PC1, PC2 and PC3 were chosen, covering 78% of the total variance. A scores plot (PC1 50.5% and PC2 17.5%) resulted in three well-separated clusters that grouped the hawthorn fruit according to their maturity stage: (a) immature, (b) mature and (c) over-mature fruit. The authors reported that mature and over-mature fruit were better liked by panelists and had the highest level of esters responsible for flavor [44].

The fruit and leaf diversity of mangoes (Mangifera indica L.) was investigated [45]. Fifty-eight mango genotypes from India including twenty selections, seventeen hybrids and twenty-one landraces (local genotypes) were taken and analyzed. The experimental analysis covered a total of 70 pheno-biochemical characteristics based on leaf morphology and fruits. From the PCA analysis, it can be concluded that several pomological traits of economic significance showed extreme variability. On the presented PCA graphs, the genotypes were clustered depending on their biochemical characteristics and phenotypic similarity [45].

Since pacu fish are mainly grown in Argentina as an important source of food and revenue, their growth within rice fields and the effect of this on the fish metabolome were researched [46]. Farmed (bred by the integrated rice and fish farming system and obtained in a local market) and control (raised in a tank) fish muscle samples were investigated using two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC × GC-TOFMS). The total ion current and diagnostic ions for sugars were used as the input for the PCA, which created an overview of the separation of farmed and control pacu fish samples. PCA resulted in a scores plot presenting PC1 (50.95% of total variance) and PC2 (26.82% of total variance) and showing two clusters: one with the control and the other with the farmed samples. The scores plot revealed that control and farmed samples are best separated along PC2. The authors concluded that PCA gave a meaningful overview of the effects of farming pacu fish within rice fields [46].

Intact beef, venison and lamb meat types were investigated using Raman spectroscopy in order to establish fast and reliable techniques for intact meat discrimination [47]. The meat samples were from the New Zealand red meat sector and a total of 90 samples were used: beef (Bos taurus), lamb (Ovis aries) and venison (Cervus elaphus scoticus, hippelaphus and pannonensis). PCA resulted in a scores plot in which PC1 accounted for 33% and PC2 for 26% of the total variability in the meat data. Although minor overlaps occurred, the PCA scores plot showed a good separation of the beef, venison and lamb meat samples. The authors outlined that Raman spectroscopy could be paired with a chemometric approach, which would result in fast and reliable techniques for intact meat differentiation [47].

Quinoas were analyzed using gas chromatography–ion mobility spectrometry (GC-IMS) and PCA was applied to obtain characteristic volatile profiles [48]. The quinoas used were white, red and black varieties from China, and a total of 28 characteristic volatile compounds were found and quantified. The total contribution of the first two PCs was 85.31% (PC1 68.73% and PC2 16.58%), which was adequate to explain the similarities between samples. The scores plot resulted in three well-separated clusters, each of them containing one quinoa variety: white, red and black. Since the differently colored samples were placed in different quadrants, far away from each other, it can be concluded that there are significant differences in the volatile profiles of different quinoa varieties. The authors concluded that the combination of the visual plots provided by GC-IMS and the PCA results was of high enough quality for the characterization of the aroma profiles of different quinoa varieties [48].

A research group investigated the volatile flavor compounds found in different production stages of fermented soybean whey tofu from China using headspace-gas chromatography–ion mobility spectrometry (HS-GC-IMS) in combination with PCA [49]. In the samples across all production stages, 24 representative flavor compounds were experimentally obtained. The PCA scores plot revealed that samples from the six stages of fermented soybean whey tofu production occupied separate regions, with clear-cut differences between the groups. The results of this study indicate that the flavor fingerprints of the samples from different stages of fermented soybean whey tofu production can be established using HS-GC-IMS and PCA through the detection of the volatile compounds [49].

A common adulteration in the food industry is that of extra virgin olive oil, since it has greater value than other edible oils [50]. Scientists developed a rapid method for the detection of extra virgin olive oil adulteration by means of ultra-high-performance liquid chromatography with charged aerosol detection (UHPLC-CAD) profiling of triacylglycerols coupled with the chemometric pattern recognition technique PCA. Twenty-five fresh single-variety extra virgin olive oil samples from California, together with eleven grapeseed oils, three soybean oils, seven canola oils, four high-oleic safflower oils and five high-oleic sunflower oils, were purchased and analyzed. Experimentally obtained data from the triacylglycerol analysis were used as input data for the PCA. The triacylglycerol profiles were determined for the olive oil samples and for the five common olive oil adulterants. Several combined PCA score plots were obtained using different input data combinations. All graphs indicate that combining triacylglycerol profiles with PCA can successfully differentiate extra virgin olive oil from high-oleic sunflower oil at adulteration levels greater than 10%. The authors reported that UHPLC-CAD coupled with PCA needs minimal sample preparation and provides fast analysis for the rapid determination of extra virgin olive oil authenticity [50].

Table 2. Food and chemometric classification techniques overview.

Food Type	Modeling Type	Reference
katsuobushi	PCA	[39]
fish and fish products	PCA	[40]
cow milk and goat milk probiotic yoghurt	PCA	[41]
chicken meat patty	PCA	[42]
fruit beers	PCA	[43]
hawthorn fruit	PCA	[44]
mangoes	PCA	[45]
pacu fish	PCA	[46]
beef, venison and lamb meat	PCA	[47]
three colored quinoas	PCA	[48]
fermented soybean whey tofu	PCA	[49]
extra virgin olive oil	PCA	[50]
chestnut honey	HCA	[51]
rice	HCA	[52]
basil	HCA	[53]
pine nut	HCA	[54]
spirulina	HCA	[55]
sweet cherry	HCA	[56]

Alongside PCA, HCA is the second most widely used classification technique. HCA divides a group of objects into classes and sorts similar objects into the same class (cluster). This type of analysis searches for objects that are close together in the variable space. The result of HCA is a tree diagram (dendrogram), where the horizontal axis shows the dissimilarity between the clusters, expressed as their distance. An overview of the food types and classification techniques in the selected papers is presented in Table 2.
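The clustering procedure described above can be sketched with SciPy's hierarchical clustering routines. The example uses synthetic data and assumes Ward's method with Euclidean distances, a combination reported in several of the reviewed studies; the sample labels and the cut distance are illustrative choices only.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic example: two groups of five samples described by three variables.
rng = np.random.default_rng(2)
samples = np.vstack([
    rng.normal(0.0, 0.3, size=(5, 3)),
    rng.normal(3.0, 0.3, size=(5, 3)),
])
sample_labels = [f"S{i + 1}" for i in range(len(samples))]

# Agglomerative clustering with Ward's method on Euclidean distances.
Z = linkage(samples, method="ward", metric="euclidean")

# Option 1: ask for a fixed number of clusters.
clusters = fcluster(Z, t=2, criterion="maxclust")

# Option 2: cut the dendrogram at a chosen distance (here, half the final merge height).
cut_distance = 0.5 * Z[-1, 2]
clusters_by_cut = fcluster(Z, t=cut_distance, criterion="distance")

# Leaf order of the dendrogram; no_plot=True returns the layout without drawing.
tree = dendrogram(Z, labels=sample_labels, no_plot=True)

print("cluster labels:", clusters)
print("clusters at cut distance:", clusters_by_cut)
print("dendrogram leaf order:", tree["ivl"])
```

Replacing `method="ward"` with `method="single"` gives the single linkage variant; the choice of linkage and of the cut distance can change the number and composition of the clusters.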

Chestnut honey samples from Turkey were distinguished based on their phenolic compositions and biological activities using HCA [51]. A total of 16 phenolic compounds and organic acids were detected by HPLC-DAD. The antioxidant activity was evaluated using ABTS^•+, β-carotene-linoleic acid, CUPRAC, DPPH^• and metal chelating assays, while antimicrobial activity was tested against Gram-positive and Gram-negative bacteria and Candida species. Additionally, anti-inflammatory activity was evaluated against COX-1 and COX-2, while enzyme-inhibitory activity was assessed on AChE, BChE, urease, and tyrosinase. The collected data were used for the classification of 41 chestnut honey samples from different locations in Turkey. A dendrogram based on the single linkage and Euclidean distance resulted in two well-separated clusters: chestnut honeys produced in Turkey and Bursa. The habitats of samples labeled BO, BI, BK1 and BK2 are adjacent to inland lakes (İznik and Ulubat) and their habitats are similar to each other [51].

The discrimination of rice varieties was studied using colorimetric sensor arrays together with gas chromatography techniques [52]. A total of nine rice varieties from Pakistan were analyzed, and the experimentally obtained data were used for HCA in order to reveal and visualize the differences between rice samples from various geographical origins. For the similarity exploration, the Euclidean distance method was used. For the colorimetric analysis data, the cut distance was 8.4 and the rice cultivars were clustered into six groups, while group three was out of clusters. Rice samples from the same cultivar and geographical origin were placed into the same or adjacent clusters. The presented results indicate that the HCA method can be used to correctly differentiate rice varieties from different geographical origins. Data obtained in the GC-MS analysis were also evaluated using HCA, which resulted in the rice cultivars from different geographical origins being separated into three clusters. This clustering showed that variability in the concentration of aroma compounds had a leading role when discriminating rice varieties [52].

The stress caused by cadmium, lead and aluminum exposure in basil (Ocimum basilicum L.) cultivated in Brazil was assessed using a multivariate analysis approach [53]. Caffeic and rosmarinic acid were determined by high-performance liquid chromatography with a diode array detector (HPLC-DAD), while total phenolics and total flavonoids were determined by spectrophotometry. Plants were exposed to four different concentration levels of metals: cadmium (0.2, 0.6, 1.2, and 1.8 mmol L⁻¹), lead and aluminum (0.04, 0.08, 0.12, and 0.16 mmol L⁻¹). Ward’s method and Euclidean distances were used in the HCA. The resulting grouping reveals that the studied metals had different influences on the secondary metabolism of O. basilicum [53].

Another group of researchers investigated the characterization of the chemicals in pine nuts from Brazil using exploratory data analysis [54]. The mineral composition (Ca, Cu, Fe, K, Mg, Mn, P and Zn), centesimal composition (moisture, ash, lipids, protein and carbohydrate) and amount of lead (Pb) were determined using inductively coupled plasma optical emission spectrometry (ICP OES) and graphite furnace atomic absorption spectrometry (GF AAS). The results gained with HCA confirmed the results from the PCA analysis. The dendrogram was generated using the Ward method and Euclidean distances and resulted in two distinct groups according to the mineral composition with a similarity index of 15 [54].

Spirulina (Spirulina platensis) and its commercial products are very interesting to the food industry since spirulina is rich in protein content and has other nutritional values [55]. Two-trace two-dimensional (2T2D) correlation infrared spectral analysis was conducted and a chemometric analysis of the experimentally obtained data was carried out. The S. platensis strain from India was grown and taken as a control sample while commercial samples of Spirulina food products and food supplements were purchased. HCA analysis was performed based on Ward’s method and Euclidean distances. The dendrogram resulted in two clusters with a subcluster in which it can be seen that the control samples are distinguished from the others. The authors concluded that the results of the HCA analysis are in accordance with the results obtained by PCA analysis [55].

The researchers wanted to identify which sweet cherry (Prunus avium L.) cultivars are the most widespread in Italy [56]. They characterized 35 sweet cherry cultivars and one sour cherry cultivar through the analysis of different pomological and nutraceutical traits. In addition, the authors wanted to identify cultivars whose antioxidant activity and total anthocyanin content were closest to the values of Ferrovia, a widely grown cultivar in Italy. Two HCA analyses were conducted with the paired group algorithm, taking into account the following: (a) titratable acidity, soluble solid content, soluble solid content/titratable acidity ratio, and pH; and (b) total phenolic content, antioxidant activity and total anthocyanin content. The first clustering resulted in eight groups, while the second one resulted in five groups separated in the dendrogram. In the first dendrogram the sour cherry cultivar was out of the clusters, while in the second, one sweet cherry sample was out of the clusters. The authors concluded that clustering highlighted a wide diversity in sweet cherry genotypes in Italy [56].

6. Non-Parametric Methods in Foods

The SRD method is a non-parametric method that ranks objects against a defined reference ranking (ideal ranking or gold standard); it was introduced by Héberger and Kollár-Hunek [57,58]. The reference can be a known standard or, when none is available, it can be defined from the data, for example as the mean, median, minimum or maximum. The results are shown in the form of a graph that represents the distribution of the samples in relation to the chosen reference ranking. The closer the SRD value is to zero, the closer the variable is to the reference and the better it is. Validation is carried out through the comparison of ranks by random numbers (CRRN) procedure and by seven-fold cross-validation [57,58]. Table 3 shows studies that recently applied the SRD method.
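The ranking step described above can be illustrated with a short sketch. It is a simplified reading of the procedure and not the reference implementation of [57,58]: the data matrix is made up, rows are taken as objects and columns as the variables to be compared, the row mean is assumed as the reference, and the randomization check is a reduced stand-in for the full CRRN procedure.

```python
import numpy as np
from scipy.stats import rankdata


def srd_values(data, reference):
    """Sum of absolute rank differences between each column and the reference."""
    data = np.asarray(data, dtype=float)
    reference_ranks = rankdata(reference)
    column_ranks = np.apply_along_axis(rankdata, 0, data)
    return np.abs(column_ranks - reference_ranks[:, None]).sum(axis=0)


def max_srd(n_objects):
    """Largest possible SRD for n objects without ties (fully reversed ranking)."""
    return n_objects * n_objects // 2


def random_ranking_fraction(srd, n_objects, n_random=10000, seed=0):
    """Fraction of random rankings whose SRD is at most the given value."""
    rng = np.random.default_rng(seed)
    ideal = np.arange(1, n_objects + 1)
    random_srd = np.array([
        np.abs(rng.permutation(ideal) - ideal).sum() for _ in range(n_random)
    ])
    return float(np.mean(random_srd <= srd))


# Hypothetical data: five objects (rows) measured by three variables (columns).
data = np.array([
    [1.0, 1.2, 5.0],
    [2.0, 1.9, 4.0],
    [3.0, 3.3, 1.0],
    [4.0, 4.1, 3.0],
    [5.0, 4.8, 2.0],
])
reference = data.mean(axis=1)  # assumed reference: row-wise mean

srd = srd_values(data, reference)
srd_percent = 100.0 * srd / max_srd(len(data))
fractions = [random_ranking_fraction(value, len(data)) for value in srd]

print("SRD:", srd)
print("normalized SRD (%):", np.round(srd_percent, 1))
print("fraction of random rankings with SRD <= observed:", fractions)
```

Columns with a small SRD rank the objects almost like the reference; a column whose SRD lies well inside the distribution of random rankings cannot be distinguished from a random ordering.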

The PLS-DA model was combined with the SRD algorithm for tea grade identification using electronic tongue data [59]. Tea grade identification plays a crucial role in tea pricing and sales. The tea grades were distinguished and identified using PCA and the PLS-DA-SRD model. The performances of the established PLS-DA and PLS-DA-SRD models were compared, and a significant improvement in accuracy and sensitivity was demonstrated when SRD was coupled with PLS-DA. The authors concluded that the PLS-DA-SRD approach successfully identified tea sample grade [59].

Ranking and multicriteria decision making were employed in the optimization of raspberry convective drying processes [60]. A comparative experiment was conducted to investigate suitable process parameters for convective drying as an alternative to freeze-drying. SRD was applied to reveal the differences and similarities between the applied drying methods. Multiple validation steps, including different resampling methods and leave-multiple-out cross-validations, were used. The results of the SRD analysis indicate that convectively dried fresh raspberries were more similar to freeze-dried raspberries than convectively dried frozen ones [60].

Since there is an ongoing trend for the human consumption of insects, the authors of [61] conducted research that aimed to propose which insect species is the most suitable for human consumption. Previously published results were used, and a comprehensive picture of the nutritional profile of insects using the sum of ranking differences was presented through case studies. The case studies dealt with the proximate nutritional profile of the insects and traditional protein sources (beef, pork, chicken, egg, salmon and milk) in terms of mineral content, amino acid profiles, vitamin content and origin. The main difficulties that the authors faced included the quality of the original data, missing data, and the fact that studies from different parts of the world gave significantly different nutrient results for the same insect species. Their results suggest that the superiority of insects as a protein source cannot be stated in every case, but the general view in favor of insects is promising [61].

SRD was used in a study that dealt with beer microfiltration with a static turbulence promoter [62]. The main challenges during beer microfiltration are fouling and quality maintenance. Ten different membrane filtration experiments were ranked using SRD based on their analytical properties, hydrodynamic parameters and separation characteristics. SRD was used to determine the best performing membrane filtration method, with the minimum value taken as the reference ranking [62].

The combination of leave-one-out (LOO) cross-validation of SRD values and significance testing by the post hoc Wilcoxon matched pairs test was used for the evaluation of various tomato landraces and one commercial variety [63]. The study aimed to find a combination of methodologies that can be validated as being suitable for this type of study and these samples. The authors collected 11 varieties of red and orange tomato samples and characterized them by phytonutrient composition. One sample was a commercial variety and the rest were tomato landraces. The SRD analysis resulted in the formation of three groups: (a) the two samples closest to the reference landrace that had the highest phytonutrient content values; (b) seven samples following the first group; and (c) a final group comprising two samples (one of them was the commercial variety). The authors reported that the investigated commercial variety had a lower phytonutrient content than the landraces [63].

The SRD method was applied to evaluate the performance of the gene bank accessions of eight different Ocimum basilicum L. varieties [64]. Using the varieties’ characteristics, the gene bank accessions were compared with the SRD method. LOO cross-validation was performed to characterize the uncertainty of the SRD values, and the Wilcoxon matched pairs test and the sign test were used for the pairwise comparisons. The results indicated that one variety (M. Grünes) was evaluated as the best performing of the selected gene bank-stored accessions. The authors pointed out that basil species selection based on multicriteria analysis and correct statistical tests was published for the first time [64].
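The combination of LOO cross-validation and the Wilcoxon matched pairs test used in the two studies above can be sketched as follows. The data are synthetic, the names `variety_a` and `variety_b` are placeholders, and the reference is an arbitrary assumed ranking; the sketch only shows how leaving out one object at a time yields paired SRD values that a matched pairs test can compare.

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon


def srd(column, reference):
    """Sum of absolute rank differences between one column and the reference."""
    return float(np.abs(rankdata(column) - rankdata(reference)).sum())


# Synthetic example: twelve objects, a reference and two variables to compare.
rng = np.random.default_rng(3)
n_objects = 12
reference = np.arange(n_objects, dtype=float)            # assumed reference
variety_a = reference + rng.normal(0.0, 0.8, n_objects)  # close to the reference
variety_b = reference + rng.normal(0.0, 4.0, n_objects)  # far from the reference

# Leave-one-out: recompute both SRD values with each object removed in turn.
loo_a, loo_b = [], []
for left_out in range(n_objects):
    keep = np.arange(n_objects) != left_out
    loo_a.append(srd(variety_a[keep], reference[keep]))
    loo_b.append(srd(variety_b[keep], reference[keep]))

# Wilcoxon matched pairs test on the paired LOO SRD values.
statistic, p_value = wilcoxon(loo_a, loo_b)

print(f"mean LOO SRD: a = {np.mean(loo_a):.1f}, b = {np.mean(loo_b):.1f}")
print(f"Wilcoxon statistic = {statistic:.1f}, p = {p_value:.4f}")
```

A small p-value suggests that the two variables differ consistently in their distance from the reference across the LOO repetitions, which is the basis for the groupings reported in these studies.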

One of the most underutilized chemometric methods is the generalized pair correlation method (GPCM). This method is mostly used in studies regarding biologically active compounds and in the analytical chemistry domain [65]. The method was first introduced by Rajkó and Héberger as the pair correlation method (PCM), which can discriminate between two variables [66]. The PCM was then generalized (GPCM) so that it can be performed for up to several hundred features [67,68]. Very few studies related to food science and technology topics have included GPCM, although this method has great potential. In Table 3, some research papers that employ GPCM are presented.

Researchers produced and evaluated buckwheat pasta enriched with silkworm powder and used just-about-right (JAR) data evaluation [69]. A part of their study was a GPCM analysis that was conducted on the basis of consumer sensory analysis results (overall liking, color, odor, texture, graininess and flavor attributes). The sensory acceptance of poppy seed-flavored white chocolates was also evaluated using GPCM [70]. The GPCM analysis took into account parameters regarding color, texture, taste and overall liking. JAR attributes (color, texture, meltiness, particle size, global taste intensity, poppy seed flavor and chocolate flavor) were ranked using conditional probability ordering and the conditional Fisher’s exact test. Flavored mineral water samples with mango–passion fruit aroma were examined regarding different JAR attributes (color intensity, odor intensity, fruit flavor, carbonation, sweet taste, sour taste, bitter taste and aftertaste intensity) [71]. All GPCM methods (simple, difference and significance ordering) and tests (McNemar’s, Chi-square, conditional Fisher’s and Williams’ t-test) were applied in order to rank the JAR attributes.

Table 3. Food and non-parametric methods overview.

Food Type	Modeling Type	Reference
tea	SRD	[59]
raspberry	SRD	[60]
insects	SRD	[61]
beer	SRD	[62]
tomato	SRD	[63]
basil	SRD	[64]
buckwheat-pasta enriched with silkworm powder	GPCM	[69]
poppy seed-flavored white chocolates	GPCM	[70]
flavored mineral water samples with mango-passion fruit aroma	GPCM	[71]

7. Concluding Remarks

As Svante Wold concluded, the future of chemometrics is bright [2]. This paper presents a brief and systematic review of ongoing multivariate chemometric approaches in bioactive compounds and functional properties of foods. The currently available literature contains a wide spectrum of regression, classification and non-parametric chemometric methods used for the presentation and interpretation of experimentally observed data. The ongoing trend in recent research indicates that chemometrics will be progressively used in the domain of food science and technology, since its benefits are repeatedly proven and since this research area is highly competitive and fast growing.

Author Contributions

Conceptualization, M.K.B. and S.K.; methodology, M.K.B.; investigation, M.K.B. and S.K.; writing—original draft preparation, M.K.B., S.K. and S.P.-K.; writing—review and editing, M.K.B., S.K. and S.P.-K.; visualization, M.K.B. and S.K.; supervision, S.P.-K.; project administration, S.P.-K. All authors have read and agreed to the published version of the manuscript.

Funding

The present research is financed in the framework of the project of the Provincial Secretariat for Higher Education and Scientific Research of AP Vojvodina (Project: Molecular engineering and chemometric tools: Towards safer and greener future, No. 142-451-3457/2023-01/01) and the project of the Ministry of Science, Technological Development and Innovation (Project No. 451-03-65/2024-03/200134).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wold, S. Spline functions, a new tool in data-analysis. Sven. Kem. Tidskr. 1972, 3, 1–11.
Wold, S. Chemometrics; what do we mean with it, and what do we want from it? Chemom. Intell. Lab. Syst. 1995, 30, 109–115.
Li, C.-Q.; Xiao, N.; Wen, Y.; He, S.-H.; Xu, Y.-D.; Lin, Y.-W.; Li, H.-D.; Xu, Q.-S. Collaboration patterns and network in chemometrics. Chemom. Intell. Lab. Syst. 2019, 191, 21–29.
Aleixandre-Tudo, J.L.; Castello-Cogollos, L.; Aleixandre, J.L.; Aleixandre-Benavent, R. Chemometrics in food science and technology: A bibliometric study. Chemom. Intell. Lab. Syst. 2022, 222, 104514.
Marquardt, D.W.; Snee, R.D. Ridge regression in practice. Am. Stat. 1975, 29, 3–20.
Miller, J.N.; Miller, J.C. Statistics and Chemometrics for Analytical Chemistry, 6th ed.; Pearson: Harlow, UK, 2010.
Yang, B.; Jiang, J.; Zhang, H.; Han, Z.; Lei, X.; Chen, X.; Xiao, Y.; Ndombi, S.N.; Zhu, X.; Fang, W. Tea quality estimation based on multi-source information from leaf and soil using machine learning algorithm. Food Chem. X 2023, 20, 100975.
Crespo, A.; Jiménez, A.; Ruiz-Moyano, S.; Merchán, A.V.; Galván, A.I.; Benito, M.J.; Martín, A. Low-frequency ultrasound as a tool for quality control of soft-bodied raw ewe’s milk cheeses. Food Control 2022, 131, 108405.
Ruggiero, L.; Amalfitano, C.; Di Vaio, C.; Adamo, P. Use of near-infrared spectroscopy combined with chemometrics for authentication and traceability of intact lemon fruits. Food Chem. 2022, 375, 131822.
Qiao, M.; Xia, G.; Cui, T.; Xu, Y.; Gao, X.; Su, Y.; Li, Y.; Fan, H. Effect of moisture, protein, starch, soluble sugar contents and microstructure on mechanical properties of maize kernels. Food Chem. 2022, 379, 132147.
Han, G.; Dai, L.; Sun, Y.; Li, C.; Ruan, S.; Li, J.; Xu, Y. Determination of the age of dry red wine by multivariate techniques using color parameters and pigments. Food Control 2021, 129, 108253.
de Lima, A.B.S.; Batista, A.S.; de Jesus, J.C.; de Jesus Silva, J.; de Araújo, A.C.M.; Santos, L.S. Fast quantitative detection of black pepper and cumin adulterations by near-infrared spectroscopy and multivariate modeling. Food Control 2020, 107, 106802.
Rasool Khodabakhshian, R.; Seyedalibeyk Lavasani, H.; Weller, P. A methodological approach to preprocessing FTIR spectra of adulterated sesame oil. Food Chem. 2023, 419, 136055.
Li, S.; Bi, P.; Sun, N.; Gao, Z.; Chen, X.; Guo, J. Characterization of different non-Saccharomyces yeasts via mono-fermentation to produce polyphenol-enriched and fragrant kiwi wine. Food Microbiol. 2022, 103, 103867.
Hitchman, S.; Loeffen, M.P.F.; Reis, M.M.; Craigie, C.R. Robustness of hyperspectral imaging and PLSR model predictions of intramuscular fat in lamb M. longissimus lumborum across several flocks and years. Meat Sci. 2021, 179, 108492.
Htet, T.T.M.; Cruz, J.; Khongkaew, P.; Suwanvecho, C.; Suntornsuk, L.; Nuchtavorn, N.; Limwikrant, W.; Phechkrajang, C. PLS-regression-model-assisted raman spectroscopy for vegetable oil classification and non-destructive analysis of alpha-tocopherol contents of vegetable oils. J. Food Compos. Anal. 2021, 103, 104119.
Wang, C.; Sun, Y.; Zhou, Y.; Cui, Y.; Yao, W.; Yu, H.; Guo, Y.; Xie, Y. Dynamic monitoring oxidation process of nut oils through Raman technology combined with PLSR and RF-PLSR model. LWT—Food Sci. Technol. 2021, 146, 111290.
Wistaff, E.A.; Beller, S.; Schmid, A.; Neville, J.J.; Nietner, T. Chemometric analysis of amino acid profiles for detection of fruit juice adulterations—Application to verify authenticity of blood orange juice. Food Chem. 2021, 343, 128452.
Zhou, Y.; Cui, Y.; Wang, C.; Yang, F.; Yao, W.; Yu, H.; Guo, Y.; Xie, Y. Rapid and accurate monitoring and modeling analysis of eight kinds of nut oils during oil oxidation process based on Fourier transform infrared spectroscopy. Food Control 2021, 130, 108294.
Hara, R.; Ishigaki, M.; Ozaki, Y.; Ahamed, T.; Noguchi, R.; Miyamoto, A.; Genkawa, T. Effect of Raman exposure time on the quantitative and discriminant analyses of carotenoid concentrations in intact tomatoes. Food Control 2021, 360, 129896.
Hayati, R.; Munawar, A.A.; Lukitaningsih, E.; Earlia, N.; Karma, T.; Idroes, R. Combination of PCA with LDA and SVM classifiers: A model for determining the geographical origin of coconut in the coastal plantation, Aceh Province, Indonesia. Case Stud. Chem. Environ. Eng. 2024, 9, 100552.
Donato, F.D.; Squeo, F.; Biancolillo, A.; Rossi, L.; D’Archivio, A.A. Characterization of high value Italian chickpeas (Cicer arietinum L.) by means of ICP-OES multi-elemental analysis coupled with chemometrics. Food Control 2022, 131, 108451.
Sampaio, S.L.; Barreira, J.C.M.; Fernandes, Â.; Petropoulos, S.A.; Alexopoulos, A.; Santos-Buelga, C.; Ferreira, I.C.F.R.; Barros, L. Potato biodiversity: A linear discriminant analysis on the nutritional and physicochemical composition of fifty genotypes. Food Chem. 2021, 345, 128853. [Google Scholar] [CrossRef] [PubMed]
Özbek, F.S.; Fidan, H. Estimation of pesticide usage in the agricultural sector in Turkey using artificial neural network (ANN). J. Anim. Plant Sci. 2009, 4, 373–378. [Google Scholar]
Jayalakshmi, T.; Santhakumaran, A. Statistical normalization and back propagation for classification. Int. J. Comput. Theory Eng. 2011, 3, 89–93. [Google Scholar] [CrossRef]
Wang, D.; Zhang, M.; Law, C.L.; Zhang, L. Natural deep eutectic solvents for the extraction of lentinan from shiitake mushroom: COSMO-RS screening and ANN-GA optimizing conditions. Food Chem. 2024, 430, 136990. [Google Scholar] [CrossRef] [PubMed]
Deng, F.; Lu, H.; Yuan, Y.; Chen, H.; Li, Q.; Wang, L.; Tao, Y.; Zhou, W.; Cheng, H.; Chen, Y.; et al. Accurate prediction of the eating and cooking quality of rice using artificial neural networks and the texture properties of cooked rice. Food Chem. 2023, 407, 135176. [Google Scholar] [CrossRef] [PubMed]
Cosme, F.; Milheiro, J.; Pires, J.; Guerra-Gomes, F.I.; Filipe-Ribeiro, L.; Nunes, F.M. Authentication of Douro DO monovarietal red wines based on anthocyanin profile: Comparison of partial least squares—Discriminant analysis, decision trees and artificial neural networks. Food Control 2021, 125, 107979. [Google Scholar] [CrossRef]
Huang, X.; You, Y.; Zeng, X.; Liu, Q.; Dong, H.; Qian, M.; Xiao, S.; Yu, L.; Hu, X. Back propagation artificial neural network (BP-ANN) for prediction of the quality of gamma-irradiated smoked bacon. Food Chem. 2024, 437, 137806. [Google Scholar] [CrossRef]
Singh, P.; Pandey, V.K.; Chakraborty, S.; Dash, K.K.; Singh, R.; Mukarram, A.; Béla, K. Ultrasound-assisted extraction of phytochemicals from green coconut shell: Optimization by integrated artificial neural network and particle swarm technique. Heliyon 2023, 9, e22438. [Google Scholar] [CrossRef]
de Oliveira, M.F.; dos Santos, A.F.; Kazama, E.H.; de Souza Rolim, G.; da Silva, R.P. Determination of application volume for coffee plantations using artificial neural networks and remote sensing. Comput. Electron. Agric. 2021, 184, 106096. [Google Scholar] [CrossRef]
Fazari, A.; Pellicer-Valero, O.J.; Gómez-Sanchis, J.; Bernardi, B.; Cubero, S.; Benalia, S.; Zimbalatti, G.; Blasco, J. Application of deep convolutional neural networks for the detection of anthracnose in olives using VIS/NIR hyperspectral images. Comput. Electron. Agric. 2021, 187, 106252. [Google Scholar] [CrossRef]
Sabzi, S.; Pourdarbani, R.; Rohban, M.H.; García-Mateos, G.; Arribas, J.I. Estimation of nitrogen content in cucumber plant (Cucumis sativus L.) leaves using hyperspectral imaging data with neural network and partial least squares regressions. Chemom. Intell. Lab. Syst. 2021, 217, 104404. [Google Scholar] [CrossRef]
Chasiotis, V.K.; Tzempelikos, D.A.; Filios, A.; Moustris, K.P. Artificial neural network modelling of moisture content evolution for convective drying of cylindrical quince slices. Comput. Electron. Agric. 2020, 172, 105074. [Google Scholar] [CrossRef]
Ciric, A.; Krajnc, B.; Heath, D.; Ogrinc, N. Response surface methodology and artificial neural network approach for the optimization of ultrasound-assisted extraction of polyphenols from garlic. Food Chem. Toxicol. 2020, 135, 110976. [Google Scholar] [CrossRef]
Raj, G.V.S.B.; Dash, K.K. Microwave vacuum drying of dragon fruit slice: Artificial neural network modelling, genetic algorithm optimization, and kinetics study. Comput. Electron. Agric. 2020, 178, 105814. [Google Scholar]
Gopal, R.S.M.; Bhargavi, R. A novel approach for efficient crop yield prediction. Comput. Electron. Agric. 2019, 165, 104968. [Google Scholar] [CrossRef]
Brereton, R.G. Chemometrics: Data Analysis for the Laboratory and Chemical Plant; Wiley: Chichester, UK, 2003. [Google Scholar]
Park, M.; Yu, J.Y.; Ko, J.A.; Park, H.J. UV-Vis-NIR and FTIR spectroscopy coupled with chemometrics for quality prediction of katsuobushi based on the number of smoking treatments. Food Chem. 2024, 442, 138604. [Google Scholar] [CrossRef]
Al-Sulaiti, M.M.; Soubra, L.; Ramadan, G.A.; Ahmed, A.Q.S.; Al-Ghouti, M.A. Total Hg levels distribution in fish and fish products and their relationships with fish types, weights, and protein and lipid contents: A multivariate analysis. Food Chem. 2023, 421, 136163. [Google Scholar] [CrossRef]
Sharma, H.; Ramanathan, R. Differences and correlation among various fatty acids of cow milk and goat milk probiotic yoghurt: Gas chromatography, PCA and network based analysis. Food Chem. Adv. 2023, 2, 100430. [Google Scholar] [CrossRef]
Khan, I.A.; Luo, J.; Shi, H.; Zou, Y.; Khan, A.; Zongshuai, Z.; Xu, W.; Wang, D.; Huang, M. Mitigation of heterocyclic amines by phenolic compounds in allspice and perilla frutescens seed extract: The correlation between antioxidant capacities and mitigating activities. Food Chem. 2022, 368, 130845. [Google Scholar] [CrossRef]
Baigts-Allende, D.K.; Pérez-Alva, A.; Ramírez-Rodrigues, M.A.; Palacios, A.; Milena, M.; Ramírez-Rodrigues, M.M. A comparative study of polyphenolic and amino acid profiles of commercial fruit beers. J. Food Compos. Anal. 2021, 100, 103921. [Google Scholar] [CrossRef]
Dursun, A.; Çalışkan, O.; Güler, Z.; Bayazit, S.; Türkmen, D.; Gündüz, K. Effect of harvest maturity on volatile compounds profiling and eating quality of hawthorn (Crataegus azarolus L.) fruit. Sci. Hortic. 2021, 288, 110398. [Google Scholar] [CrossRef]
Jena, R.C.; Agarwal, K.; Chand, P.K. Fruit and leaf diversity of selected Indian mangoes (Mangifera indica L.). Sci. Hortic. 2021, 282, 109941. [Google Scholar] [CrossRef]
Monzón, C.; Schöneich, S.; Synovec, R.E. Non-targeted discovery of class-distinguishing metabolites in Argentinian pacu fish by comprehensive two-dimensional gas chromatography with principal component analysis. Microchem. J. 2021, 164, 106004. [Google Scholar] [CrossRef]
Robert, C.; Fraser-Miller, S.J.; Jessep, W.T.; Bain, W.E.; Hicks, T.M.; Ward, J.F.; Craigie, C.R.; Loeffen, M.; Gordon, K.C. Rapid discrimination of intact beef, venison and lamb meat using Raman spectroscopy. Food Chem. 2021, 343, 128441. [Google Scholar] [CrossRef] [PubMed]
Song, J.; Shao, Y.; Yan, Y.; Li, X.; Peng, J.; Guo, L. Characterization of volatile profiles of three colored quinoas based on GC-IMS and PCA. LWT—Food Sci. Technol. 2021, 146, 111292. [Google Scholar] [CrossRef]
Yang, Y.; Wang, B.; Fu, X.; Shi, Y.; Chen, F.; Guan, H.; Liu, L.; Zhang, C.; Zhu, P.; Liu, Y.; et al. HS-GC-IMS with PCA to analyze volatile flavor compounds across different production stages of fermented soybean whey tofu. Food Chem. 2021, 346, 128880. [Google Scholar] [CrossRef]
Green, H.S.; Li, X.; De Pra, M.; Lovejoy, K.S.; Steiner, F.; Acworth, I.N.; Wang, S.C. A rapid method for the detection of extra virgin olive oil adulteration using UHPLC-CAD profiling of triacylglycerols and PCA. Food Control 2020, 107, 106773. [Google Scholar] [CrossRef]
Taş-Küçükaydın, M.; Tel-Çayan, G.; Çayan, F.; Küçükaydın, S.; Çiftçi, B.H.; Ceylan, Ö.; Duru, M.E. Chemometric classification of chestnut honeys from different regions in Turkey based on their phenolic compositions and biological activities. Food Chem. 2023, 415, 135727. [Google Scholar] [CrossRef]
Arslan, M.; Zareef, M.; Tahir, H.E.; Guo, Z.; Rakha, A.; Xuetao, H.; Shi, J.; Zhihua, L.; Xiaobo, Z.; Khan, M.R. Discrimination of rice varieties using smartphone-based colorimetric sensor arrays and gas chromatography techniques. Food Chem. 2022, 368, 130783. [Google Scholar] [CrossRef]
do Prado, N.B.; de Abreu, C.B.; Pinho, C.S.; Junior, M.M.N.; Silva, M.D.; Espino, M.; Silva, M.F.; Dias, F.S. Application of multivariate analysis to assess stress by Cd, Pb and Al in basil (Ocimum basilicum L.) using caffeic acid, rosmarinic acid, total phenolics, total flavonoids and total dry mass in response. Food Chem. 2022, 367, 130682. [Google Scholar] [CrossRef]
Silva, E.F.R.; da Silva Santos, B.R.; Minho, L.A.C.; Brandão, G.C.; de Jesus Silva, M.; Silva, M.V.L.; dos Santos, W.N.L.; dos Santos, A.M.P. Characterization of the chemical composition (mineral, lead and centesimal) in pine nut (Araucaria angustifolia (Bertol.) Kuntze) using exploratory data analysis. Food Chem. 2022, 369, 130672. [Google Scholar] [CrossRef]
Kavitha, E.; Stephen, L.D.; Brishti, F.H.; Karthikeyan, S. Two-trace two-dimensional (2T2D) correlation infrared spectral analysis of Spirulina platensis and its commercial food products coupled with chemometric analysis. J. Mol. Struct. 2021, 1244, 130964. [Google Scholar] [CrossRef]
Ceccarelli, D.; Antonucci, F.; Costa, C.; Talento, C.; Ciccoritti, R. An artificial class modelling approach to identify the most largely diffused cultivars of sweet cherry (Prunus avium L.) in Italy. Food Chem. 2020, 333, 127515. [Google Scholar] [CrossRef] [PubMed]
Héberger, K. Sum of ranking differences compares methods or models fairly. Trends Anal. Chem. 2010, 29, 101–109. [Google Scholar] [CrossRef]
Héberger, K.; Kollár-Hunek, K. Sum of ranking differences for method discrimination and its validation: Comparison of ranks with random numbers. J. Chemom. 2011, 25, 151–158. [Google Scholar] [CrossRef]
Chen, X.; Xu, Y.; Meng, L.; Chen, X.; Yuan, L.; Cai, Q.; Shi, W.; Huang, G. Non-parametric partial least squares—Discriminant analysis model based on sum of ranking difference algorithm for tea grade identification using electronic tongue data. Sens. Actuators B Chem. 2020, 311, 127924. [Google Scholar] [CrossRef]
Stamenković, Z.; Radojčin, M.; Pavkov, I.; Bikić, S.; Ponjičan, O.; Bugarin, R.; Kovács, S.; Gere, A. Ranking and multicriteria decision making in optimization of raspberry convective drying processes. J. Chemom. 2020, 34, e3224. [Google Scholar] [CrossRef]
Gere, A.; Radványi, D.; Héberger, K. Which insect species can best be proposed for human consumption? Innov. Food Sci. Emerg. Technol. 2019, 52, 358–367. [Google Scholar] [CrossRef]
Varga, Á.; Gáspár, I.; Juhász, R.; Ladányi, M.; Hegyes-Vecseri, B.; Kókai, Z.; Márki, E. Beer microfiltration with static turbulence promoter: Sum of ranking differences comparison. J. Food Process Eng. 2018, 42, e12941. [Google Scholar] [CrossRef]
Csambalik, L.; Divéky-Ertsey, A.; Pusztai, P.; Boros, F.; Orbán, C.; Kovács, S.; Gere, A.; Sipos, L. Multi-perspective evaluation of phytonutrients—Case study on tomato landraces for fresh consumption. J. Funct. Foods 2017, 33, 211–216. [Google Scholar] [CrossRef]
Sipos, L.; Bernhardt, B.; Gere, A.; Komáromi, B.; Orbán, C.; Bernáth, J.; Szabó, K. Multicriteria optimization to evaluate the performance of Ocimum basilicum L. varieties. Ind. Crops Prod. 2016, 94, 514–519. [Google Scholar] [CrossRef]
Andrić, F.; Héberger, K. How to compare separation selectivity of high-performance liquid chromatographic columns properly? J. Chromatogr. A 2017, 1488, 45–56. [Google Scholar] [CrossRef] [PubMed]
Rajkó, R.; Héberger, K. Conditional Fisher’s exact test as a selection criterion for pair-correlation method. Type I and Type II errors. Chemom. Intell. Lab. Syst. 2001, 57, 1–14. [Google Scholar] [CrossRef]
Héberger, K.; Rajkó, R. Generalization of pair correlation method (PCM) for non-parametric variable selection. J. Chemom. 2002, 16, 436–443. [Google Scholar] [CrossRef]
Héberger, K.; Rajkó, R. Variable selection using pair-correlation method. Environmental applications. SAR QSAR Environ. Res. 2002, 13, 541–554. [Google Scholar] [CrossRef] [PubMed]
Biró, B.; Fodor, R.; Szedljak, I.; Huszár-Pásztor, K.; Gere, A. Buckwheat-pasta enriched with silkworm powder: Technological analysis and sensory evaluation. LWT—Food Sci. Technol. 2019, 116, 108542. [Google Scholar] [CrossRef]
Zay, K.; Gere, A. Sensory acceptance of poppy seed-flavored white chocolates using just-about-right method. LWT—Food Sci. Technol. 2019, 103, 162–168. [Google Scholar] [CrossRef]
Gere, A.; Sipos, L.; Héberger, K. Generalized pairwise correlation and method comparison: Impact assessment for JAR attributes on overall liking. Food Qual. Prefer. 2015, 43, 88–96. [Google Scholar] [CrossRef]

Table 1. Food and modeling type overview.

Food Type	Modeling Type	Reference
tea	MLR	[7]
ewe’s milk cheeses	MLR	[8]
lemon fruits	MLR	[9]
maize kernels	MLR	[10]
dry red wine	MLR, PLSR	[11]
black pepper and cumin	MLR, PLSR	[12]
sesame oil	PLSR	[13]
kiwi wine	PLSR	[14]
fat in lamb	PLSR	[15]
vegetable oil	PLSR	[16]
nut oils	PLSR	[17]
blood orange juice	PLSR	[18]
nut oils	PLSR	[19]
tomatoes	PLSR, PLS-DA	[20]
coconut	LDA	[21]
chickpeas	LDA	[22]
potato	LDA	[23]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
