1. Introduction
Anthocyanins are phenolic water-soluble glycosides or acyl-glycosides of anthocyanidins [
1]. They are widely distributed in plants [
2]. Anthocyanins are water-soluble flavonoid pigments [
3] that accumulate in various organs and are typically stored in vacuoles in the epidermis or mesophyll [
2,
4]. They contribute to orange-to-blue colors but primarily provide red, purple, or blue hues to leaves, fruits, and flowers [
5,
6]. The color primarily depends on the anthocyanin type and content [
7], pH, co-pigments, and metal ions [
2]. Anthocyanins are important secondary plant metabolites produced from the amino acid phenylalanine through the anthocyanin biosynthetic pathway [
8]. Their biosynthesis can be affected by biotic or abiotic stresses, such as nutrient deficiency, wounding, pathogens, drought, light, salinity, cold, and ultraviolet (UV) irradiation [
7,
9,
10]. Anthocyanins fulfill essential physiological functions related to adaptation and protection against stresses [
5,
9]. Accurate detection and quantitative assessment of anthocyanins can provide valuable information on the physiological responses and adaptation of plants to environmental stresses [
11].
Wet chemistry is the most common method for quantifying the anthocyanin content in plant tissues. This method is highly accurate [
12] but time-consuming, labor-intensive, tedious, expensive, and destructive [
9,
13]. Due to advances in spectroscopy and computer analyses, spectroscopic techniques have been widely used to detect the anthocyanin content of plants. Neto et al. [
14] predicted the leaf anthocyanin content of
Lactuca sativa using partial least squares regression (PLSR) and visible and near-infrared (NIR) spectroscopy. Liu et al. [
11] used visible and NIR spectroscopy, principal component regression (PCR), PLSR, and a back-propagation neural network (BPNN) to predict the leaf anthocyanin content of
Prunus cerasifera. These methods are simple, sensitive, non-invasive, and efficient [
11], but the equipment is relatively expensive, and the environmental requirements are high [
8]. Digital photography has been increasingly used to analyze plant color and quantify the pigment content based on color parameter values extracted from digital images acquired by digital cameras. This method is widely used because it is fast, economical, efficient, reliable, and non-invasive [
5,
7,
15]. For example, Yang et al. [
13] calculated six color parameters in both the RGB and HSI color spaces from digital images of
L. sativa leaves to generate 37 color indices, and then prediction models were developed based on these color indices using curve estimation to predict the anthocyanin content. Del Valle et al. [
7] extracted RGB values from digital images to generate 12 color indices, then used these indices to construct PLSR models for estimating relative anthocyanin concentrations in species with color variations. Askey et al. [
8] computed color index values from digital images of
Arabidopsis thaliana leaves across five color spaces and predicted the anthocyanin content using twenty-two regression models. These studies were all based on digital images and used color index values obtained from various color spaces with different modeling methods. However, they mainly focused on plant leaves and rarely used machine learning to build models.
Machine learning is a branch of artificial intelligence, widely used to construct estimation models. Mathematical or computer algorithms are employed to train a computational model to solve a problem or perform complex tasks based on input parameters [
16]. This approach has been used for pattern recognition, classification, and prediction. Machine learning algorithms offer high accuracy, automation, and speed, and they can be customized and applied at different scales [
17].
Machine learning algorithms include BPNN, support vector regression (SVR), random forest (RF), extreme learning machine (ELM), and Cubist [
18]. BPNN and RF algorithms are two commonly used machine learning algorithms. BPNN is a multilayer feed-forward neural network that corrects errors using a back-propagation algorithm [
19]. It has strong nonlinear mapping, self-learning, self-adaptive, and generalization capabilities and high fault tolerance [
19,
20] and is suitable for regression or classification problems. Therefore, it is the most widely used neural network model and is particularly useful for solving nonlinear problems [
21]. RF is a supervised ensemble learning algorithm that combines many decision trees, each of which applies if-then-else rules. It was proposed by Breiman [
22] and is known for its robustness, ability to handle high-dimensional data, and resistance to overfitting, noise, and outliers [
23]. It is insensitive to collinearity [
24] and effective in handling high-dimensional data and covariance among variables [
25]. Furthermore, RF provides high prediction accuracy with low computational complexity due to random sampling [
26]. Thus, it is a popular machine learning algorithm for classification and prediction.
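The ensemble idea behind RF can be illustrated with a minimal sketch of bootstrap aggregation (bagging). The sketch is a deliberate simplification and not the implementation used in any of the cited studies: it uses one-split trees ("stumps") on a single hypothetical feature, whereas a full RF grows deeper trees and also samples a random subset of features at each split. All function names and the toy data are assumptions made for illustration.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree (an if-then-else rule) on one feature."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x identical: fall back to predicting the mean
        m = sum(ys) / len(ys)
        return (xs[0], m, m)
    return best[1:]  # (threshold, left mean, right mean)


def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm


def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees in the ensemble."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)


# Toy data (hypothetical): the response jumps from 0 to 10 at x = 5.
xs = list(range(10))
ys = [0.0 if x < 5 else 10.0 for x in xs]
forest = fit_bagged_stumps(xs, ys)
pred_low = predict_forest(forest, 1)
pred_high = predict_forest(forest, 8)
```

Averaging over trees trained on different bootstrap samples is what reduces the variance of the individual trees, which is one reason ensembles of this kind tend to be robust to noise.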
Rosa chinensis is a popular flower worldwide. It originated in China and spread via the Silk Road to Persia, Ceylon, and other countries [
27]. Since this flower blooms year-round and produces flowers with diverse colors [
28], it has been widely planted and cultivated as an ornamental plant. This flower also has many other values, e.g., cultural [
29], medicinal [
30,
31], and edible [
32,
33]. These values are attributed to the abundance of anthocyanins in the petals, and the anthocyanin content significantly affects them. Thus, estimating the anthocyanin content of
R. chinensis petals is essential to assess these values.
This study predicts the anthocyanin content of R. chinensis petals using RGB indices combined with the BPNN and RF algorithms. The objective is to investigate the feasibility of this approach. We hypothesize that RGB indices combined with BPNN and RF can accurately predict the anthocyanin content of R. chinensis petals.
4. Discussion
Using the RGB color space is the most common approach to describe a color quantitatively [
43]. The RGB value refers to the sum of the three channels (R, G, B) [
44], where R, G, and B denote the mean values of the red, green, and blue channels. The digital images can be acquired using a digital camera, smartphone, or scanner. Image processing methods are used to extract the RGB values from digital images and construct RGB indices [
45]. Thus, digital photography is a simple, quick, and low-cost method that has been widely used to predict the content of plant pigments. For example, Hassanijalilian et al. [
46] used RGB indices to predict the leaf chlorophyll content of
Glycine max. Wood et al. [
47] observed that the RGB indices enabled the estimation of the
a,
b, and total chlorophyll concentrations of microalgal cultures in situ. Taha et al. [
48] demonstrated the feasibility of using RGB indices to estimate the chlorophyll content of lettuce. In this study, the correlations between 28 RGB indices derived from digital camera images and the anthocyanin content were strong, indicating that these indices were suitable for establishing predictive models of the anthocyanin content. The R² and RPD values of the BPNN and RF models were greater than 0.75 and 2.00, respectively (Table 5). An R² value higher than 0.7 indicates a well-fitting model that explains more than 70% of the variance [49]. An RPD value exceeding 2.0 indicates that a model has excellent predictive ability [
11,
26,
50]. These findings imply that predicting the anthocyanin content of
R. chinensis petals using RGB color indices derived from digital images combined with BPNN and RF is feasible. Both models exhibited excellent robustness and high predictive ability.
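The construction of RGB indices from mean channel values can be sketched as follows. This is a minimal illustration under stated assumptions: the pixel list stands in for a segmented petal region, and the handful of indices shown (normalized channels, a difference, a ratio, and a normalized difference) are common examples of such indices, not necessarily the 28 indices used in this study. The function names are hypothetical.

```python
def mean_rgb(pixels):
    """Return the mean (R, G, B) of an iterable of (r, g, b) pixel tuples."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels supplied")
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def rgb_indices(R, G, B):
    """Compute a few illustrative color indices from mean channel values."""
    total = R + G + B
    if total == 0:
        raise ValueError("black region: normalized indices are undefined")
    return {
        # Normalized channels reduce sensitivity to overall brightness.
        "r": R / total,
        "g": G / total,
        "b": B / total,
        # Simple difference, ratio, and normalized difference of R and G.
        "R-G": R - G,
        "R/G": R / G if G else float("inf"),
        "(R-G)/(R+G)": (R - G) / (R + G) if (R + G) else 0.0,
    }


# Hypothetical pixels from a reddish petal region (8-bit channel values).
petal = [(200, 40, 60), (180, 60, 80), (220, 20, 40)]
R, G, B = mean_rgb(petal)
indices = rgb_indices(R, G, B)
```

In practice, the pixel values would come from an image-processing library after the petal has been separated from the background; that step is omitted here.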
Machine learning algorithms and RGB images have been successfully used to predict plant pigment content, such as the anthocyanin content of
A. thaliana leaves [
8] and the chlorophyll content of
G. max leaves [
46]. BPNN and RF are machine learning algorithms. BPNN learns complex nonlinear relationships by iteratively adjusting the weights to minimize the error between the predicted and measured results [
51]. RF uses multiple decision trees during training and combines their predictions to improve accuracy [
22]. This study utilized BPNN and RF to predict the anthocyanin content of
R. chinensis petals. The RF model had higher R² and RPD values and lower RMSE and MAE values than the BPNN model for the calibration and validation sets (
Table 7). These results indicated that the RF algorithm outperformed the BPNN algorithm, consistent with previous findings. For example, Guo et al. [
52] established prediction models for the leaf chlorophyll content of maize and found that the RF algorithm achieved better prediction results than the BPNN algorithm. Yang et al. [
53] demonstrated that the RF outperformed the BPNN in predicting the chlorophyll content of trees in coniferous, broad-leaved, and mixed broad-leaved forests and of individual trees. The better performance of RF can be attributed to its insensitivity to multicollinearity, a common problem with RGB indices.
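The four statistics used to compare the models (R², RMSE, MAE, and RPD) can be computed as in the sketch below. It assumes the commonly used definitions: R² as one minus the ratio of the residual to the total sum of squares, and RPD as the standard deviation of the measured values divided by the RMSE. Whether the sample standard deviation (n − 1 denominator, used here) or the population form was applied in this study is not stated, so that choice is an assumption, as are the function name and the example values.

```python
import math


def regression_metrics(measured, predicted):
    """Return R2, RMSE, MAE, and RPD for paired measured/predicted values."""
    n = len(measured)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = sum(measured) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    if ss_tot == 0:
        raise ValueError("measured values are constant: R2 and RPD are undefined")
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(y - p) for y, p in zip(measured, predicted)) / n
    sd = math.sqrt(ss_tot / (n - 1))  # sample standard deviation (assumption)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "MAE": mae,
        "RPD": sd / rmse if rmse > 0 else float("inf"),
    }


# Hypothetical measured and predicted anthocyanin contents (arbitrary units).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(measured, predicted)
```

Because RPD scales the prediction error by the spread of the measured data, it allows models fitted to data sets with different ranges to be compared, which RMSE alone does not.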
This study has some limitations. First, we used detached petals so that they could be analyzed under consistent light conditions; the prediction performance should therefore also be assessed under in situ conditions. Second, the petals were obtained from a single location, so future studies should sample from larger areas. Third, because smartphones are ubiquitous, it should be investigated whether smartphone images can be used with this method instead of camera images. Addressing these limitations would broaden the applicability of this method.