1. Introduction
The quest to classify plant species has been a central theme in botany and ecology, carrying profound implications for biodiversity conservation, agriculture, and environmental monitoring. With an estimated 390,900 plant species known to science [
1], cataloging and differentiating each one presents a colossal challenge. The emerging field of automated plant leaf classification harnesses computational power to distinguish plant species based on their leaf characteristics, marking a vital frontier in botanical research.
The automation of plant leaf classification is a novel endeavor that seeks to transcend the limitations of human expertise and manual identification. It relies on the premise that leaves, the most accessible and abundant plant organ, hold key morphological features unique to each species. These features can be systematically analyzed and classified using advanced algorithms when captured in digital images. The primary objective is to design a system capable of learning from data, thereby identifying and categorizing leaves efficiently and accurately.
This technological solution involves several critical steps: capturing high-quality leaf images, extracting meaningful features from them, and then applying machine learning (ML) algorithms to classify the leaves into their respective species. The success of automated classification systems depends significantly on the accuracy and robustness of each step, especially the feature selection and the choice of machine learning models.
Figure 1 demonstrates the general structure of plant leaf classification using ML algorithms.
Feature selection in plant leaf classification is a crucial process that determines the success of the whole system. Researchers extract various features from leaf images, broadly categorized into shape, color, texture, and venation features. Shape features, such as aspect ratio, circularity, and leaf margin structure, provide information about the geometric properties of the leaf. Color features capture the pigmentation patterns that can be characteristic of certain species, while texture features relate to the surface structure and patterns of the leaf. Venation features, or the patterns formed by the veins of a leaf, are particularly useful as they are highly distinctive across different species [
2,
3].
Recently, the focus has shifted toward more sophisticated feature selection methods that consider not only individual features but also the relationships and combinations of features that best contribute to classification. Techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) have been employed to reduce the dimensionality of the feature space, retaining only the most relevant information for classification.
Machine learning methods have revolutionized how researchers approach plant leaf classification. The commonly adopted algorithms include Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), Random Forests (RF), and various forms of Neural Networks (NN), including the now prevalent Deep Learning models like Convolutional Neural Networks (CNNs). These methods vary in complexity, interpretability, and computational requirements, and the choice often depends on the size and nature of the dataset available for training and validation [
4].
Deep learning has shown great promise in the field due to its ability to learn hierarchical feature representations from the data. This often surpasses the performance of traditional machine learning methods that rely on handcrafted features. CNNs have been especially influential because they can directly process raw image data and automatically extract relevant features without significant human intervention.
Despite these advancements, automated plant leaf classification faces several challenges. One of the main issues is the high intra-class variability and inter-class similarity found in leaf features, which can complicate the classification process. For example, leaves from different species may appear very similar, or conversely, leaves from the same species may look different due to variations in environmental conditions, age, and other factors [
5].
Additionally, obtaining large datasets of leaf images that are sufficiently diverse and labeled with high accuracy is a non-trivial task. These datasets are crucial for training and testing machine learning models but require significant time and effort to compile.
Other practical challenges include dealing with the varying quality of leaf images due to differences in lighting, orientation, occlusion, and background clutter. This variability can introduce noise into the system, reducing the accuracy of feature extraction and classification. Moreover, the computational complexity of processing large image datasets with advanced machine learning methods can be prohibitive, requiring substantial computational resources.
Furthermore, models that can generalize well to unseen data are needed, as the ultimate test of an automated classification system is its performance in real-world conditions, where ideal image capture is not always possible. Developing models robust to variations in data quality and capable of being deployed in various environmental and technological contexts remains a significant hurdle.
Our work uses three types of datasets from the Cope Leaf Dataset [6] for leaf classification. Enhancing classification accuracy, particularly with large datasets and numerous classes, requires a combination of data preprocessing, model selection, regularization techniques, and fine-tuning. To improve accuracy on data with many classes, we devised several feature selection, preprocessing, and training methods. The contributions of this work are as follows:
A complex classification problem is addressed involving 99 classes from the Cope Leaf Dataset, tackling challenges such as class imbalance, overfitting, and feature complexity. Our approach to resolving these issues enhances classification accuracy, an aspect that has been underexplored in prior studies.
Although preprocessing techniques like normalization, imputation, and noise reduction are well established, our systematic integration of these methods into a cohesive workflow tailored for a large multi-class dataset significantly improves model performance, as evidenced by the accuracy gains in our experimental results.
By comparing multiple machine learning (ML) algorithms, including the Hoeffding Tree, we provide new insights into their performance on high-dimensional, imbalanced multi-class datasets. The superior performance of the Hoeffding Tree over models like Naïve Bayes offers valuable contributions to understanding which models are more effective for complex classification challenges.
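For readers unfamiliar with the Hoeffding Tree, the following minimal sketch shows the Hoeffding bound that such trees use to decide when enough samples have been seen to split a node. The function names, the confidence parameter `delta`, and the example gain values are illustrative assumptions; the configuration used in this study is not described in this section.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean of n samples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 n_classes: int, delta: float, n_seen: int) -> bool:
    """Split when the best attribute beats the runner-up by more than epsilon."""
    # Assumption: the split criterion is information gain, whose range is log2(classes).
    value_range = math.log2(n_classes)
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n_seen)


# Hypothetical gains for a 99-class problem: more samples tighten the bound.
early = should_split(0.5, 0.1, n_classes=99, delta=1e-7, n_seen=200)
late = should_split(0.5, 0.1, n_classes=99, delta=1e-7, n_seen=100_000)
```

With many classes the range of the information gain is large, so the bound shrinks slowly and more samples are needed before a split is accepted.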
This paper is structured as follows: Section 1 introduces the topic. Section 2 reviews the literature. Section 3 presents feature extraction and selection. Section 4 introduces plant leaf classification using machine learning algorithms. Section 5 outlines the preprocessing and classification methods used in this study, followed by the presentation of the experimental results in Section 6. Lastly, Section 7 and Section 8 are dedicated to discussion, conclusion, and future recommendations.
2. Literature Review
Many studies in agriculture have deployed machine learning algorithms to improve the quality of plant growth and yields. These algorithms have allowed researchers to classify plants and diagnose diseases early, with high accuracy, by extracting features from the leaves.
2.1. Feature Extraction
Multiple studies have focused on applying machine learning to extract leaf features. Wu [
7] and Chaki [
8] both emphasize the importance of stable and discriminative features, with Chaki further focusing on texture and shape features, leading to an effective recognition of leaves with varying texture, shape, size, and orientations to an acceptable degree. The authors [
9] review the performance of various moment invariant techniques for a feature extraction process, concluding that a 100% classification rate can be achieved by using Total Mutual Information (TMI) and a General Regression Neural Network (GRNN). Another study [
10] builds upon this to classify images of diseased leaves using color and shape attributes and several ML algorithms, including the Support Vector Machine, the K-Nearest Neighbor, and the Decision trees. The authors in [
11] used image processing techniques to recognize plant species by identifying appropriate features for extraction and improving classification accuracy. They utilized feature extractions such as Contour-based and Region-based to achieve more precise results. Other studies [
12,
13,
14,
15,
16,
17] use different implementations of deep learning algorithms to extract features of plant leaves for classification or disease detection. The reported accuracies range from 94.8% to over 99%, which indicates that deep learning is a highly effective tool for feature extraction.
2.2. Plant Leaf Classification
Research on plant leaf classifications with machine learning has shown promising results. Many authors have developed new models and algorithms to improve the level of accuracy in plant leaf classification. Araujo [
18] presents a multiple classifier system for plant leaf recognition that combines different classifiers trained on shape and texture features, improving identification performance by up to 28% compared to a monolithic approach and showing favorable comparisons with the best results reported in the literature for the datasets used. Later, Kala [
19] proposed a feature extraction technique based on sinuosity coefficients, which achieved high accuracy in plant species classification. Ali et al. [
20] presented a simple leaf recognition method that is computationally efficient and effective for plant recognition. The method was evaluated on a publicly available leaf image database to demonstrate its usefulness and efficiency. Zhang [21] designed a modified local discriminant projection (MLDP) algorithm that extracts discriminant features from plant leaves by preserving the local geometrical structure of leaves, and it showed promising results in an experimental study. The MLDP method was tested on the public ICL leaf image database, and the results validate its effectiveness and feasibility. Dudi et al. [22] developed a new plant classification model based on enhanced segmentation and optimal feature selection. They created a hybrid algorithm, C-EFO, for optimal feature selection and performed the final classification with the Enhanced Recurrent Neural Network (E-RNN) model. Their experiment was conducted on two plant leaf datasets and showed effectiveness and feasibility compared to conventional models. The authors of [23] proposed a model that can classify up to 79 different plant species in India using the DenseNet-161 Convolutional Neural Network architecture, with a testing accuracy of 97.3%. The application works on any Android platform and can classify the input plant image with an average latency of 1.98 s. In [24], the authors applied three technologies to achieve a model with high accuracy for plant classification. A Conditional Generative Adversarial Network was used to generate synthetic data, a Convolutional Neural Network was used for feature extraction, and a logistic regression classifier was used for efficient classification of plant species. Across the eight datasets used, they achieved an average accuracy of 96.1% and up to 99% to 100% on individual datasets.
Several authors have emphasized that deep learning outperforms other classification algorithms. Ojha [25] and Aman [26] achieved high accuracy in classifying ornamental plant and flower leaves, respectively, using various machine learning algorithms, with the Multi-Layer Perceptron (MLP) performing best, while Kala [27] highlighted that Convolutional Neural Network (CNN) classification surpasses traditional methods in accuracy. Similarly, the authors of [28] reviewed various techniques such as support vector machine, ResNet, K-nearest neighbor, VGG-16, and others for leaf classification. They discussed the differences between machine learning and deep learning regarding the plant leaf classification problem and concluded that deep learning models achieve the highest accuracy. Parate et al. [29] also used machine learning to identify leaves of mangoes, oranges, and guavas through Google Teachable with high accuracy.
2.3. Leaf Disease Identification
A range of machine learning techniques has been explored to identify leaf diseases. Kethineni [30] highlights the use of image processing and ML algorithms, achieving high accuracy with a combination of k-Means, the Firefly optimization Algorithm (FA), and Support Vector Machine (SVM). These algorithms were used to identify plant diseases, emphasizing the importance of plant health in agriculture and global warming. The proposed technique achieved an accuracy of 91.3%. The authors of [31] explored the use of various machine learning algorithms for leaf disease detection and classification, such as SVM, KNN, SGD, XGB, and random forest. The algorithms are evaluated using F1 score, recall, precision, and accuracy. The experimental results show that the achieved accuracy ranges from 66.66% for KNN to 83.33% for SGD. In [32], the authors aimed to detect leaf diseases in rice through a new CNN model combining MobileNetV2, DenseNet121, and NASNetMobile. The proposed model improves classification accuracy to 97.5% compared with the three standard CNN models.
Similarly, a study [33] was conducted to classify tomato leaf disease. The authors implemented CNN algorithms and achieved an accuracy of 99.89%. The authors of [34] propose a system that continuously observes crop growth and leaf diseases in order to advise farmers in need. The proposed framework produces efficient crop condition notifications to terminal IoT components that assist in irrigation, nutrition planning, and environmental compliance related to farming lands. SVM and CNN are used to train the IoT infrastructures to detect the various types of leaf diseases.
In [
35], the authors propose a modern generic approach for wheat disease classification using Decision Trees and deep learning models. The refined models improved the decision tree algorithm accuracy by 28.5% and CNN accuracy by 4.3%. This knowledge-based system can help farmers apply appropriate management methods. The authors of [
36] addressed the problem of Rumex obtusifolius Linnaeus by introducing a hybrid CNN model involving Visual Geometry Group-16 (VGG-16) for well-separated leaves, ResNet-50 for complex issues, and Inception-v3 for illumination problems. The model was tested on two benchmark datasets and achieved 97.51%, 97.4%, 94.45%, and 95.9% for accuracy, recall, precision, and F1-score, respectively. A machine learning framework for the classification of date palm disease was developed in [37]. The framework uses 80 GLCM texture features and 9 HSV color moment features from leaflets. Two classic ML algorithms, SVM and KNN, and two ensemble learning methods, RF and LightGBM, were tested. The SVM classifier performed best on the combined GLCM and HSV features, achieving an accuracy of 98.29%. The authors of [38] proposed a vision-based automatic medicinal plant identification system using neural network techniques and deep learning. A novel DeepHerb dataset, consisting of 2515 leaf images from 40 Indian herbs, is used to identify plants. The model deploys ANN and SVM to extract features and classify the herbs. The proposed DeepHerb model, learned from Xception and ANN, outperformed pre-trained models with 97.5% accuracy. Another study [39] shows that the ANN algorithm and its variant, the CNN, are highly effective in classification tasks for monitoring paddy rice diseases. Potato plant disease detection is studied in [
40,
41] using CNNs, with satisfactory results. Ref. [42] reviews 17 studies related to banana plant disease identification, with reported accuracies ranging from 80% to 99.61%. These studies collectively underscore the potential of ML in this domain, with CNN, SVM, and Random Forest emerging as particularly promising approaches. In recent years, data augmentation through Generative Adversarial Networks (GANs) has been applied to leaf disease identification, providing promising results in improving the performance of traditional ML algorithms. Tomato leaf identification techniques are proposed in [
43,
44]. Other types of plant leaf disease have been studied by [
44,
45,
46]. In contrast, our study investigates how state-of-the-art algorithms perform as the number of classes increases, offering a unique perspective on scalability and model behavior. While previous studies provide valuable insights, they do not explore the effects of scaling up the number of classes to the extent we do, making direct comparisons less meaningful in this context. Nonetheless, we are committed to providing a robust analysis of the performance of the models used in this study and how they fare with an increased number of classes.
Table 1 reflects a comparison of our study with current models:
These studies collectively highlight the potential of machine learning for accurate and efficient leaf feature extraction, classification, and disease detection.
3. Feature Extraction and Selection
Machine learning algorithms require relevant and important input features to predict the output. However, not all features are equally applicable in a prediction task, and some may even introduce noise into the model. Feature selection and feature extraction are two approaches to handle this. Feature selection is the process of picking a subset of relevant features from the initial set of features. The goal is to minimize the dimensionality of the feature space, simplify the model, and increase its generalization performance. In addition, feature selection enables the algorithm to train faster, reduces computational complexity, and minimizes overfitting [51]. Feature extraction, on the other hand, transforms the original features into a new collection of more informative and compact features. The idea is to extract the most essential information from the original features and represent it in a lower-dimensional feature space.
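As an illustration of feature extraction into a lower-dimensional space, the following minimal sketch computes the first principal component of a small feature table by power iteration and projects each sample onto it. The feature values and function names are hypothetical and are not taken from this study; a practical pipeline would normally rely on a numerical library instead.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (mean, direction) of the top principal component via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Fixed start vector keeps the sketch deterministic; it would fail only
    # if it happened to be orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all features constant: nothing to extract
            break
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Map each sample to its coordinate along the principal direction."""
    return [sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for r in rows]


# Hypothetical two-feature samples in which the second feature is twice the first.
features = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
mean, direction = first_principal_component(features)
scores = project(features, mean, direction)
```

Because the two hypothetical features are perfectly correlated, a single extracted coordinate preserves all of the variation, which is the situation in which dimensionality reduction loses the least information.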
Combining two or more feature extraction techniques (shape, texture, color, venation, etc.) gives better classification results than a single-feature extraction technique [
51,
52].
Figure 2 illustrates feature extraction and selection in machine learning with six phases. In phase 1, after raw data are collected from different sources, they are preprocessed to select the target data we need to study. Afterward, feature extraction and selection processes are utilized in phases 2 and 3 to extract and detect the most essential features. The preprocessing step is given in phase 4. This step is considered the second step in building a classification model. The collected data need to be preprocessed to ensure their quality. This involves handling missing values, dealing with outliers, and transforming the data into a format suitable for analysis. After we extract the best features of the leaf, we conduct a classification in phase 5 based on a trained classification algorithm (e.g., a Convolutional Neural Network (CNN), K-Nearest Neighbor (KNN), or Support Vector Machine (SVM), optionally preceded by Principal Component Analysis (PCA)). Using data mining techniques, the data are transformed into a structure that is suitable for interpretation and evaluation of the results in phase 6.
Plants have medical properties and provide food and oxygen. Therefore, the research community is very interested in this area, especially in plant classification and identification through the leaves. Various research studies identify plant properties from flowers or other parts; however, the leaf is considered the most reliable source of information since it is available all year round [
53].
Identification of plants using their leaves is becoming increasingly fascinating and trendy. Every leaf contains specific details that aid in the recognition of different types of plants. Many authors have studied this issue from different perspectives [
54,
55]. For example, the authors in [
54] seek to examine and assess the execution and effectiveness of various approaches for categorizing plants. Every method has its pros and cons when it comes to identifying leaf patterns. Leaf image quality is crucial; thus, a dependable leaf database is necessary to set up the machine learning algorithm for leaf recognition and validation. The study in [
56] suggests utilizing two features, namely shape and texture, as part of the proposed method. The shape-based method captures the outline characteristics of each leaf and subsequently evaluates the differences between them using the Jeffrey-divergence metric. The patterns of edge gradients are used to examine the overall texture of the leaf. Then, an incremental classification algorithm combines the outcomes of these approaches.
On the other hand, in [
56], the authors stressed that leaf images must be pre-processed for plant identification using leaves to extract essential features. The authors in [
56] introduce plant identification through leaf characteristics with Multiclass SVM (MSVM) as the classifier. Based on the provided analysis, the authors claim that a high recognition rate of 90% is achieved across different datasets.
In this work, feature extraction aims to reduce data complexity while retaining as much relevant information as possible. This helps to improve the performance and efficiency of machine learning algorithms and simplify the analysis process. Feature extraction may involve the creation of new features and data manipulation to separate meaningful features from irrelevant ones and simplify their use. The separation of the leaf object (foreground) from its background is referred to as segmentation. This procedure uses the adaptive threshold K-Means method [28]. Following segmentation, geometric features are extracted from the segmented image. For example, the aspect ratio and roundness (R) of the leaf can be determined using the following well-known formulas:
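A minimal sketch of these two descriptors is given here, assuming common definitions that may differ from the exact formulas used in this work: the aspect ratio is taken as the longer side of the axis-aligned bounding box divided by the shorter side, and the roundness as R = 4πA/P², where A is the area and P the perimeter of the leaf contour. The contour is assumed to be an ordered list of (x, y) boundary points, and all function names are illustrative.

```python
import math


def polygon_area(points):
    """Area enclosed by an ordered contour (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the distances between consecutive contour points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))


def aspect_ratio(points):
    """Longer bounding-box side divided by the shorter one (assumed definition)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return max(width, height) / min(width, height)


def roundness(points):
    """R = 4 * pi * A / P**2 (assumed definition); equals 1.0 for a perfect circle."""
    return 4.0 * math.pi * polygon_area(points) / polygon_perimeter(points) ** 2


# Hypothetical contours: a square and an elongated 4 x 1 rectangle.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
strip = [(0, 0), (4, 0), (4, 1), (0, 1)]
square_r = roundness(square)
strip_r = roundness(strip)
```

Under these assumed definitions, elongated or strongly lobed leaves yield a lower roundness than compact ones, which is what makes the descriptor useful for separating species by outline.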
The color of the leaves is considered a morphological characteristic. Several statistical characteristics, such as mean, skewness, and kurtosis, can be calculated in the color space to characterize leaf color attributes. This approach has low computational complexity and is suitable for real-time processing. Because the processed image contains three color planes (red, green, and blue), the mean of the three color planes after the leaf contour can be used to estimate
Ch and
N, as shown below:
The values of red and blue are provided for normalization and to estimate the
ChN of healthy and dead leaves. Nitrogen can be calculated using the following equation:
where
HU,
SA, and
BG represent the hue value, saturation intensity, and brightness intensity of the colored image, respectively.
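The per-channel color statistics mentioned earlier in this section (mean, skewness, and kurtosis) can be sketched as follows. This is an illustrative computation over a list of (R, G, B) pixel tuples; the pixel values and function names are hypothetical, population (not sample) moments are assumed, and the sketch does not implement the Ch, N, or nitrogen equations of this work.

```python
def channel_moments(values):
    """Return (mean, skewness, kurtosis) of one color channel."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:  # flat channel: higher moments are undefined, report 0
        return mean, 0.0, 0.0
    std = variance ** 0.5
    skewness = sum(((v - mean) / std) ** 3 for v in values) / n
    kurtosis = sum(((v - mean) / std) ** 4 for v in values) / n
    return mean, skewness, kurtosis


def color_moments(pixels):
    """Compute the three moments for each of the R, G, and B planes."""
    channels = zip(*pixels)  # regroup pixel tuples into per-channel sequences
    return {name: channel_moments(list(vals))
            for name, vals in zip(("R", "G", "B"), channels)}


# Hypothetical leaf pixels (already restricted to the segmented leaf region).
leaf_pixels = [(10, 200, 30), (20, 180, 30), (30, 220, 30)]
moments = color_moments(leaf_pixels)
```

Each leaf then contributes nine color values (three moments for each of three planes) to the feature vector.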
The maximum distance between two points on the leaf object's boundary in the processed image is referred to as the effective diameter (ED). The effective diameter can be estimated using another morphological property of a leaf, its area, as follows:
4. Machine Learning for Plant Leaf Classification
Plant leaf classification through machine learning is a rapidly developing study area with significant challenges and promising opportunities. The application of machine learning has allowed researchers to address complex problems previously considered impossible by conventional computational methods. Using advanced computational models, plant species classification is now being redefined through the lens of agricultural practices, environmental conservation, and medicinal research [
57].
Researchers have been motivated to develop machine learning techniques to classify leaves to prevent plant species’ extinction. This ensures that plant biodiversity is better understood and utilized. Furthermore, it serves multiple objectives, from conserving ecosystems to identifying plant species with potential undiscovered medicinal properties [
57].
However, classifying plant leaves is complex due to the diverse morphologies that appear. Various factors, such as their geographical location and seasonal changes, can affect the size, shape, and color of these plants. These variations introduce difficulties in developing a universal classification system [
58]. In addition, the sheer number of plant species adds an additional layer of complexity that requires sophisticated, adaptable, and highly nuanced machine learning algorithms [
57].
To meet these challenges, several machine learning models have been widely implemented, such as Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and K-Nearest Neighbors (K-NN). The application of each model to the problem of plant leaf classification presents different advantages [
58,
59].
CNNs have become dominant because they have been found to learn features in an automatic, ‘end-to-end’ way, directly from images without manual feature extraction. CNNs use hierarchical layers to capture fine details such as leaf shape, venation, and texture. One reason CNNs are so effective at processing leaf images is that this complex visual input requires the ability to learn representations that capture small morphological details. CNNs also excel in tasks where intra-class variability (differences within the same species) is high, as their deep learning structure allows for greater flexibility in identifying variations in patterns like leaf vein arrangement or surface texture [
60,
61].
On the other hand, SVMs tend to be used when the data are represented with a manually extracted set of features. In high-dimensional spaces, where the decision boundary between classes has to be optimized, these models are quite effective [
60,
61]. Features such as leaf perimeter, aspect ratio, and texture features extracted from the Gray Level Co-occurrence Matrix (GLCM) [
59] are used by SVMs. Manually extracted features of the leaf give a structured view of the leaf, on which SVMs learn the optimal hyperplane to separate the feature space and thus classify species. In this case, SVMs have been shown to perform well until the features become inseparable due to complexity.
Like CNNs and SVMs, the K-NN algorithm has also been used for plant leaf classification. K-NN takes a new leaf sample, compares it with all stored examples, and then classifies it according to the majority class of its nearest neighbors. This model works well when classes are well separated and the geometric features are clear in the feature space. However, performance can degrade with high-dimensional data or overlapping classes. Because K-NN is often inefficient and inaccurate in high-dimensional feature spaces, it is often enhanced with dimensionality reduction techniques such as Principal Component Analysis (PCA) [57, 62].
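The K-NN decision rule described above can be sketched in a few lines. The feature vectors (aspect ratio, roundness), the class labels, and the choice of Euclidean distance with k = 3 are illustrative assumptions and do not come from the dataset used in this study.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every stored example.
    distances = sorted(
        (math.dist(x, query), label)
        for x, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: [aspect ratio, roundness] per leaf.
train_x = [[1.0, 0.80], [1.1, 0.75], [3.0, 0.30], [3.2, 0.35]]
train_y = ["round_leaf", "round_leaf", "narrow_leaf", "narrow_leaf"]

pred_a = knn_predict(train_x, train_y, [1.05, 0.70])
pred_b = knn_predict(train_x, train_y, [2.90, 0.40])
```

Every prediction scans the full training set, which is why the cost grows with both the number of stored samples and the number of features.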
The success of these models largely depends on feature extraction. The literature has explored the extraction of shape, texture, and venation features from plant leaves. Shape features such as perimeter, convexity, and aspect ratio capture the overall structure of the leaf and are important in differentiating species. GLCM is often employed to extract texture features of surface characteristics that are not readily apparent from the leaf's shape alone. Additionally, edge detection techniques can be used to extract venation features, which give a detailed map of the internal structure of the leaf, adding discriminative power when similarly shaped species differ with respect to their internal vein structure [60, 63].
Machine learning algorithms have proven valuable in overcoming these challenges. Multilayer Perceptron (MLP), Support Vector Machine (SVM), and other models have been employed and evaluated based on specific performance metrics [
31,
64].
Machine learning has been significantly accelerated by advancements in deep learning, especially with the enhancement of neural network frameworks such as Convolutional Neural Networks (CNNs). CNNs have gained popularity due to their inherent capability of processing and deciphering complex imagery, particularly images of plants. Due to their sophisticated image recognition capabilities, CNNs have demonstrated robust performance in a wide range of plant classification tasks, thereby establishing new standards for accuracy and reliability.
As illustrated in
Figure 3, the workflow of leaf classification commences with image preprocessing to standardize and improve the quality of the input images. This step ensures that lighting, scale, and orientation variations do not affect the subsequent feature extraction process. After preprocessing, feature extraction is performed to capture the intrinsic characteristics of the leaves, such as texture, shape, and color features. Once these features are extracted, feature selection is employed to identify and retain the most significant features that contribute to differentiating between leaf species, thereby reducing the complexity of the model. The selected features are then fed into a machine learning classifier trained to recognize and classify the various leaf types. The outcome of this process is a set of results that reflect the classifier’s accuracy in leaf classification.
The effectiveness of machine learning models in plant leaf classification depends heavily on image processing and feature extraction techniques. Techniques like the Gray Level Co-occurrence Matrix (GLCM) play a significant role in extracting critical features such as leaf texture, which are vital for classifying leaves. These characteristics form the basis for machine learning models to distinguish between species [51].
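To make the GLCM concrete, the following minimal sketch builds a normalized co-occurrence matrix for one pixel offset and derives two standard texture descriptors, contrast and homogeneity. The tiny two-level images, the horizontal offset, and the non-symmetric counting are assumptions chosen for readability; the number of gray levels and the offsets used in this study are not stated here.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count how often gray level a is followed by gray level b.
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("image too small for the requested offset")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weighted sum of squared gray-level differences; 0 for a uniform texture."""
    return sum(p[i][j] * (i - j) ** 2
               for i in range(len(p)) for j in range(len(p)))


def homogeneity(p):
    """Rewards co-occurrences near the diagonal; 1 for a uniform texture."""
    return sum(p[i][j] / (1 + (i - j) ** 2)
               for i in range(len(p)) for j in range(len(p)))


# Hypothetical 2 x 2 patches quantized to two gray levels.
smooth = [[0, 0], [0, 0]]
checker = [[0, 1], [1, 0]]
p_smooth = glcm(smooth, levels=2)
p_checker = glcm(checker, levels=2)
```

A smooth leaf surface concentrates probability on the matrix diagonal, whereas a rough or heavily veined surface spreads it away from the diagonal, raising contrast and lowering homogeneity.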
In addition to these models, the performance metrics commonly used to evaluate plant leaf classification are accuracy, precision, recall, and F1-score. These metrics allow us to assess how well a model separates species, accounts for misclassifications, and balances precision and recall. The accuracy of CNNs in complex classification tasks has consistently been shown to be the best of the three compared, as they can identify fine details in images. SVMs generally perform well with structured, high-dimensional features, and K-NN provides a simple, interpretable baseline for comparison.
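For a multi-class problem, these metrics are typically computed per class and then averaged. The following minimal sketch computes accuracy together with macro-averaged precision, recall, and F1-score; the labels are hypothetical, and macro averaging is an assumption, since the averaging scheme is not specified in this section.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, (macro precision, macro recall, macro F1), per-class scores)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = (precision, recall, f1)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Macro average: every class counts equally, regardless of its size.
    macro = tuple(sum(scores[i] for scores in per_class.values()) / len(labels)
                  for i in range(3))
    return accuracy, macro, per_class


# Hypothetical ground truth and predictions for two species.
truth = ["a", "a", "b", "b"]
preds = ["a", "b", "b", "b"]
accuracy, macro, per_class = evaluate(truth, preds)
```

Macro averaging weights rare species as heavily as common ones, so it exposes weaknesses on minority classes that overall accuracy can hide in an imbalanced dataset.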
We applied these machine learning techniques to classify plant leaves in our study. To address the unique challenges posed by our dataset, we extracted features automatically using CNNs and manually for SVMs and K-NN, achieving high classification accuracy. In our approach, we blended shape, texture, and venation features to form a single comprehensive model, which is robust against variation in plant morphology.
Feature extraction techniques are crucial for the success of the machine learning process. They stand as a cornerstone technique, drawing upon the shape, texture, and venation of leaves as key discriminative attributes. The shape, encompassing the leaf’s outline, tip, base, margins, and vein arrangement, is discerned to classify plant species distinctly [
65]. Concurrently, geometric features such as the leaf’s length, width, area, and perimeter are quantifiable data points during extraction [
63]. Texture captures the surface patterns, and venation analysis, studying the leaf’s vein patterns, is pivotal in recognizing and differentiating various plant types [
65,
66]. Notably, the adoption of both region-based shape descriptors (RBSD) and contour-based shape descriptors (CBSD) in feature extraction—where RBSD extracts shape features from within the leaf and CBSD from the leaf’s edge—can significantly bolster the accuracy of classification systems when used in tandem [
67]. These multifaceted feature extraction methods, rooted in robust image processing techniques, empower machine learning algorithms to navigate the complex domain of plant taxonomy effectively.
Each of these techniques contributes to creating a comprehensive feature set that machine learning models such as K-Nearest Neighbors (K-NN), Support Vector Machines, and Random Forest can exploit to classify plant leaves accurately [
64,
68]. The performance of these techniques can be evaluated using metrics such as accuracy, precision, recall, and F1 score [
68].
Despite these advancements, plant leaf classification using machine learning has its challenges. Variability in leaf characteristics and the vast diversity of plant species necessitate a continuous effort to develop more advanced image processing and feature extraction techniques. The objective is to construct machine learning models that are accurate and robust against the variations presented by nature [
31,
51,
57,
58].
To surmount these challenges, various strategies have been proposed. One approach includes developing methods to account for the variability in leaf characteristics due to environmental and geographical factors. Another involves improving the training data by including a broader spectrum of leaf conditions, such as addressing the imbalance caused by excluding damaged or diseased leaves from datasets. Deep learning techniques have been pivotal in this regard, offering a means to enhance the accuracy of plant leaf classification by using sophisticated image processing methods for feature detection and extraction [
58,
67,
69,
70].
5. Proposed Methodology
Classifying many classes introduces specific challenges in machine learning, such as increased computational complexity, greater risk of overfitting, and difficulty in managing class imbalance. In this work, we aim to improve the classification accuracy of the Cope Leaf Dataset [
6], which has 1584 images with 99 labels. Without the proposed methodology, the Naïve Bayes classifier gives the highest accuracy (85.18%). We developed several feature selection, preprocessing, and training methods to increase classification accuracy on data with many classes. Increasing accuracy in classification, especially when dealing with large datasets and many classes, involves a combination of data preprocessing, model selection, regularization techniques, and fine-tuning. The following steps are applied to enhance the classification accuracy on the Cope Leaf Dataset with its 99 labels:
Noisy and mislabeled data are removed.
Data augmentation techniques (rotation and shifting) are applied to increase the diversity of the training set.
The dataset is balanced according to the number of records on each label.
Data are shuffled to ensure that the distribution is random, which helps in reducing bias during training.
Most relevant features are selected using correlation analysis.
New features are created from existing data via statistical methods such as min, max, range, etc.
Data are converted into a new structure in a consistent format.
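Some of the steps above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: "correlation analysis" is interpreted here as dropping one feature from every pair whose absolute Pearson correlation exceeds a threshold, and the derived features are the per-record minimum, maximum, and range. The threshold, the helper names, and the toy records are hypothetical and are not taken from the study.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def drop_correlated(rows, threshold=0.95):
    """Indices of columns kept after removing near-duplicate columns."""
    columns = list(zip(*rows))
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


def add_summary_features(row):
    """Append min, max, and range of the record as new features."""
    return list(row) + [min(row), max(row), max(row) - min(row)]


def prepare(rows, labels, seed=0):
    kept = drop_correlated(rows)
    data = [
        (add_summary_features([r[j] for j in kept]), y)
        for r, y in zip(rows, labels)
    ]
    random.Random(seed).shuffle(data)  # randomize record order before training
    return data


# Hypothetical records: the second column is exactly twice the first.
rows = [[0.1, 0.2, 0.9], [0.2, 0.4, 0.1], [0.3, 0.6, 0.5], [0.4, 0.8, 0.3]]
labels = ["a", "a", "b", "b"]
prepared = prepare(rows, labels)
```

Fixing the shuffle seed keeps the split reproducible, which makes accuracy comparisons between classifiers easier to interpret.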
Classifying datasets with many classes (large label classification) requires effective strategies to manage high-dimensional output spaces. Feature extraction, feature selection, and data preprocessing are crucial steps to ensure that classification algorithms can work effectively [
71,
72]. The next step selects data for training and testing, and several classification algorithms are applied. Bayes Net (BN), Naïve Bayes Classifier (NBC), Multilayer Perceptron (MP), Hoeffding Tree (HT), J48, Random Forest (RF), and Convolutional Neural Network (CNN) algorithms were applied to classify the 99-label leaf dataset. Naïve Bayes classifiers are simple probabilistic models that assume independence between features given the class. This simplicity makes them computationally efficient and scalable, even when dealing with many classes [
73].
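To make the independence assumption concrete, the sketch below implements a small Gaussian Naive Bayes classifier: each feature is modeled per class by its own mean and variance, and the class score is the log prior plus the sum of per-feature log likelihoods. The Gaussian likelihood, the variance floor, and the toy data are assumptions for illustration; the study does not state which Naive Bayes variant or implementation settings were used.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior score."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        # Independence assumption: per-feature log likelihoods simply add up.
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical two-feature, two-class data.
X = [[1.0, 2.0], [1.2, 1.8], [3.0, 4.0], [3.2, 4.2]]
y = ["A", "A", "B", "B"]
nb_model = fit_gaussian_nb(X, y)
print(predict(nb_model, [1.1, 1.9]), predict(nb_model, [3.1, 4.1]))
```

Because only one mean and one variance are stored per feature and class, the number of parameters grows linearly with the number of classes, which is why the approach scales to many labels.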
On the other hand, Decision Trees tend to overfit the training data, especially when the number of classes is large. Both probabilistic and rule-based algorithms are applied to subsets with different numbers of classes. Classification with 2, 5, and 10 classes achieves very high accuracy. After restructuring the data and adding new features, the Naïve Bayes Classifier gives 89.63%, the Multilayer Perceptron gives 89.48%, and the Hoeffding Tree gives 89.92% accuracy, which is promising for the 99-class dataset. In this study, 80% of the data is allocated to the training set for developing the machine learning models, while the remaining 20% is reserved as the test set to assess the models’ performance. The MP model has three hidden layers of 128, 64, and 32 neurons with ReLU activation. The model was trained using the Adam optimizer and categorical cross-entropy loss for multi-class classification. The architecture of the MP was chosen to be complex enough to capture the non-linear relationship between the leaf features without overfitting.
In this study, we initially employed an 80/20 split of the data for training and testing, which provided us with a basic understanding of the model’s performance. However, to ensure the robustness and generalization ability of our models, we additionally conducted k-fold cross-validation (with k = 5 or 10). This allowed us to assess performance across multiple subsets of the data and confirmed that the models perform consistently and generalize well across different training and testing splits. Moreover, stratified cross-validation was used to maintain the class balance across all folds, which is particularly important in our 99-class dataset, where class imbalance could skew performance. The results showed stable accuracy and precision across folds, further confirming the model’s stability.
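The stratified splitting described above can be sketched as follows. This is a minimal illustration, assuming a balanced dataset with 16 samples per species (1584 / 99) and k = 5; the function name and the round-robin assignment are choices made for this example rather than the procedure used in the study.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign every sample index to one of k folds, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the samples of each class round-robin so that every fold
        # receives roughly the same number of samples per class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Assumed layout: 99 species with 16 samples each.
labels = [species for species in range(99) for _ in range(16)]
folds = stratified_folds(labels, k=5)

for held_out in range(5):
    test_idx = folds[held_out]
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    # A classifier would be trained on train_idx and evaluated on test_idx here.
```

With 16 samples per class and five folds, each fold holds three or four samples of every species, so no class is missing from any test fold.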
In many real-world scenarios, datasets with many classes often exhibit class imbalance. The HT algorithm can effectively handle skewed class distributions by continuously updating based on the incoming stream and focusing on the most relevant features. The HT algorithm gives the best classification results in this research work with the NBC and MP algorithms. Algorithm 1 demonstrates the pseudocode of the HT algorithm below.
Algorithm 1. Hoeffding Tree Algorithm |
1: | Input: S: Stream of instances; P: Desired probability; T: Tie-breaking threshold; MIN: Minimum number of instances at a leaf before a split is evaluated |
2: | Initialize an empty Hoeffding Tree (HT) with a single leaf |
3: | for each instance X in S do |
4: | | Traverse HT to find the appropriate leaf L for X |
5: | | Update statistics at leaf L with X |
6: | | if the number of instances at L reaches MIN then |
7: | | | Compute Information Gain (G) for each attribute |
8: | | | Let Xa be the attribute with the highest G and Xb be the attribute with the second highest G |
9: | | | Compute the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)), where δ = 1 − P, R is the range of G, and n is the number of instances at L |
10: | | | if (G(Xa) − G(Xb) > ε) or (ε < T) then |
11: | | | | Split L using the best attribute Xa |
12: | | | | Create new leaves for each branch of the split |
13: | | | | Distribute the instances of L among the new leaves |
14: | | | else |
15: | | | | Continue without splitting |
16: | | | end if |
17: | | end if |
18: | end for |
19: | return the trained Hoeffding Tree (HT) |
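The split test at the core of Algorithm 1 can be written compactly. The sketch below is a minimal illustration that assumes information gain as the split criterion, so that its range is log2 of the number of classes; the default values for the error probability and the tie-breaking threshold, as well as the function names, are assumptions of this example and not the settings reported in this study.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the observed mean of n samples is within epsilon
    of the true mean with probability 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n, num_classes,
                 delta=1e-7, tie_threshold=0.05):
    """Decide whether a leaf that has seen n instances should be split."""
    value_range = math.log2(num_classes)  # information gain lies in [0, log2(c)]
    epsilon = hoeffding_bound(value_range, delta, n)
    clear_winner = best_gain - second_gain > epsilon
    tie = epsilon < tie_threshold  # candidates too close to be worth waiting for
    return clear_winner or tie


# With 99 classes and 200 instances, a small gain difference is not enough.
print(should_split(1.0, 0.7, n=200, num_classes=99))
# A large difference between the two best attributes justifies a split.
print(should_split(3.0, 1.0, n=200, num_classes=99))
```

Because the bound shrinks as more instances reach a leaf, the tree postpones a split until the evidence for the best attribute is statistically sufficient, which is the mechanism that limits premature, overfitted splits.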
The Hoeffding Tree (HT) algorithm was the most effective in our study, with an accuracy of 89.92%, owing to its incremental learning approach and its capability to deal with high-dimensional and imbalanced data. The dynamic construction of the decision tree and the selection of features in HT also helped avoid overfitting and achieved high accuracy for features such as margin, texture, and shape. The Naïve Bayes Classifier (NBC) was the second best, with an accuracy of 89.63%. While its probabilistic model is fast, the assumption of feature independence somewhat constrained the model from capturing higher-order interactions, so it lagged slightly behind HT. The Multilayer Perceptron (MP) had a classification accuracy of 89.48%. Despite its capability to model non-linear relationships between the features of the leaves, the algorithm had a strong tendency to overfit, especially in the presence of class imbalance.
In contrast, the J48 algorithm gave the lowest accuracy, 57.87%. Its static decision tree construction led to overfitting, and its difficulty in handling imbalanced classes further reduced its performance. As a result, J48 could not generalize well on this complex, multi-class dataset.
6. Experimental Results and Discussions
Plant classification using machine learning algorithms involves training algorithms to identify and categorize different species of plants based on features extracted from images, such as leaf shapes, textures, colors, and other morphological characteristics. This work applies several feature selection and ML algorithms to the Cope Leaf Dataset to obtain a suitable model for classifying plant leaves. The Cope Leaf Dataset is a popular dataset used for leaf classification tasks in machine learning. It consists of images of leaves from various plant species, along with corresponding labels indicating the species of each leaf. Like other leaf classification datasets, the Cope Leaf Dataset may present challenges such as class imbalance, variations in image quality, and intra-class variability due to factors like leaf aging and environmental conditions. We aim to address these challenges by building robust classification models. The dataset consists of 1584 images with 99 labels. Each record has 64 attributes describing the margin, shape, and texture characteristics of the leaf [
6].
In this work, both probabilistic and rule-based machine learning algorithms are applied to the raw dataset.
Table 2 shows the accuracy and statistical analysis of classification with the plant leaf margin feature set. The mean absolute error (MAE), Root Mean Squared Error (RMSE), and Relative Absolute Error (RAE) are given in
Table 2.
Table 3 presents the results for the texture feature set, which are similar to those for the shape feature set. The results show that classification using the margin feature set produces a more suitable model for the plant leaf dataset. The size of the dataset and the number of classes (99) make the classification problem very challenging. The Naive Bayes classifier gives the highest accuracy for the raw data. Naive Bayes classifiers can perform well in this setting because their strong (naive) independence assumptions about the features keep the model simple. They also provide straightforward explanations of the classification decision based on the probabilities of different classes given the input features [
74,
75,
76].
Figure 4 demonstrates the accuracy of ML algorithms with the classification of margin, texture, and all feature datasets.
Accuracy helps assess the quality and effectiveness of the model in solving the classification task. A high level of accuracy indicates that the model has learned meaningful patterns and relationships in the data, leading to accurate predictions. On the other hand, low accuracy may indicate issues such as underfitting, overfitting, or inadequate feature representation. In this work, several feature selection and machine learning algorithms are proposed to increase the accuracy of the classification. In this part of the experiments, we move from a classification task with a small number of classes (2) to one with a large number of classes (99). The classification results for subsets of the dataset with 2, 5, 10, 25, 50, and 75 classes are given in
Table 4,
Table 5,
Table 6,
Table 7,
Table 8 and
Table 9. The classification problem is typically less complex with fewer classes, and the decision boundaries between classes may be more distinct. Simple models like Logistic Regression or Naive Bayes classifiers may perform well on such tasks due to their ability to capture linear or simple relationships between features and classes. However, as the number of classes increases, the problem becomes more complex, requiring more sophisticated models like neural networks or ensemble methods to accurately capture the intricate relationships between features and classes. Additionally, addressing challenges like class imbalance, data sparsity, and feature relevance becomes increasingly important as the number of classes in the classification task grows. For the 2-class, 5-class, and 10-class problems, the classification accuracy is 100% with the Naïve Bayes classifier and more than 96% with the Multilayer Perceptron algorithm.
The mean absolute error (MAE) is the average of the absolute differences between predicted and actual values. Root Mean Squared Error (RMSE) is a frequently used measure of the differences between the values predicted by a model or an estimator and the actual values observed; because the errors are squared before averaging, it penalizes large errors more heavily than MAE. Relative Absolute Error (RAE) measures the total absolute error in prediction relative to the total absolute error of a simple baseline model, usually the mean of the observed data. Precision is the ratio of true-positive predictions to all positive predictions. It tells us how many of the positive predictions made by the model are correct. Recall, also known as sensitivity or true-positive rate, is the ratio of true-positive predictions to all actual positive cases. The F1 score is the harmonic mean of precision and recall and thus provides a single metric that balances both. Together, these metrics describe the absolute and relative performance of the plant leaf classification model.
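The definitions above translate directly into code. The following minimal sketch computes the error measures on numeric predictions and the precision, recall, and F1 score for a single class treated as the positive class; the function names and the toy values are hypothetical and serve only to illustrate the formulas.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def rae(actual, predicted):
    """Total absolute error relative to always predicting the mean."""
    mean = sum(actual) / len(actual)
    baseline = sum(abs(a - mean) for a in actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / baseline


def precision_recall_f1(true_labels, predicted_labels, positive):
    """Precision, recall, and F1 with one class treated as positive."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical class-membership targets and predicted probabilities.
actual = [1, 0, 1, 0]
predicted = [0.8, 0.2, 0.6, 0.4]
print(mae(actual, predicted), rmse(actual, predicted), rae(actual, predicted))

# Hypothetical species labels.
true_labels = ["oak", "oak", "maple", "maple", "oak"]
predicted_labels = ["oak", "maple", "maple", "oak", "oak"]
print(precision_recall_f1(true_labels, predicted_labels, positive="oak"))
```

For a multi-class problem such as the 99-class task, the per-class values are typically averaged over all classes to obtain a single score.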
True-positive (TP) and false-positive (FP) rates are critical for measuring the success of machine learning models. They offer detailed insights into the performance of classification models, enable the calculation of essential evaluation metrics, help in understanding model bias and behavior, support cost-sensitive decision-making, and aid in fine-tuning and comparing models for optimal performance. The average values of TP and FP after classification are given in the tables. Our plant leaf classification methods have high TP and low FP values, indicating that the developed models perform well.
On the other hand, accuracy is lower in the 25-class, 50-class, and 75-class problems. In the 25-class classification, NBC gives 96.21% accuracy and MP gives 94.70% accuracy. The tables also report other statistical evaluation measurements, such as MAE, RMSE, RAE, TP, and FP ratios.
Figure 5 shows the accuracy of classification for 2, 5, 10, 25, 50, and 75 labels.
Figure 6 demonstrates MAE, RAE, RMSE, and TP rates for the same number of labels in the leaf classification problem [
77]. Several feature selection and preprocessing methods are applied to increase plant leaf classification accuracy, such as data cleaning, transformation, feature engineering, and splitting the data into appropriate sets. Experiments show in
Table 10 that the MP algorithm gives 89.48% accuracy, NBC gives 89.63% accuracy, and the HT algorithm gives 89.92% accuracy for the classification of the 99-label problem. From a theoretical standpoint, fewer classes make classification easier and more accurate, but in practice, algorithms such as J48 can behave differently depending on the nature of the data and the specific model hyperparameters. Consequently, J48 has lower accuracy than the other classification algorithms on this dataset. CNN achieved 88.72% accuracy, slightly lower than the Hoeffding Tree (89.92%) but higher than other models such as Random Forest (86.81%).
Figure 7 shows examples of plant leaves that were accurately classified by the proposed model used in this study. These samples highlight the model’s ability to correctly distinguish plant species based on distinguishing features such as shape, texture, and venation patterns, even within a multi-class classification setup.
Figure 8 displays examples of plant leaves that were misclassified by the models, illustrating the challenges faced in classification. Misclassification often arose from factors such as high intra-class variability, inter-class similarity, and image quality issues, which can obscure unique features critical for accurate classification [
6].
Table 11 compares the proposed model with several state-of-the-art algorithms such as the Support Vector Machine (SVM), ResNet-50, DenseNet-121, Random Forest, and CNN. The proposed model outperforms other state-of-the-art models in accuracy, F1 score, and recall, demonstrating strong classification capabilities for large, complex plant datasets. CNN and SVM are also competitive, particularly in precision and overall error rates.
7. Discussion
During the classification process, some algorithms struggled to accurately identify certain plant leaf samples due to various factors. These included class imbalance, high variability within the same class, similarities between different classes, complexity and redundancy in features, overfitting, and inconsistencies in image quality. A key challenge is class imbalance, where some species are overrepresented while others have very few samples. Algorithms like Naive Bayes or Multilayer Perceptron may have difficulty with underrepresented classes, leading to misclassifications, especially for rare species. Hoeffding Trees, which adaptively learn, manage imbalance better but still face difficulties in extreme cases. For example, if one species has 200 samples and another has only 5, the algorithm tends to favor the more common species, making it harder to correctly classify the rarer ones.
Additionally, leaves from the same species may appear different due to factors like aging, lighting, or the season in which they were collected, while leaves from different species might look very similar. This creates challenges for the model. For instance, two species may have similar textures but different shapes, and if the algorithm focuses too much on texture, it may misclassify the species. Conversely, young and old leaves of the same species can look quite different, causing issues with intra-class variability. Even though advanced techniques like PCA and LDA were used to reduce feature dimensionality, some irrelevant features may remain, adding noise to the model and complicating classification. For example, models like Random Forest may struggle to differentiate between species if certain features, like shape, dominate the classification process when texture should also be considered.
One of the reasons why the Hoeffding Tree algorithm outperforms others, such as J48, is its ability to handle high-dimensional, imbalanced datasets effectively. The Hoeffding Tree’s incremental learning approach allows it to adapt dynamically to the data as they are streamed, making it particularly well-suited for datasets with a large number of classes, such as the 99-class Cope Leaf Dataset. It avoids overfitting by only making statistically significant splits, ensuring that the model generalizes well to unseen data. On the other hand, J48 constructs a static decision tree based on the training data, which often leads to overfitting, especially in complex, multi-class problems. The algorithm tends to create overly complex models that do not perform well on new data, especially when class imbalance is present. This, combined with its tendency to favor majority classes, results in its lower accuracy compared to the Hoeffding Tree in this study.
Complex models like Multilayer Perceptron may overfit the training data, performing well on known samples but poorly on new ones. This happens when the model learns irrelevant details from the training data, failing to generalize to new leaf images. For instance, if MLP classifies based on specific edge details that are not consistent across all samples, it will struggle with slightly different images. Variations in image quality, such as lighting, focus, or background clutter, can introduce noise that affects feature extraction. Some algorithms, like Naive Bayes, which rely on probabilistic methods, are more vulnerable to this noise, while others, like Random Forest, handle it better. For example, a blurry or poorly lit image may lose key details like leaf venation, leading to misclassification. Similarly, if the margins of the leaf are not clearly visible, the model may incorrectly classify based on shape.
8. Conclusions
In this study, we addressed the complex problem of plant classification using the Cope Leaf Dataset, which comprises 1584 images of leaves from 99 different species, each characterized by 64 attributes related to margin, shape, and texture. We aimed to build robust classification models to accurately identify plant species based on these leaf characteristics despite challenges such as class imbalance, variations in image quality, and intra-class variability due to environmental factors and leaf aging.
For the largest classification task involving 99 classes, we observed that preprocessing steps, including feature selection and data transformation, significantly enhanced model performance. After these steps, the Naive Bayes Classifier achieved an accuracy of 89.63%, the Convolutional Neural Network achieved 88.72%, and the Hoeffding Tree algorithm reached 89.92%, underscoring the effectiveness of these enhancements.
In conclusion, our study demonstrates the potential of machine learning algorithms, particularly the Hoeffding Tree algorithm, for classifying plant species using leaf images. The results highlight the importance of appropriate feature selection, preprocessing methods, and the choice of algorithm in tackling the complexities of large-scale plant classification tasks. Future work could further optimize these models, explore deep learning approaches, and expand the dataset to include more diverse plant species for broader applicability.