Enhanced Plant Leaf Classification over a Large Number of Classes Using Machine Learning

Elbasi, Ersin; Topcu, Ahmet E.; Cina, Elda; Zreikat, Aymen I.; Shdefat, Ahmed; Zaki, Chamseddine; Abdelbaki, Wiem

doi:10.3390/app142210507

by Ersin Elbasi *, Ahmet E. Topcu *, Elda Cina, Aymen I. Zreikat, Ahmed Shdefat, Chamseddine Zaki and Wiem Abdelbaki

College of Engineering and Technology, American University of the Middle East, Egaila 54200, Kuwait

* Authors to whom correspondence should be addressed.

Appl. Sci. 2024, 14(22), 10507; https://doi.org/10.3390/app142210507

Submission received: 22 October 2024 / Revised: 9 November 2024 / Accepted: 12 November 2024 / Published: 14 November 2024

(This article belongs to the Special Issue Smart Agriculture Based on Big Data and Internet of Things (IoT))

Abstract:

In botany and agriculture, classifying leaves is a crucial process that yields vital information for biodiversity and ecological studies and for the identification of plant species. The Cope Leaf Dataset offers a comprehensive collection of leaf images from various plant species, enabling the development and evaluation of advanced classification algorithms. This study presents a robust methodology for classifying leaf images within the Cope Leaf Dataset by enhancing the feature extraction and selection process. The Cope Leaf Dataset has 99 classes and 64 features with 1584 records. Features are extracted based on the margin, texture, and shape of the leaves. It is challenging to classify a large number of labels because of class imbalance, feature complexity, overfitting, and label noise. Our approach combines advanced feature selection techniques with robust preprocessing methods, including normalization, imputation, and noise reduction. By systematically integrating these techniques, we aim to reduce dimensionality, eliminate irrelevant or redundant features, and improve data quality. Increasing accuracy in classification, especially when dealing with large datasets and many classes, involves a combination of data preprocessing, model selection, regularization techniques, and fine-tuning. The results indicate that the Multilayer Perceptron algorithm achieves 89.48%, the Naïve Bayes Classifier 89.63%, the Convolutional Neural Network 88.72%, and the Hoeffding Tree algorithm 89.92% accuracy on the 99-class plant leaf classification problem.

Keywords:

smart farming; plant leaf classification; agriculture; machine learning; feature selection; data preprocessing

1. Introduction

The quest to classify plant species has been a central theme in botany and ecology, carrying profound implications for biodiversity conservation, agriculture, and environmental monitoring. With an estimated 390,900 plant species known to science [1], cataloging and differentiating each one presents a colossal challenge. The emerging field of automated plant leaf classification harnesses computational power to distinguish plant species based on their leaf characteristics, marking a vital frontier in botanical research.

The automation of plant leaf classification is a novel endeavor that seeks to transcend the limitations of human expertise and manual identification. It relies on the premise that leaves, the most accessible and abundant plant organ, hold key morphological features unique to each species. These features can be systematically analyzed and classified using advanced algorithms when captured in digital images. The primary objective is to design a system capable of learning from data, thereby identifying and categorizing leaves efficiently and accurately.

This technological solution involves several critical steps: capturing high-quality leaf images, extracting meaningful features from them, and then applying machine learning (ML) algorithms to classify the leaves into their respective species. The success of automated classification systems depends significantly on the accuracy and robustness of each step, especially the feature selection and the choice of machine learning models. Figure 1 demonstrates the general structure of plant leaf classification using ML algorithms.

Feature selection in plant leaf classification is a crucial process that determines the success of the whole system. Researchers extract various features from leaf images, broadly categorized into shape, color, texture, and venation features. Shape features, such as aspect ratio, circularity, and leaf margin structure, provide information about the geometric properties of the leaf. Color features capture the pigmentation patterns that can be characteristic of certain species, while texture features relate to the surface structure and patterns of the leaf. Venation features, or the patterns formed by the veins of a leaf, are particularly useful as they are highly distinctive across different species [2,3].

Recently, the focus has shifted toward more sophisticated feature selection methods that consider individual features and the relationships and combinations of features that best contribute to classification. Techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) have been employed to reduce the dimensionality of the feature space, selecting only the most relevant features for classification.

Machine learning methods have revolutionized how researchers approach plant leaf classification. The commonly adopted algorithms include Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), Random Forests (RF), and various forms of Neural Networks (NN), including the now prevalent Deep Learning models like Convolutional Neural Networks (CNNs). These methods vary in complexity, interpretability, and computational requirements, and the choice often depends on the size and nature of the dataset available for training and validation [4].

Deep learning has shown great promise in the field due to its ability to learn hierarchical feature representations from the data. This often surpasses the performance of traditional machine learning methods that rely on handcrafted features. CNNs have been especially influential because they can directly process raw image data and automatically extract relevant features without significant human intervention.

Despite these advancements, automated plant leaf classification faces several challenges. One of the main issues is the high intra-class variability and inter-class similarity found in leaf features, which can complicate the classification process. For example, leaves from different species may appear very similar, or conversely, leaves from the same species may look different due to variations in environmental conditions, age, and other factors [5].

Additionally, obtaining large datasets of leaf images that are sufficiently diverse and labeled with high accuracy is a non-trivial task. These datasets are crucial for training and testing machine learning models but require significant time and effort to compile.

Other practical challenges include dealing with the varying quality of leaf images due to differences in lighting, orientation, occlusion, and background clutter. This variability can introduce noise into the system, reducing the accuracy of feature extraction and classification. Moreover, the computational complexity of processing large image datasets with advanced machine learning methods can be prohibitive, requiring substantial computational resources.

Furthermore, models that can generalize well to unseen data are needed, as the ultimate test of an automated classification system is its performance in real-world conditions, where ideal image capture is not always possible. Developing models robust to variations in data quality and capable of being deployed in various environmental and technological contexts remains a significant hurdle.

Our work uses three types of datasets from the Cope Leaf Dataset [6] for leaf classification. Enhancing classification accuracy, particularly with large datasets and numerous classes, requires a combination of data preprocessing, model selection, regularization techniques, and fine-tuning. To improve accuracy on data with many classes, we devised several feature selection, preprocessing, and training methods. The contributions of this work are as follows:

A complex classification problem is addressed involving 99 classes from the Cope Leaf Dataset, tackling challenges such as class imbalance, overfitting, and feature complexity. Our approach to resolving these issues enhances classification accuracy, an aspect that has been underexplored in prior studies.
Although preprocessing techniques like normalization, imputation, and noise reduction are well established, our systematic integration of these methods into a cohesive workflow tailored for a large multi-class dataset significantly improves model performance, as evidenced by the accuracy gains in our experimental results.
By comparing multiple machine learning (ML) algorithms, including the Hoeffding Tree, we provide new insights into their performance on high-dimensional, imbalanced multi-class datasets. The superior performance of the Hoeffding Tree over models like Naïve Bayes offers valuable contributions to understanding which models are more effective for complex classification challenges.

This paper is structured as follows: Section 1 introduces the topic. Section 2 reviews the related literature. Section 3 presents feature extraction and selection. Section 4 introduces plant leaf classification using machine learning algorithms. Section 5 outlines the preprocessing and classification methods used in this study, followed by the presentation of the experimental results in Section 6. Lastly, Section 7 and Section 8 are dedicated to the discussion, conclusion, and future recommendations.

2. Literature Review

Many studies in agriculture have deployed machine learning algorithms to improve the quality of plant growth and yields. By extracting features from leaves, these algorithms have allowed authors to classify plants and diagnose diseases early with high accuracy.

2.1. Feature Extraction

Multiple studies have applied machine learning to extract leaf features. Wu [7] and Chaki [8] both emphasize the importance of stable and discriminative features, with Chaki further focusing on texture and shape features, leading to acceptable recognition of leaves with varying texture, shape, size, and orientation. The authors of [9] review the performance of various moment invariant techniques for feature extraction, concluding that a 100% classification rate can be achieved by using Total Mutual Information (TMI) and a General Regression Neural Network (GRNN). Another study [10] builds upon this to classify images of diseased leaves using color and shape attributes and several ML algorithms, including the Support Vector Machine, K-Nearest Neighbor, and Decision Trees. The authors of [11] used image processing techniques to recognize plant species by identifying appropriate features for extraction and improving classification accuracy. They used contour-based and region-based feature extraction to achieve more precise results. Other studies [12,13,14,15,16,17] use different implementations of deep learning algorithms to extract features of plant leaves for classification or disease detection. The reported accuracy ranges from 94.8% to over 99%, which suggests that deep learning algorithms are among the strongest tools for feature extraction.

2.2. Plant Leaf Classification

Research on plant leaf classification with machine learning has shown promising results. Many authors have developed new models and algorithms to improve accuracy in plant leaf classification. Araujo [18] presents a multiple classifier system for plant leaf recognition that combines different classifiers trained on shape and texture features, improving identification performance by up to 28% compared to a monolithic approach and comparing favorably with the best results reported in the literature for the datasets used. Later, Kala [19] proposed a feature extraction technique based on sinuosity coefficients, which achieved high accuracy in plant species classification. Ali et al. [20] presented a simple, computationally efficient leaf recognition method for plant recognition. The method was evaluated on a publicly available leaf image database to demonstrate its usefulness and efficiency. Zhang [21] designed a modified local discriminant projection (MLDP) algorithm that aims to extract discriminant features for the designed plant leaves by preserving the local geometrical structure of leaves. The MLDP method was tested on the public ICL leaf image database, and the results support its effectiveness and feasibility. Dudi et al. [22] developed a new plant classification model based on enhanced segmentation and optimal feature selection. They created a hybrid algorithm, C-EFO, for optimal feature selection and performed the final classification with the Enhanced Recurrent Neural Network (E-RNN) model. Their experiment was conducted on two plant leaf datasets and showed effectiveness and feasibility compared to conventional models.
The authors of [23] proposed a model that can classify up to 79 different plant species in India using the Convolution Neural Network DenseNet-161 model architecture with a testing accuracy of 97.3%. The application works on any Android platform and can classify the input plant image with an average latency of 1.98 s. In [24], the authors applied three technologies to achieve a model with high accuracy for plant classification. A Conditional Generative Adversarial Network was used to generate synthetic data, a Convolutional Neural Network was used for feature extraction, and a logistic regression classifier was used for efficient classification of plant species. From the eight datasets used, they achieved an average accuracy of 96.1% and up to 99 to 100% for individual datasets.

Several authors have emphasized that deep learning outperforms other classification algorithms. Ojha [25] and Aman [26] achieved high accuracy in classifying ornamental plant and flower leaves, respectively, using various machine learning algorithms, with the Multi-Layer Perceptron (MLP) performing the best. Kala [27] highlights that Convolutional Neural Network (CNN) categorization surpasses traditional methods in accuracy. Similarly, the authors of [28] reviewed various techniques such as support vector machine, ResNet, K-nearest neighbor, VGG-16, and others for leaf classification. They discussed the differences between machine learning and deep learning regarding the plant leaf classification problem and concluded that deep learning models achieve maximum accuracy. Parate et al. [29] also used machine learning to identify leaves of mangoes, oranges, and guavas through Google Teachable with high accuracy.

2.3. Leaf Disease Identification

A range of machine learning techniques have been explored to identify leaf diseases. Kethineni [30] highlights the use of image processing and ML algorithms, combining k-Means, the Firefly optimization Algorithm (FA), and the Support Vector Machine (SVM) to identify plant diseases, and emphasizes the importance of plant health in agriculture and global warming. The proposed technique achieved an accuracy of 91.3%. The authors of [31] explored various machine learning algorithms for leaf disease detection and classification, such as SVM, KNN, SGD, XGB, and random forest. The algorithms were evaluated using F1 score, recall, precision, and accuracy. The experimental results show that the achieved accuracy varies from 66.66% for KNN up to 83.33% for the SGD algorithm. In [32], the authors aimed to detect leaf diseases in rice through a new CNN model combining MobileNetV2, DenseNet121, and NASNetMobile. The proposed model improves classification accuracy to 97.5% compared to three standard CNN models.

Similarly, a study [33] was conducted to classify tomato leaf disease. The authors considered implementing CNN algorithms to achieve an accuracy of 99.89%. A system that observes the crops’ growth and leaf diseases continuously to advise farmers in need is proposed by the authors of [34]. The proposed framework produces efficient crop condition notifications to terminal IoT components that assist in irrigation, nutrition planning, and environmental compliance related to farming lands. SVM and CNN are used to train the IoT infrastructures to detect the various types of leaf diseases.

In [35], the authors propose a modern generic approach for wheat disease classification using Decision Trees and deep learning models. The refined models improved the decision tree algorithm accuracy by 28.5% and CNN accuracy by 4.3%. This knowledge-based system can help farmers apply appropriate management methods. The authors of [36] addressed the problem of Rumex obtusifolius Linnaeus by introducing a hybrid CNN model involving VGG-16 for well-separated leaves, ResNet-50 for complex issues, and Inception-v3 for illumination problems. The model was tested on two benchmark datasets and achieved 97.51%, 97.4%, 94.45%, and 95.9% for accuracy, recall, precision, and F1-score, respectively. A machine learning framework for the classification of date palm disease was developed in [37]. The framework uses 80 GLCM texture features and 9 HSV color moment features from leaflets. Two classic ML algorithms, SVM and KNN, and two ensemble learning methods, RF and LightGBM, were tested. Using the combined GLCM and HSV features, the SVM classifier outperformed the other methods, achieving an accuracy of 98.29%. The authors of [38] proposed a vision-based automatic medicinal plant identification system using neural network techniques and deep learning. A novel DeepHerb dataset, consisting of 2515 leaf images from 40 Indian herbs, is used to identify plants. The model deploys ANN and SVM to extract features and classify the herbs. The proposed DeepHerb model, which learned from Xception and ANN, outperformed pre-trained models with 97.5% accuracy. Another study [39] shows that the ANN algorithm and its variant, CNN, are highly effective in classification tasks for monitoring paddy rice diseases. Potato plant disease detection is studied in [40,41] using CNN networks, with satisfactory results. Ref. [42] reviews 17 studies related to banana plant disease identification, with accuracies ranging between 80% and 99.61%.
These studies collectively underscore the potential of ML in this domain, with CNN, SVM, and Random Forest emerging as particularly promising approaches. In recent years, a novel approach to leaf disease identification has used data augmentation through Generative Adversarial Networks (GAN), providing promising results in improving the performance of traditional ML algorithms. A tomato leaf identification technique is proposed in [43,44]. Other types of plant leaf disease have been studied in [44,45,46]. In contrast, our study investigates how state-of-the-art algorithms perform as the number of classes increases, offering a unique perspective on scalability and model behavior. While previous studies provide valuable insights, they do not explore the effects of scaling up the number of classes to the extent we do, making direct comparisons less meaningful in this context. Nonetheless, we are committed to providing a robust analysis of the performance of the models used in this study and how they fare with an increased number of classes. Table 1 reflects a comparison of our study with current models:

These studies collectively highlight the potential of machine learning for accurate and efficient leaf feature extraction, classification, and disease detection.

3. Feature Extraction and Selection

Machine learning algorithms require relevant and important input features to predict the output. However, not all features are equally useful in a prediction task, and some may even introduce noise into the model. Feature selection and feature extraction are two approaches to handle this. Feature selection picks a subset of relevant features from the initial set of features. The goal is to reduce the dimensionality of the feature space, simplify the model, and increase its generalization performance. In addition, feature selection enables the model to train faster, reduces computational complexity, and limits overfitting [51]. Feature extraction, on the other hand, transforms the original features into a new collection of more informative and compact features. The idea is to extract the most essential information from the original features and represent it in a lower-dimensional feature space.
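As an illustration of feature selection as described above, the following is a minimal sketch of one simple filter method: dropping features whose variance across samples does not exceed a threshold. This is an assumed example technique, not necessarily the selection method used in this study, and the function names `select_by_variance` and `keep_columns` are hypothetical.

```python
from statistics import pvariance


def select_by_variance(rows, threshold=0.0):
    """Return indices of columns whose population variance exceeds threshold.

    Assumption: a variance filter stands in for the (unspecified) selection method.
    """
    n_cols = len(rows[0])
    return [
        j for j in range(n_cols)
        if pvariance([row[j] for row in rows]) > threshold
    ]


def keep_columns(rows, indices):
    """Project every sample onto the selected feature indices."""
    return [[row[j] for j in indices] for row in rows]


# Toy feature matrix: the middle column is constant and carries no information.
X = [[0.1, 5.0, 1.0],
     [0.4, 5.0, 2.0],
     [0.9, 5.0, 3.0]]
kept = select_by_variance(X)
X_reduced = keep_columns(X, kept)
```

Here the constant column is discarded, so each sample shrinks from three features to two while the informative features are left unchanged.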

Combining two or more feature extraction techniques (shape, texture, color, venation, etc.) gives better classification results than a single-feature extraction technique [51,52]. Figure 2 illustrates feature extraction and selection in machine learning with six phases. In phase 1, after raw data are collected from different sources, they are preprocessed to select the target data we need to study. Afterward, feature extraction and selection processes are used in phases 2 and 3 to extract and detect the most essential features. The preprocessing step is given in phase 4. This step is considered the second step in building a classification model. The collected data need to be preprocessed to ensure their quality. This involves handling missing values, dealing with outliers, and transforming the data into a format suitable for analysis. After we extract the best features of the leaf, we conduct classification in phase 5 based on a trained classification algorithm (e.g., Convolutional Neural Network (CNN), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), or Principal Components Analysis (PCA)). Using data mining techniques, the data are transformed into a structure that is suitable for interpretation and evaluation of the results in phase 6.
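The preprocessing phase described above (handling missing values and transforming the data) can be sketched as follows. This is a minimal illustration that assumes mean imputation and min-max normalization; the text does not specify these exact choices, and the names `impute_mean` and `min_max_normalize` are hypothetical.

```python
from statistics import mean


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: a column with no observed values falls back to 0.0.
        col_means.append(mean(observed) if observed else 0.0)
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


def min_max_normalize(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    n_cols = len(rows[0])
    lows = [min(row[j] for row in rows) for j in range(n_cols)]
    highs = [max(row[j] for row in rows) for j in range(n_cols)]
    return [
        [
            (row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_cols)
        ]
        for row in rows
    ]


# Toy data: one missing value in the first feature.
raw = [[1.0, 10.0], [None, 20.0], [3.0, 30.0]]
clean = min_max_normalize(impute_mean(raw))
```

Imputing before normalizing keeps the filled-in value inside the observed range, so it cannot distort the column minimum or maximum.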

Plants have medical properties and provide food and oxygen. Therefore, the research community is very interested in this area, especially in plant classification and identification through the leaves. Various research studies identify plant properties from flowers or other parts; however, the leaf is considered the most reliable source of information since it is available all year round [53].

Identification of plants using their leaves is becoming increasingly fascinating and trendy. Every leaf contains specific details that aid in the recognition of different types of plants. Many authors have studied this issue from different perspectives [54,55]. For example, the authors in [54] seek to examine and assess the execution and effectiveness of various approaches for categorizing plants. Every method has its pros and cons when it comes to identifying leaf patterns. Leaf image quality is crucial; thus, a dependable leaf database is necessary to set up the machine learning algorithm for leaf recognition and validation. The study in [56] suggests utilizing two features, namely, shape, and texture, as part of the proposed method. The method based on the shape will capture the outline characteristics of each leaf and subsequently evaluate the differences between them utilizing the Jeffrey-divergence metric. The patterns of edge gradients will be used to examine the overall texture of the leaf. Then, an incremental classification algorithm will combine the outcomes of these approaches.

On the other hand, in [56], the authors stressed that leaf images must be pre-processed to extract essential features for plant identification. The authors in [56] introduce plant identification through leaf characteristics with a Multiclass SVM (MSVM) as the classifier. Based on their analysis, the authors report that a high recognition rate of 90% is achieved on different datasets.

In this work, feature extraction aims to reduce data complexity while retaining as much relevant information as possible. This helps to improve the performance and efficiency of machine learning algorithms and simplify the analysis process. Feature extraction may involve the creation of new features and data manipulation to separate meaningful features from irrelevant ones. The separation of the leaf object (foreground) from its background is referred to as segmentation. This procedure uses the adaptive threshold K-Means method [28]. Following segmentation, geometric features are taken from the segmented image. For example, the aspect ratio and roundness (R) of the leaf can be determined; roundness is given by the following well-known formula, where A is the leaf area and p is its perimeter:

R = \frac{4 \pi A}{p^{2}}

(1)
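Equation (1) can be checked numerically with a short sketch. The function name `roundness` is hypothetical, and the area and perimeter are assumed to be already measured from the segmented leaf in consistent units (for example, pixels squared and pixels).

```python
import math


def roundness(area, perimeter):
    """Roundness R = 4*pi*A / p**2 as in Equation (1)."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2


# A circle of radius 2 has area 4*pi and perimeter 4*pi.
r_circle = roundness(4.0 * math.pi, 4.0 * math.pi)

# A unit square has area 1 and perimeter 4.
r_square = roundness(1.0, 4.0)
```

A perfect circle gives a roundness of 1, and the unit square gives pi/4, so lower values indicate outlines that depart further from a circle.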

The color of the leaves is considered a morphological characteristic. Several statistical characteristics, such as mean, skewness, and kurtosis, can be calculated in the color space to characterize leaf color attributes. This approach has low computational complexity and is suitable for real-time processing. Because the processed image contains three color planes (red, green, and blue), the mean of the three color planes after extracting the leaf contour can be used to estimate Ch and N, as shown below:

\mathrm{ChN} = G - \left(\frac{R + B}{2}\right)

(2)

The values of red and blue are provided for normalization and to estimate the ChN of healthy and dead leaves. Nitrogen can be calculated using the following equation:

N = \frac{1}{3} \left(\frac{HU - 60}{60} + (1 - SA) + (1 - BG)\right)

(3)

where HU, SA, and BG represent the hue value, saturation intensity, and brightness intensity of the colored image, respectively.
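Equations (2) and (3) can be sketched as small functions. Several points here are assumptions rather than statements from the text: hue is taken in degrees (which the constant 60 suggests), saturation and brightness are taken in [0, 1], brightness is read as the V channel of HSV, and R, G, and B are the per-channel means over the leaf region scaled to [0, 1]. The function names are hypothetical.

```python
import colorsys


def chn_index(mean_r, mean_g, mean_b):
    """ChN = G - (R + B) / 2 as in Equation (2), using mean channel values."""
    return mean_g - (mean_r + mean_b) / 2.0


def nitrogen_estimate(hue_deg, saturation, brightness):
    """N from Equation (3); hue in degrees, saturation and brightness in [0, 1]."""
    return ((hue_deg - 60.0) / 60.0 + (1.0 - saturation) + (1.0 - brightness)) / 3.0


def nitrogen_from_rgb(r, g, b):
    """Convert an RGB triple in [0, 1] to HSV, then apply Equation (3)."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # h is a fraction of a full turn
    return nitrogen_estimate(h * 360.0, s, v)


chn = chn_index(0.2, 0.6, 0.2)           # greenish mean color
n_green = nitrogen_from_rgb(0.0, 1.0, 0.0)  # pure green: hue 120, S = V = 1
```

For pure green, the saturation and brightness terms vanish and only the hue term contributes, so the estimate is one third.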

The maximum distance between two spots on the leaf object’s boundary in the processing image is referred to as effective diameter (ED). The effective diameter can be estimated using another morphological property of a leaf known as area, as follows:

ED = \sqrt{\frac{\mathrm{Area}}{\pi}}

(4)
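Equation (4) can be evaluated from a segmented binary mask, as in the sketch below. It assumes the leaf area is the count of foreground pixels in the mask; the names `mask_area` and `effective_diameter` are hypothetical, and the formula is implemented exactly as printed in Equation (4).

```python
import math


def mask_area(mask):
    """Area as the number of foreground (truthy) pixels in a binary mask."""
    return sum(1 for row in mask for pixel in row if pixel)


def effective_diameter(area):
    """ED = sqrt(Area / pi), following Equation (4) as printed."""
    if area < 0:
        raise ValueError("area must be non-negative")
    return math.sqrt(area / math.pi)


# Toy 3x3 mask with a 2x2 foreground block (area = 4 pixels).
mask = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
area = mask_area(mask)
ed = effective_diameter(area)
```

Because the value depends only on the area, it is unaffected by the orientation of the leaf in the image.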

4. Machine Learning for Plant Leaf Classification

Plant leaf classification through machine learning is a rapidly developing study area with significant challenges and promising opportunities. The application of machine learning has allowed researchers to address complex problems previously considered impossible by conventional computational methods. Using advanced computational models, plant species classification is now being redefined through the lens of agricultural practices, environmental conservation, and medicinal research [57].

Researchers have been motivated to develop machine learning techniques to classify leaves to prevent plant species’ extinction. This ensures that plant biodiversity is better understood and utilized. Furthermore, it serves multiple objectives, from conserving ecosystems to identifying plant species with potential undiscovered medicinal properties [57].

However, classifying plant leaves is complex due to the diverse morphologies that appear. Various factors, such as their geographical location and seasonal changes, can affect the size, shape, and color of these plants. These variations introduce difficulties in developing a universal classification system [58]. In addition, the sheer number of plant species adds an additional layer of complexity that requires sophisticated, adaptable, and highly nuanced machine learning algorithms [57].

To meet these challenges, several machine learning models have been widely implemented, such as Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and K-Nearest Neighbors (K-NN). The application of each model to the problem of plant leaf classification presents different advantages [58,59].

CNNs have become dominant because they have been found to learn features in an automatic, ‘end-to-end’ way, directly from images without manual feature extraction. CNNs use hierarchical layers to capture fine details such as leaf shape, venation, and texture. One reason CNNs are so effective at processing leaf images is that this complex visual input requires the ability to learn representations that capture small morphological details. CNNs also excel in tasks where intra-class variability (differences within the same species) is high, as their deep learning structure allows for greater flexibility in identifying variations in patterns like leaf vein arrangement or surface texture [60,61].

On the other hand, SVMs tend to be used when the data are represented with a manually extracted set of features. In high-dimensional spaces, where the decision boundary between classes has to be optimized, these models are quite effective [60,61]. Features such as leaf perimeter, aspect ratio, and texture features extracted from the Gray Level Co-occurrence Matrix (GLCM) [59] are used by SVMs. Manually extracted features of the leaf give a structured view of the leaf, on which SVMs learn the optimal hyperplane to separate the feature space and thus classify species. In this case, SVMs have been shown to perform well until the features become inseparable due to complexity.

Like CNNs and SVMs, the K-NN algorithm has also been used for plant leaf classification. K-NN takes a new leaf sample, compares it with all stored examples, and then classifies it according to the majority class of its nearest neighbors. This model works well when classes are well separated and the geometric features are distinct in the feature space. Unfortunately, performance can be jeopardized by high-dimensional data or overlapping classes. Because K-NN is often inefficient and inaccurate in high-dimensional feature spaces, it is often combined with dimensionality reduction techniques such as Principal Component Analysis (PCA) [57,62].
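The K-NN decision rule described above can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance and a simple majority vote; the feature values and the labels `species_A` and `species_B` are made up for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training samples."""
    # Pair every stored example's distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Two hypothetical species described by two features each.
train_X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
           [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
train_y = ["species_A"] * 3 + ["species_B"] * 3

pred_near_a = knn_predict(train_X, train_y, [1.1, 1.0])
pred_near_b = knn_predict(train_X, train_y, [5.0, 5.1])
```

Every prediction scans all stored examples, which is one reason the method becomes slow on large or high-dimensional feature sets.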

The success of these models largely depends on feature extraction. The literature has explored shape, texture, and venation features of plant leaves. Shape features such as perimeter, convexity, and aspect ratio differentiate species based on overall structure and are important in identifying species. GLCM is often employed to extract texture features that describe surface characteristics not readily apparent from the leaf’s shape alone. Additionally, edge detection techniques can be used to extract venation features, which give a detailed map of the internal structure of the leaf, adding discriminative power when similarly shaped species differ with respect to their internal vein structure [60,63].

Machine learning algorithms have proven valuable in overcoming these challenges. Multilayer Perceptron (MLP), Support Vector Machine (SVM), and other models have been employed and evaluated based on specific performance metrics [31,64].

Machine learning has been significantly accelerated by advancements in deep learning, especially with the enhancement of neural network frameworks such as Convolutional Neural Networks (CNNs). CNNs have gained popularity due to their inherent capability of processing and deciphering complex imagery, particularly images of plants. Due to their sophisticated image recognition capabilities, CNNs have demonstrated robust performance in a wide range of plant classification tasks, thereby establishing new standards for accuracy and reliability.

As illustrated in Figure 3, the workflow of leaf classification commences with image preprocessing to standardize and improve the quality of the input images. This step ensures that lighting, scale, and orientation variations do not affect the subsequent feature extraction process. After preprocessing, feature extraction is performed to capture the intrinsic characteristics of the leaves, such as texture, shape, and color features. Once these features are extracted, feature selection is employed to identify and retain the most significant features that contribute to differentiating between leaf species, thereby reducing the complexity of the model. The selected features are then fed into a machine learning classifier trained to recognize and classify the various leaf types. The outcome of this process is a set of results that reflect the classifier’s accuracy in leaf classification.

The effectiveness of machine learning models in plant leaf classification depends heavily on image processing and feature extraction techniques. Techniques such as the Gray Level Co-occurrence Matrix (GLCM) play a significant role in extracting critical features such as leaf texture, which are vital for classifying leaves. These characteristics form the basis on which machine learning models distinguish between species [51].
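As a rough illustration of how a GLCM turns pixel neighborhoods into texture descriptors, the sketch below builds a normalized co-occurrence matrix for a single pixel offset and derives two common statistics from it. The function names, the 4-level toy image, and the choice of a horizontal offset are assumptions for illustration, not the feature pipeline used in the study.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count how often gray level i is followed by gray level j
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def contrast(p):
    """Large when neighboring pixels differ strongly in gray level."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

def angular_second_moment(p):
    """Large when the texture is uniform (few distinct gray-level pairs)."""
    return sum(v * v for row in p for v in row)

# Toy 4x4 image with gray levels 0..3 (hypothetical values)
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4)
print(round(contrast(p), 4), round(angular_second_moment(p), 4))
```

In practice several offsets and angles are usually combined, and the resulting statistics are concatenated into the texture part of the feature vector.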

In addition to these models, the performance metrics commonly used to evaluate plant leaf classification are accuracy, precision, recall, and F1-score. These metrics allow us to assess how well a model separates species, accounts for misclassifications, and balances precision and recall. CNNs have consistently been shown to be the most accurate of the three models compared in complex classification tasks, as they can identify fine details in images. SVMs generally perform well with structured, high-dimensional features, and K-NN provides a simple, interpretable baseline for comparison.
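For a multi-class problem, these metrics are typically computed per class and then averaged. The sketch below shows one way to do this in plain Python; the function names, the macro-averaging convention, and the toy labels are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores

def macro_average(scores):
    """Unweighted mean of each metric over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))

# Toy labels (hypothetical): one sample of species "a" is predicted as "b"
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]

print(accuracy(y_true, y_pred))
print(per_class_scores(y_true, y_pred))
print(macro_average(per_class_scores(y_true, y_pred)))
```

Macro averaging gives every species the same weight regardless of how many samples it has, which makes it sensitive to errors on rare classes.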

We applied these machine learning techniques to classify plant leaves in our study. To address the unique challenges posed by our dataset, we extracted features automatically using CNNs and manually for SVMs and K-NN, achieving high classification accuracy. In our approach, we blended shape, texture, and venation features to form a single comprehensive model that is robust against variation in plant morphology.

Feature extraction techniques are crucial for the success of the machine learning process. They draw upon the shape, texture, and venation of leaves as key discriminative attributes. The shape, encompassing the leaf’s outline, tip, base, margins, and vein arrangement, is used to distinguish plant species [65]. Geometric features such as the leaf’s length, width, area, and perimeter are also quantified during extraction [63]. Texture captures the surface patterns, and venation analysis, which studies the leaf’s vein patterns, is pivotal in recognizing and differentiating various plant types [65,66]. Notably, combining region-based shape descriptors (RBSD), which extract shape features from within the leaf, with contour-based shape descriptors (CBSD), which extract them from the leaf’s edge, can significantly improve the accuracy of classification systems [67]. These feature extraction methods, rooted in robust image processing techniques, enable machine learning algorithms to handle the complex domain of plant taxonomy effectively.
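The geometric features mentioned above can be computed directly from a leaf outline once it is available as a list of contour points. The following sketch assumes a closed polygonal contour and an axis-aligned leaf, so that length and width are simply the bounding-box extents; the function name and the sample contour are hypothetical.

```python
import math

def shape_features(contour):
    """Basic geometric descriptors of a closed polygon given as (x, y) points."""
    n = len(contour)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]      # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1    # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    length = max(xs) - min(xs)             # extent along x (assumed leaf axis)
    width = max(ys) - min(ys)              # extent along y
    return {
        "area": abs(twice_area) / 2.0,
        "perimeter": perimeter,
        "length": length,
        "width": width,
        "aspect_ratio": length / width,
    }

# A 4 x 2 rectangle standing in for a leaf outline (hypothetical)
print(shape_features([(0, 0), (4, 0), (4, 2), (0, 2)]))
```

Region-based descriptors would instead be computed from the pixels inside the outline, for example through image moments.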

Each of these techniques contributes to creating a comprehensive feature set that machine learning models such as K-Nearest Neighbors (K-NN), Support Vector Machines, and Random Forest can exploit to classify plant leaves accurately [64,68]. The performance of these techniques can be evaluated using metrics such as accuracy, precision, recall, and F1 score [68].

Despite these advancements, plant leaf classification using machine learning has its challenges. Variability in leaf characteristics and the vast diversity of plant species necessitate a continuous effort to develop more advanced image processing and feature extraction techniques. The objective is to construct machine learning models that are accurate and robust against the variations presented by nature [31,51,57,58].

To surmount these challenges, various strategies have been proposed. One approach includes developing methods to account for the variability in leaf characteristics due to environmental and geographical factors. Another involves improving the training data by including a broader spectrum of leaf conditions, such as addressing the imbalance caused by excluding damaged or diseased leaves from datasets. Deep learning techniques have been pivotal in this regard, offering a means to enhance the accuracy of plant leaf classification by using sophisticated image processing methods for feature detection and extraction [58,67,69,70].

5. Proposed Methodology

Classification over many classes introduces specific challenges in machine learning, such as increased computational complexity, greater risk of overfitting, and difficulty in managing class imbalance. In this work, we aim to improve the classification accuracy on the Cope Leaf Dataset [6], which has 1584 images with 99 labels. Without the proposed methodology, the Naïve Bayes classifier gives the highest accuracy (85.18%). We developed several feature selection, preprocessing, and training methods to increase classification accuracy with many classes. Increasing accuracy in classification, especially when dealing with large datasets and many classes, involves a combination of data preprocessing, model selection, regularization techniques, and fine-tuning. The following steps are applied to enhance the classification accuracy on the 99-label Cope Leaf Dataset:

Noisy and mislabeled data are removed.
Data augmentation techniques (rotation and shifting) are applied to increase the diversity of the training set.
The dataset is balanced according to the number of records on each label.
Data are shuffled to ensure that the distribution is random, which helps in reducing bias during training.
Most relevant features are selected using correlation analysis.
New features are created from existing data via statistical methods such as min, max, range, etc.
Data are converted into a new structure in a consistent format.
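Two of the steps above, creating summary features and selecting features through correlation analysis, can be sketched as follows. The source does not specify how the correlation analysis is applied, so the rule used here (drop a feature when its absolute Pearson correlation with an already kept feature reaches a threshold) is an assumption, as are the function names and the threshold of 0.95.

```python
import math

def add_summary_features(row):
    """Append min, max, and range of the existing attributes to a record."""
    lo, hi = min(row), max(row)
    return list(row) + [lo, hi, hi - lo]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0                      # constant column: treat as uncorrelated
    return cov / math.sqrt(var_a * var_b)

def select_non_redundant(columns, threshold=0.95):
    """Return indices of columns that are not near-duplicates of a kept column."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[i])) < threshold for i in kept):
            kept.append(j)
    return kept

# Hypothetical feature columns: the second is an exact multiple of the first
columns = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
print(select_non_redundant(columns))
print(add_summary_features([3, 7, 5]))
```

An alternative reading of "correlation analysis" would rank features by their correlation with the class label; the same `pearson` helper could be reused for that.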

Classifying datasets with many classes (large label classification) requires effective strategies to manage high-dimensional output spaces. Feature extraction, feature selection, and data preprocessing are crucial steps to ensure that classification algorithms can work effectively [71,72]. In the next step, data are selected for training and testing, and several classification algorithms are applied. The Bayes Net (BN), Naïve Bayes Classifier (NBC), Multilayer Perceptron (MP), Hoeffding Tree (HT), J48, Random Forest (RF), and Convolutional Neural Network (CNN) algorithms were applied to classify the 99-label leaf dataset. Naïve Bayes classifiers are simple probabilistic models that assume independence between features given the class. This simplicity makes them computationally efficient and scalable, even when dealing with many classes [73].
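The independence assumption means that each class is described by one small set of per-feature statistics, which is why the model scales easily to many classes. The sketch below is a minimal Gaussian variant written for illustration; the Gaussian likelihood, the variance floor, the function names, and the toy data are assumptions and do not describe the exact classifier configuration used in the study.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log-prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # var_floor avoids division by zero for constant features
        variances = [sum((v - m) ** 2 for v in col) / n + var_floor
                     for col, m in zip(cols, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def predict_nb(model, row):
    """Pick the class with the highest log-posterior under feature independence."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                      for x, m, var in zip(row, means, variances))
        return log_prior + log_lik
    return max(model, key=log_posterior)

# Toy data with two well-separated classes (hypothetical values)
X = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1), (5.0, 6.0), (5.2, 5.9), (4.8, 6.1)]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict_nb(model, (1.1, 2.0)), predict_nb(model, (5.1, 6.0)))
```

With C classes and d features the model stores only 2·C·d statistics plus C priors, so adding classes increases the cost linearly.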

On the other hand, decision trees tend to overfit the training data, especially when the number of classes is large. Both probabilistic and rule-based algorithms are applied to problems with different numbers of classes. Classification with 2, 5, and 10 classes achieves very high accuracy. After restructuring the data and adding new features, the Naïve Bayes Classifier gives 89.63%, the Multilayer Perceptron gives 89.48%, and the Hoeffding Tree gives 89.92% accuracy, which is promising for the 99-class dataset. In this study, 80% of the data is allocated to the training set for developing the machine learning models, while the remaining 20% is reserved as the test set to assess the models’ performance. The MP model has three hidden layers of 128, 64, and 32 neurons with ReLU activation. The model was trained using the Adam optimizer and categorical cross-entropy loss for multi-class classification. The architecture of the MP was chosen to be complex enough to capture the non-linear relationship between the leaf features without overfitting.
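To give a sense of the model size implied by this architecture, the following sketch counts the trainable weights and biases of a fully connected network and compares the total with the size of an 80% training split. The input width of 64 and the record count of 1584 are taken from the dataset description as assumptions; the engineered features and the augmentation steps described earlier would change both numbers.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Assumed layout: 64 inputs, hidden layers of 128, 64, 32, and 99 output classes
layers = [64, 128, 64, 32, 99]
params = mlp_parameter_count(layers)

# Assumed 80/20 split of the 1584 original records
train_records = 1584 * 80 // 100
test_records = 1584 - train_records

print(params, train_records, test_records)
print(round(params / train_records, 1), "parameters per training record")
```

A ratio well above one parameter per training record is one reason regularization and validation matter for this kind of network.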

In this study, we initially employed an 80/20 split of the data for training and testing, which provided us with a basic understanding of the model’s performance. However, to ensure the robustness and generalization ability of our models, we additionally conducted k-fold cross-validation (with k = 5 or 10). This allowed us to assess performance across multiple subsets of the data and confirmed that the models perform consistently and generalize well across different training and testing splits. Moreover, stratified cross-validation was used to maintain the class balance across all folds, which is particularly important in our 99-class dataset, where class imbalance could skew performance. The results showed stable accuracy and precision across folds, further confirming the model’s stability.
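Stratified k-fold splitting can be sketched in a few lines: the indices of each class are shuffled and then dealt out to the folds in turn, so every fold receives nearly the same number of samples per class. The function name, the fixed seed, and the assumption of 16 records per class (1584 / 99) in the usage example are illustrative.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for offset, label in enumerate(sorted(by_class)):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            # Rotate the starting fold per class so leftovers are spread out
            folds[(pos + offset) % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(idx for j, fold in enumerate(folds) if j != i
                           for idx in fold)
        yield train_idx, test_idx

# Assumed layout: 99 classes with 16 records each
labels = [c for c in range(99) for _ in range(16)]
for train_idx, test_idx in stratified_k_fold(labels, k=5):
    print(len(train_idx), len(test_idx))
```

Each record appears in exactly one test fold, so averaging the per-fold scores uses every sample for evaluation once.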

In many real-world scenarios, datasets with many classes exhibit class imbalance. The HT algorithm can effectively handle skewed class distributions by continuously updating based on the incoming stream and focusing on the most relevant features. The HT algorithm gives the best classification results in this work, followed closely by the NBC and MP algorithms. Algorithm 1 gives the pseudocode of the HT algorithm.

Algorithm 1. Hoeffding Tree Algorithm
1:	Input: S: stream of instances; P: desired probability (confidence level 1 − δ); T: tie-breaking threshold; MIN: number of instances a leaf must accumulate before a split is evaluated
2:	Initialize an empty Hoeffding Tree (HT) with a single leaf
3:	for each instance X in S do
4:		Traverse HT to find the appropriate leaf L for X
5:		Update statistics at leaf L with X
6:		if the number of instances at L reaches MIN then
7:			Compute Information Gain (G) for each attribute
8:			Let X_a be the attribute with the highest G and X_b be the attribute with the second highest G
9:			Compute the Hoeffding bound ε from P and the number of instances seen at L
10:			if (G(X_a) − G(X_b) > ε) or (ε < T) then
11:				Split L using the best attribute X_a
12:				Create new leaves for each branch of the split
13:				Distribute the instances of L among the new leaves
14:			else
15:				Continue without splitting
16:			end if
17:		end if
18:	end for
19:	return the trained Hoeffding Tree (HT)
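The split decision in steps 9 and 10 of Algorithm 1 rests on the Hoeffding bound, ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of the split criterion, δ is one minus the desired probability, and n is the number of instances seen at the leaf. The sketch below is a minimal illustration of that test; taking R = log2(number of classes) for information gain is the usual convention, while the default values of `delta` and `tie_threshold` and the function names are assumptions.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the true mean lies within epsilon with probability 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best, gain_second, n, num_classes,
                 delta=1e-7, tie_threshold=0.05):
    """Split when the best attribute is reliably better, or the bound is tiny (tie)."""
    value_range = math.log2(num_classes)   # range of information gain
    eps = hoeffding_bound(value_range, delta, n)
    return (gain_best - gain_second > eps) or (eps < tie_threshold)

# The bound shrinks as a leaf accumulates more instances (99 classes assumed)
for n in (16, 1_000, 100_000):
    print(n, round(hoeffding_bound(math.log2(99), 1e-7, n), 4))

print(should_split(0.9, 0.1, n=16, num_classes=99))
print(should_split(0.9, 0.1, n=100_000, num_classes=99))
```

Because ε depends on 1/sqrt(n), a leaf that has seen only a few instances needs a very large gain difference before it is split, which is the mechanism that restrains tree growth.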

The Hoeffding Tree (HT) algorithm was the most effective in our study, with an accuracy of 89.92%. We attribute this to its incremental learning approach and its capability to deal with high-dimensional and imbalanced data. The dynamic construction of the decision tree and the selection of features in HT also reduced overfitting and achieved high accuracy for features such as margin, texture, and shape. The Naïve Bayes Classifier (NBC) was the second best, with an accuracy of 89.63%. While its probabilistic model is fast, the feature-independence assumption somewhat constrained the model from capturing higher-order interactions, so it lagged slightly behind HT. The Multilayer Perceptron (MLP) had a classification accuracy of 89.48%. Despite its capability to model non-linear relationships between the features of the leaves, the algorithm had a strong tendency to overfit, especially under class imbalance.

However, the J48 algorithm gave the lowest accuracy, 57.87%. Its static decision tree construction led to overfitting, and its difficulty in handling imbalanced classes also contributed to its poor performance. As a result, J48 could not generalize well on this complex, multi-class dataset.

6. Experimental Results and Discussions

Plant classification using machine learning algorithms involves training algorithms to identify and categorize different species of plants based on features extracted from images, such as leaf shapes, textures, colors, and other morphological characteristics. This work applies several feature selection and ML algorithms to the Cope Leaf Dataset to obtain a suitable model for classifying plant leaves. The Cope Leaf Dataset is a popular dataset used for leaf classification tasks in machine learning. It consists of images of leaves from various plant species, along with corresponding labels indicating the species of each leaf. Like other leaf classification datasets, the Cope Leaf Dataset may present challenges such as class imbalance, variations in image quality, and intra-class variability due to factors like leaf aging and environmental conditions. We aim to address these challenges by building robust classification models. The dataset consists of 1584 images with 99 labels. Each record has 64 attributes describing the margin, shape, and texture characteristics of the leaf [6].

In this work, both probabilistic and rule-based machine learning algorithms are applied to the raw dataset. Table 2 shows the accuracy and statistical analysis of classification with the plant leaf margin feature set, including the mean absolute error (MAE), Root Mean Squared Error (RMSE), and Relative Absolute Error (RAE). Table 3 reports the texture feature set, whose results are similar to those of the shape feature set. The results show that classification using the margin feature set yields a more suitable model for the plant leaf dataset. The volume of the dataset and the number of classes (99) make the classification problem very challenging. The Naïve Bayes classifier gives the highest accuracy for the raw data. Naïve Bayes classifiers make strong (naive) independence assumptions about the features, which keeps the model simple, and this simplicity appears to work well here. They also provide straightforward explanations of the classification decision based on the probabilities of different classes given the input features [74,75,76]. Figure 4 shows the accuracy of the ML algorithms for classification with the margin, texture, and all-feature datasets.

Accuracy helps assess the quality and effectiveness of the model in solving the classification task. A high level of accuracy indicates that the model has learned meaningful patterns and relationships in the data, leading to accurate predictions. On the other hand, low accuracy may indicate issues such as underfitting, overfitting, or inadequate feature representation. In this work, several feature selection and machine learning algorithms are proposed to increase classification accuracy. In this part of the experiments, we move from a classification task with a small number of classes (2) to one with a large number of classes (99). Classification results for subsets of the dataset with 2, 5, 10, 25, 50, and 75 classes are given in Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9. The classification problem is typically less complex with fewer classes, and the decision boundaries between classes may be more distinct. Simple models like Logistic Regression or Naïve Bayes classifiers may perform well on such tasks due to their ability to capture linear or simple relationships between features and classes. However, as the number of classes increases, the problem becomes more complex, requiring more sophisticated models like neural networks or ensemble methods to accurately capture the intricate relationships between features and classes. Additionally, addressing challenges like class imbalance, data sparsity, and feature relevance becomes increasingly important as the number of classes grows. The classification accuracy for the 2-class, 5-class, and 10-class problems is 100% with the Naïve Bayes classifier and more than 96% with the Multilayer Perceptron algorithm.

The mean absolute error (MAE) is the average of the absolute differences between predicted and actual values. Root Mean Squared Error (RMSE) is a frequently used measure of the differences between the values predicted by a model or an estimator and the actual values observed. RMSE is particularly useful in machine learning-based classification modeling to evaluate the performance of a model. Relative Absolute Error (RAE) measures the total absolute error in prediction relative to the total absolute error of a simple baseline model, usually the mean of the observed data. Precision is the ratio of true-positive predictions to all positive predictions. It tells us how many of the positive predictions made by the model are correct. Recall, also known as sensitivity or true-positive rate, is the ratio of true-positive predictions to all actual positive cases. The F1 score provides a single metric that balances precision and recall. Together, these metrics describe the absolute and relative performance of the plant leaf classification model.
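The three error measures defined above can be written out directly. The sketch below applies them to plain numeric predictions; for a classifier, such measures are usually computed on predicted class probabilities against 0/1 targets, and that setup, like the function names and sample values, is an assumption here.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def rae(actual, predicted):
    """Total absolute error relative to always predicting the mean of `actual`."""
    mean_actual = sum(actual) / len(actual)
    baseline = sum(abs(a - mean_actual) for a in actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / baseline

# Hypothetical values: three exact predictions and one error of 2
actual = [1, 2, 3, 4]
predicted = [1, 2, 3, 6]
print(mae(actual, predicted), rmse(actual, predicted), rae(actual, predicted))
```

An RAE below 1 means the model does better than the mean-value baseline, and RAE is often reported as a percentage.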

True-positive (TP) and false-positive (FP) rates are critical for measuring the success of machine learning models. They offer detailed insights into the performance of classification models, enable the calculation of essential evaluation metrics, help in understanding model bias and behavior, support cost-sensitive decision-making, and aid in fine-tuning and comparing models for optimal performance. The average values of TP and FP after classification are given in the tables. Our plant leaf classification methods have high TP and low FP values, indicating that the developed models perform well.
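In a multi-class setting, TP and FP rates are usually obtained by treating each class in turn as the positive class and then averaging. The following sketch assumes a support-weighted average, which is one common convention; the function names and the toy labels are hypothetical.

```python
def tp_fp_rates(y_true, y_pred):
    """Return {label: (tp_rate, fp_rate)} using one-vs-rest counts."""
    n = len(y_true)
    rates = {}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        tn = n - tp - fp - fn
        tp_rate = tp / (tp + fn) if tp + fn else 0.0
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        rates[label] = (tp_rate, fp_rate)
    return rates

def weighted_average(rates, y_true):
    """Average the per-class rates, weighting each class by its sample count."""
    n = len(y_true)
    support = {label: y_true.count(label) for label in rates}
    tp_avg = sum(rates[c][0] * support[c] for c in rates) / n
    fp_avg = sum(rates[c][1] * support[c] for c in rates) / n
    return tp_avg, fp_avg

# Toy labels (hypothetical): one sample of species "a" is predicted as "b"
truth = ["a", "a", "a", "b", "b", "c"]
guess = ["a", "a", "b", "b", "b", "c"]
rates = tp_fp_rates(truth, guess)
print(rates)
print(weighted_average(rates, truth))
```

With this weighting, the averaged TP rate equals the overall accuracy, which is a quick consistency check when reading such tables.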

On the other hand, accuracy is lower in the 25-class, 50-class, and 75-class problems. For the 25-class problem, NBC gives 96.21% accuracy and MP gives 94.70% accuracy. The tables also report other statistical evaluation measurements, such as MAE, RMSE, RAE, and TP and FP rates.

Figure 5 shows the classification accuracy for 2, 5, 10, 25, 50, and 75 labels. Figure 6 shows the MAE, RAE, RMSE, and TP rates for the same numbers of labels in the leaf classification problem [77]. Several feature selection and preprocessing methods are applied to increase plant leaf classification accuracy, such as data cleaning, transformation, feature engineering, and splitting the data into appropriate sets. Table 10 shows that the MP algorithm gives 89.48% accuracy, NBC gives 89.63% accuracy, and the HT algorithm gives 89.92% accuracy for the 99-label classification problem. In theory, fewer classes make classification easier and more accurate, but in practice, algorithms such as J48 can behave differently depending on the nature of the data and specific model hyperparameters. In this dataset, J48 has lower accuracy than the other classification algorithms. CNN achieved 88.72% accuracy, slightly lower than the Hoeffding Tree (89.92%) but higher than other models such as Random Forest (86.81%).

Figure 7 shows examples of plant leaves that were accurately classified by the proposed model used in this study. These samples highlight the model’s ability to correctly distinguish plant species based on distinguishing features such as shape, texture, and venation patterns, even within a multi-class classification setup. Figure 8 displays examples of plant leaves that were misclassified by the models, illustrating the challenges faced in classification. Misclassification often arose from factors such as high intra-class variability, inter-class similarity, and image quality issues, which can obscure unique features critical for accurate classification [6].

Table 11 compares the proposed model with several state-of-the-art algorithms, such as the support vector machine (SVM), ResNet-50, DenseNet-121, random forest, and CNN. The proposed model outperforms the other state-of-the-art models in accuracy, F1 score, and recall, demonstrating strong classification capabilities for large, complex plant datasets. CNN and SVM are also competitive, particularly in precision and overall error rates.

7. Discussion

During the classification process, some algorithms struggled to accurately identify certain plant leaf samples due to various factors. These included class imbalance, high variability within the same class, similarities between different classes, complexity and redundancy in features, overfitting, and inconsistencies in image quality. A key challenge is class imbalance, where some species are overrepresented while others have very few samples. Algorithms like Naive Bayes or Multilayer Perceptron may have difficulty with underrepresented classes, leading to misclassifications, especially for rare species. Hoeffding Trees, which adaptively learn, manage imbalance better but still face difficulties in extreme cases. For example, if one species has 200 samples and another has only 5, the algorithm tends to favor the more common species, making it harder to correctly classify the rarer ones.
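The 200-versus-5 example above can be worked through numerically. The sketch below evaluates a degenerate classifier that always predicts the common species; it is an illustration of why plain accuracy can hide failures on rare classes, with the function name and the use of balanced accuracy (mean per-class recall) chosen for illustration.

```python
def always_predict(counts, predicted_label):
    """Accuracy and balanced accuracy of a classifier that outputs one fixed label.

    counts: mapping from class label to its number of samples.
    """
    total = sum(counts.values())
    accuracy = counts[predicted_label] / total
    # Recall is 1 for the predicted class and 0 for every other class
    recalls = [1.0 if label == predicted_label else 0.0 for label in counts]
    balanced_accuracy = sum(recalls) / len(recalls)
    return accuracy, balanced_accuracy

counts = {"common_species": 200, "rare_species": 5}
acc, bal = always_predict(counts, "common_species")
print(round(acc, 4), bal)
```

Here the accuracy is about 0.976 even though no rare-species leaf is ever recognized, while the balanced accuracy of 0.5 exposes the problem.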

Additionally, leaves from the same species may appear different due to factors like aging, lighting, or the season in which they were collected, while leaves from different species might look very similar. This creates challenges for the model. For instance, two species may have similar textures but different shapes, and if the algorithm focuses too much on texture, it may misclassify the species. Conversely, young and old leaves of the same species can look quite different, causing issues with intra-class variability. Even though advanced techniques like PCA and LDA were used to reduce feature dimensionality, some irrelevant features may remain, adding noise to the model and complicating classification. For example, models like Random Forest may struggle to differentiate between species if certain features, like shape, dominate the classification process when texture should also be considered.

One of the reasons why the Hoeffding Tree algorithm outperforms others, such as J48, is its ability to handle high-dimensional, imbalanced datasets effectively. The Hoeffding Tree’s incremental learning approach allows it to adapt dynamically to the data as they are streamed, making it particularly well-suited for datasets with a large number of classes, such as the 99-class Cope Leaf Dataset. It avoids overfitting by only making statistically significant splits, ensuring that the model generalizes well to unseen data. On the other hand, J48 constructs a static decision tree based on the training data, which often leads to overfitting, especially in complex, multi-class problems. The algorithm tends to create overly complex models that do not perform well on new data, especially when class imbalance is present. This, combined with its tendency to favor majority classes, results in its lower accuracy compared to the Hoeffding Tree in this study.

Complex models like Multilayer Perceptron may overfit the training data, performing well on known samples but poorly on new ones. This happens when the model learns irrelevant details from the training data, failing to generalize to new leaf images. For instance, if MLP classifies based on specific edge details that are not consistent across all samples, it will struggle with slightly different images. Variations in image quality, such as lighting, focus, or background clutter, can introduce noise that affects feature extraction. Some algorithms, like Naive Bayes, which rely on probabilistic methods, are more vulnerable to this noise, while others, like Random Forest, handle it better. For example, a blurry or poorly lit image may lose key details like leaf venation, leading to misclassification. Similarly, if the margins of the leaf are not clearly visible, the model may incorrectly classify based on shape.

8. Conclusions

In this study, we addressed the complex problem of plant classification using the Cope Leaf Dataset, which comprises 1584 images of leaves from 99 different species, each characterized by 64 attributes related to margin, shape, and texture. We aimed to build robust classification models to accurately identify plant species based on these leaf characteristics despite challenges such as class imbalance, variations in image quality, and intra-class variability due to environmental factors and leaf aging.

For the largest classification task involving 99 classes, we observed that preprocessing steps, including feature selection and data transformation, significantly enhanced model performance. After these steps, the Naïve Bayes Classifier achieved an accuracy of 89.63%, Convolutional Neural Networks achieved 88.72%, and the Hoeffding Tree algorithm reached 89.92%, underscoring the effectiveness of these enhancements.

In conclusion, our study demonstrates the potential of machine learning algorithms, particularly the Hoeffding Tree algorithm, for classifying plant species using leaf images. The results highlight the importance of appropriate feature selection, preprocessing methods, and the choice of algorithm in tackling the complexities of large-scale plant classification tasks. Future work could further optimize these models, explore deep learning approaches, and expand the dataset to include more diverse plant species for broader applicability.

Author Contributions

E.E., A.E.T., E.C., A.I.Z., C.Z., A.S. and W.A. were involved in the whole process of producing this paper, including conceptualization, methodology, modeling, validation, visualization, and manuscript preparation. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in the Kaggle Dataset weblink provided at reference number [6].

Conflicts of Interest

The authors declare no conflicts of interest.

References

Masters, E.T. Medicinal plants of the upper Aswa River catchment of northern Uganda—A cultural crossroads. J. Ethnobiology. Ethnomedicine 2023, 19, 48. [Google Scholar] [CrossRef] [PubMed]
Chougui, A.; Moussaoui, A.; Moussaoui, A. Plant-Leaf Diseases Classification using CNN, CBAM and Vision Transformer. In Proceedings of the 5th International Symposium on Informatics and Its Applications (ISIA), M’sila, Algeria, 29–30 November 2022; pp. 1–6. [Google Scholar]
Shanker, R.; Sharma, D.; Bhattacharya, M. Development of Plant-Leaf Disease Classification Model using Convolutional Neural Network. In Proceedings of the IEEE 4th International Conference on Cybernetics, Cognition and Machine Learning Applications (ICCCMLA), Goa, India, 8–9 October 2022; pp. 434–438. [Google Scholar]
Dudi, V.R.; Kumar, G.P. Plant Leaf Classification through Deep Feature Fusion with Bidirectional Long Short-Term Memory. In Proceedings of the International Conference on Innovations in Science and Technology for Sustainable Development (ICISTSD), Kollam, India, 25–26 August 2022; pp. 68–73. [Google Scholar]
Hemanthkumar, K.A.; Bharathi, P.S. Improved Accuracy of Plant Leaf Classification using Random Forest Classifier over K-Nearest Neighbours. In Proceedings of the International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), Chennai, India, 15–16 July 2022. [Google Scholar]
Available online: https://www.kaggle.com/competitions/leaf-classification/data (accessed on 14 May 2024).
Wu, Q.; Zhou, C.; Wang, C. Feature Extraction and XML Representation of Plant Leaf for Image Retrieval. In Advanced Web and Network Technologies, and Applications; Shen, H.T., Li, J., Li, M., Ni, J., Wang, W., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 127–131. [Google Scholar]
Chaki, J.; Parekh, R.; Bhattacharya, S. Plant leaf recognition using texture and shape features with neural classifiers. Pattern Recognit. Lett. 2015, 58, 61–68. [Google Scholar] [CrossRef]
Zulkifli, Z.; Saad, P.; Mohtar, I.A. Plant leaf identification using moment invariants & General Regression Neural Network. In Proceedings of the 11th International Conference on Hybrid Intelligent Systems (HIS), Melacca, Malaysia, 5–8 December 2011; pp. 430–435. [Google Scholar]
Nandhini, N.; Bhavani, R. Feature Extraction for Diseased Leaf Image Classification using Machine Learning. In Proceedings of the International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 22–24 January 2020; pp. 1–4. [Google Scholar]
Donesh, S.; Piumi Ishanka, U.A. Plant Leaf Recognition: Comparing Contour-Based and Region-Based Feature Extraction. In Proceedings of the 2nd International Conference on Advancements in Computing (ICAC), Malabe, Sri Lanka, 10–11 December 2020; pp. 369–373. [Google Scholar]
Hosny, K.M.; El-Hady, W.M.; Samy, F.M.; Vrochidou, E.; Papakostas, G.A. Multi-Class Classification of Plant Leaf Diseases Using Feature Fusion of Deep Convolutional Neural Network and Local Binary Pattern. IEEE Access 2023, 11, 62307–62317. [Google Scholar] [CrossRef]
Liu, K.; Zhang, X. PiTLiD: Identification of Plant Disease from Leaf Images Based on Convolutional Neural Network. IEEE/ACM Trans. Comput. Biol. Bioinform. 2023, 20, 1278–1288. [Google Scholar] [CrossRef] [PubMed]
Tan, J.W.; Chang, S.-W.; Abdul-Kareem, S.; Yap, H.J.; Yong, K.-T. Deep Learning for Plant Species Classification Using Leaf Vein Morphometric. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 17, 82–90. [Google Scholar] [CrossRef]
Hu, J.; Chen, Z.; Yang, M.; Zhang, R.; Cui, Y. A Multiscale Fusion Convolutional Neural Network for Plant Leaf Recognition. IEEE Signal Process. Lett. 2018, 25, 853–857. [Google Scholar] [CrossRef]
Madhurya, C.; Jubilson, E.A. YR2S: Efficient Deep Learning Technique for Detecting and Classifying Plant Leaf Diseases. IEEE Access 2024, 12, 3790–3804. [Google Scholar] [CrossRef]
Zhang, X.; Mao, Y.; Yang, Q.; Zhang, X. A Plant Leaf Disease Image Classification Method Integrating Capsule Network and Residual Network. IEEE Access 2024, 12, 44573–44585. [Google Scholar] [CrossRef]
Araújo, V.; Britto, A.S.; Brun, A.L.; Koerich, A.L.; Palate, R. Multiple classifier system for plant leaf recognition. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017; pp. 1880–1885. [Google Scholar]
Kala, J.R.; Viriri, S. Plant specie classification using sinuosity coefficients of leaves. Image Anal. Stereol. 2018, 37, 119–126.
Ali, R.; Hardie, R.; Essa, A. A Leaf Recognition Approach to Plant Classification Using Machine Learning. In Proceedings of the NAECON IEEE National Aerospace and Electronics Conference, Dayton, OH, USA, 23–26 July 2018; pp. 431–434.
Zhang, S.; Huang, W.; Wang, Z. Plant species identification based on modified local discriminant projection. Neural Comput. Appl. 2020, 32, 16329–16336.
Dudi, B.; Rajesh, V. A computer aided plant leaf classification based on optimal feature selection and enhanced recurrent neural network. J. Exp. Theor. Artif. Intell. 2023, 35, 1001–1035.
Shelke, A.; Mehendale, N. A CNN-based android application for plant leaf classification at remote locations. Neural Comput. Appl. 2023, 35, 2601–2607.
Kanda, P.S.; Xia, K.; Sanusi, O.H. A Deep Learning-Based Recognition Technique for Plant Leaf Classification. IEEE Access 2021, 9, 162590–162613.
Ojha, A.; Kumar, V. Image Classification of Ornamental Plants Leaf using Machine Learning Algorithms. In Proceedings of the 4th International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 21–23 September 2022; pp. 834–840.
Aman, B.K.; Kumar, V. Flower Leaf Image Classification using Machine Learning Techniques. In Proceedings of the Third International Conference on Intelligent Computing Instrumentation and Control Technologies (ICICICT), Kannur, India, 11–12 August 2022; pp. 553–558.
Kala, S.N.; Padmaja, N.; Neelima, P. Flower Classification Using Deep Learning Approaches. In Proceedings of the International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE), Chennai, India, 1–2 November 2023; pp. 1–7.
Kunjachan, S.; Kala, S. Approaches for Plant Leaf Classification: A Review. In Proceedings of the 4th International Conference on Innovative Trends in Information Technology (ICITIIT), Kottayam, India, 11–12 February 2023; pp. 1–5.
Parate, R.K.; Dhole, K.M.; Sharma, S.J. Classification of Leaf using Teachable Machine. Int. J. Res. Appl. Sci. Eng. Technol. 2023, 11, 307–311.
Kethineni, K.; Pradeepini, G. Identification of Leaf Disease Using Machine Learning Algorithm for Improving the Agricultural System. J. Adv. Inf. Technol. 2023, 14, 122–129.
Singh, S.; Roy, Y.; Bhan, A.; Sah, S. Computer based Detection and Classification of Leaf Diseases using Hybrid Features. In Proceedings of the International Conference on Sustainable Computing and Smart Systems (ICSCSS), Coimbatore, India, 14–16 June 2023; pp. 788–793.
Al Hakim, M.F.; Prasetiyo, B. CNN-ML Stacking for better Classification of Rice Leaf Diseases. In Proceedings of the IEEE International Conference on Artificial Intelligence and Mechatronics Systems (AIMS), Bandung, Indonesia, 21–23 February 2024; pp. 1–5.
Soni, T.; Gupta, D.; Dutta, M. Optimized Deep Learning Architecture for Tomato Leaf Disease Classification. In Proceedings of the Annual International Conference on Emerging Research Areas: International Conference on Intelligent Systems (AICERA/ICIS), Kanjirapally, India, 16–18 November 2023; pp. 1–6.
Nagasubramanian, G.; Sakthivel, R.K.; Patan, R.; Sankayya, M.; Daneshmand, M.; Gandomi, A.H. Ensemble Classification and IoT-Based Pattern Recognition for Crop Disease Monitoring System. IEEE Internet Things J. 2021, 8, 12847–12854.
Haider, W.; Rehman, A.-U.; Durrani, N.M.; Rehman, S.U. A Generic Approach for Wheat Disease Classification and Verification Using Expert Opinion for Knowledge-Based Decisions. IEEE Access 2021, 9, 31104–31129.
Al-Badri, A.H.; Ismail, N.A.; Al-Dabbagh, B.A. Classification of Tomato Plant Diseases Using Deep Learning Technique. In Proceedings of the 2023 2nd International Conference on Electrical, Communication, and Computer Engineering (ICECCE), Dubai, United Arab Emirates, 30–31 December 2023; pp. 1–6.
Sundararaj, A.; Mathew, P.; Ramakrishnan, M. Deep Learning Based Pepper Leaf Disease Classification Using CNN and Transformer. In Proceedings of the 2023 International Conference on Computing, Communication, and Security (ICCCS), Punjab, India, 3–4 March 2023; pp. 1–6.
Sharma, S.; Priya, P. Classification of Tomato Leaf Disease Using Deep Learning with Multimodal Feature Extraction. In Proceedings of the IEEE International Conference on Recent Advances in Electronics, Communication & Technology (ICRAECT), Matsue, Japan, 5–8 November 2023; pp. 1–6.
Sharma, P.; Geetha, S.; Srinivasulu, G. Plant Leaf Disease Detection using K-Means Clustering and Artificial Neural Network (ANN). Eur. J. Mol. Clin. Med. 2023, 10, 393–401.
Abou-Nasr, M. A Deep Learning-Based Tomato Leaf Disease Classification Method Using Transfer Learning. In Proceedings of the 2023 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 13–15 December 2023; pp. 1–6.
Chen, H.; He, L.; Dong, S.; Liu, J.; Lin, Y.; Peng, J. Multi-Scale Network and Transfer Learning for Classification of Tomato Leaf Diseases. IEEE Access 2024, 12, 1075–1087.
Wu, Q.; Chen, Y.; Meng, J. DCGAN-Based Data Augmentation for Tomato Leaf Disease Identification. IEEE Access 2020, 8, 98716–98728.
Xu, M.; Yoon, S.; Fuentes, A.; Yang, J.; Park, D.S. Style-Consistent Image Translation: A Novel Data Augmentation Paradigm to Improve Plant Disease Recognition. Front. Plant Sci. 2022, 12, 773142.
Cap, Q.H.; Uga, H.; Kagiwada, S.; Iyatomi, H. LeafGAN: An Effective Data Augmentation Method for Practical Plant Disease Diagnosis. IEEE Trans. Automat. Sci. Eng. 2022, 19, 1258–1267.
Min, B.; Kim, T.; Shin, D.; Shin, D. Data Augmentation Method for Plant Leaf Disease Recognition. Appl. Sci. 2023, 13, 1465.
Ariyapadath, S. Plant leaf classification and comparative analysis of combined feature set using machine learning techniques. Trait. Du Signal 2021, 38, 1587–1598.
Sridevi, S.; Famila, S.; Mariammal, G.; Hemalatha, K.; Havish, M.S. Detection and Categorization of Tomato Plant Diseases Using a Convolutional Neural Network. In Proceedings of the 2024 International Conference on Distributed Computing and Optimization Techniques (ICDCOT), Bengaluru, India, 15–16 March 2024.
Elumalai, S.; Hussain, F.B.J. Utilizing Deep Convolutional Neural Networks for Multi-Classification of Plant Diseases from Image Data. Trait. Du Signal 2023, 40, 1479–1490.
Pan, Y. Research on Leaf Classification under Different Classification Methods. In Proceedings of the 2021 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS), Shenyang, China, 29–31 July 2021; pp. 697–700.
Le, V.N.T.; Apopei, B.; Alameh, K. Effective plant discrimination based on the combination of local binary pattern operators and multiclass support vector machine methods. Inf. Process. Agric. 2019, 6, 116–131.
Sachar, S.; Kumar, A. Survey of feature extraction and classification techniques to identify plant through leaves. Expert Syst. Appl. 2021, 167, 114181.
Azlah, M.A.F.; Chua, L.S.; Rahmad, F.R.; Abdullah, F.I.; Wan Alwi, S.R. Review on Techniques for Plant Leaf Classification and Recognition. Computers 2019, 8, 77.
Beghin, T.; Cope, J.S.; Remagnino, P.; Barman, S. Shape and Texture Based Plant Leaf Classification. In Advanced Concepts for Intelligent Vision Systems; Blanc-Talon, J., Bone, D., Philips, W., Popescu, D., Scheunders, P., Eds.; ACIVS 2010. Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6475.
Nijalingappa, P.; Madhumathi, V.J. Plant identification system using its leaf features. In Proceedings of the International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), Davangere, India, 29–31 October 2015; pp. 338–343.
Ahmed, S.U.; Shuja, J.; Tahir, M.A. Leaf Classification on Flavia Dataset: A Detailed Review. Sustain. Comput. Inform. Syst. 2023, 40, 100907.
Barburiceanu, S.; Meza, S.; Orza, B.; Malutan, R.; Terebes, R. Convolutional Neural Networks for Texture Feature Extraction. Applications to Leaf Disease Classification in Precision Agriculture. IEEE Access 2021, 9, 160085–160103.
Kumar, V.; Kumari, V.; Kumar, C. Machine Learning for Leaf Image Classification Based on a Novel Spice Plants Leaf Image Dataset. Grenze Int. J. Eng. Technol. GIJET 2023.
Ayumi, V.; Ermatita, E.; Abdiansah, A.; Noprisson, H.; Purba, M.; Utami, M. A Study on Medicinal Plant Leaf Recognition Using Artificial Intelligence. In Proceedings of the International Conference on Informatics, Multimedia, Cyber and Information System, Jakarta, Indonesia, 28–29 October 2021; pp. 40–45.
Ihsan, M.F.; Sunyoto, A.; Arief, M.R. Gray Level Co-Occurrence Matrix Algorithm and Backpropagation Neural Networks for Herbal Plants Identification. In Proceedings of the 2022 5th International Conference on Information and Communications Technology (ICOIACT), Yogyakarta, Indonesia, 24–25 August 2022; pp. 373–378.
Huang, H.; Cheng, S.; Xu, L. Overall Loss for Deep Neural Networks. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer Nature: Berlin/Heidelberg, Germany, 2019; Volume 11607, pp. 223–231.
Jyothi, R.L.; Abdul Rahiman, M. A Multilevel CNN Architecture for Character Recognition from Palm Leaf Images. Adv. Intell. Syst. Comput. 2020, 1034, 185–193.
Wu, Y.-X.; Guo, L.; Li, Y.; Shen, X.-Q.; Yan, W.-L. Multi-Layer Support Vector Machine and Its Application. In Proceedings of the 2006 International Conference on Machine Learning and Cybernetics, Dalian, China, 13–16 August 2006; Volume 2006, pp. 3627–3631.
Hassan, S.M.; Maji, A.K. Comparison of Automated Leaf Recognition Techniques. Int. J. Intell. Enterp. 2021, 8, 205–214.
Thyagharajan, K.K.; Kiruba Raji, I. A Review of Visual Descriptors and Classification Techniques Used in Leaf Species Identification. Arch. Comput. Methods Eng. 2019, 26, 933–960.
Devi, R.M.; Sangeetha, M.; Sagana, C.; Savitha, S.; Hemalatha, P.; Janani, N.; Maamathi, K. Plant Type Classification Based on Leaves Using Fusion Based Support Vector Machine. In Proceedings of the International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 23–25 January 2023.
Kaur, P.P.; Singh, S. Analysis of Multiple Classifiers for Herbal Plant Recognition. In Proceedings of the 6th International Conference on Signal Processing, Computing and Control (ISPCC), Solan, India, 7–9 October 2021; Volume 2021, pp. 78–83.
Pushpa, B.R.; Athira, P.R. Plant Species Recognition Based on Texture and Geometric Features of Leaf. In Proceedings of the 3rd International Conference on Signal Processing and Communication (ICPSC), Coimbatore, India, 13–14 May 2021; pp. 315–320.
Mahurkar, D.P.; Patidar, H. Revealing Leaf Species through Specific Contour and Region-Based Features Extraction. E-Prime-Adv. Electr. Eng. Electron. Energy 2023, 5, 100228.
Darshana, S.; Soumyakanta, K.A. Revolutionary Machine-Learning Based Approach for Identifying Ayurvedic Medicinal Plants. In Proceedings of the International Conference on Advancements in Smart, Secure and Intelligent Computing (ASSIC), Bhubaneswar, India, 19–20 November 2022.
Pushpanathan, K.; Hanafi, M.; Mashohor, S.; Fazlil Ilahi, W.F. Machine Learning in Medicinal Plants Recognition: A Review. Artif. Intell. Rev. 2021, 54, 305–327.
Elbasi, E.; Mostafa, N.; Zaki, C.; AlArnaout, Z.; Topcu, A.E.; Saker, L. Optimizing Agricultural Data Analysis Techniques through AI-Powered Decision-Making Processes. Appl. Sci. 2024, 14, 8018.
Al-Eiadeh, M.R.; Qaddoura, R.; Abdallah, M. Investigating the Performance of a Novel Modified Binary Black Hole Optimization Algorithm for Enhancing Feature Selection. Appl. Sci. 2024, 14, 5207.
Caria, M.; Todde, G.; Sara, G.; Piras, M.; Pazzona, A. Performance and Usability of Smartglasses for Augmented Reality in Precision Livestock Farming Operations. Appl. Sci. 2020, 10, 2318.
Bahaghighat, M.; Motamedi, S.A.; Xin, Q. Image Transmission over Cognitive Radio Networks for Smart Grid Applications. Appl. Sci. 2019, 9, 5498.
Elbasi, E.; Zaki, C.; Topcu, A.E.; Abdelbaki, W.; Zreikat, A.I.; Cina, E.; Shdefat, A.; Saker, L. Crop Prediction Model Using Machine Learning Algorithms. Appl. Sci. 2023, 13, 9288.
Topcu, A.E.; Zreikat, A.; Elbasi, E. Machine Learning Approaches for the Diagnosis of H1N1 and COVID-19. Int. J. Intell. Syst. Appl. Eng. 2023, 12, 436–447.
Elbasi, E.; Mostafa, N.; AlArnaout, Z.; Zreikat, A.I.; Cina, E.; Varghese, G.; Shdefat, A.; Topcu, A.E.; Abdelbaki, W.; Mathew, S.; et al. Artificial Intelligence Technology in the Agricultural Sector: A Systematic Literature Review. IEEE Access 2023, 11, 171–202.

Figure 1. Structure of plant leaf classification.

Figure 2. Feature selection, extraction, and classification using machine learning.

Figure 3. Overview of the leaf type identification process.

Figure 4. Accuracy with margin features, with texture features, and after feature selection.

Figure 5. Accuracy of plant leaf classification.

Figure 6. MAE, RAE, RMSE, and TP rates.

Figure 7. Samples of correctly classified leaves.

Figure 8. Sample of incorrectly classified leaves.

Table 1. Comparison of algorithms.

Reference	Classes (Literature)	Method (Literature)	Accuracy (Literature)	Classes (Proposed)	Method (Proposed)	Accuracy (Proposed)
[47]	2	CNN	95%	2	Multiple algorithms	100%
[10]	2	SVM	91%	2	Multiple algorithms	100%
[13,48]	4	PiTLiD	99.45%	5	NBC	100%
[49]	10	DNN-PDC	94.6%	10	NBC	100%
[5,12]	10	CNN	96%	10	NBC	100%
[16]	16	RF	97.98%	25	NBC	96.21%
[7,50]	31	NFC	97.6%	25	NBC	96.21%
[16]	38	YR2S	99.69%	25	NBC	96.21%
[9,14]	43	ANN	94.88%	25	NBC	96.21%

Table 2. Accuracy and error values for plant classification with margin feature set.

Algorithm	Accuracy (%)	MAE	RMSE	RAE (%)	TP Rate	FP Rate	Precision (%)	F1 (%)	Recall (%)
Bayes Net	79	0.0047	0.0575	23.361	0.79	0.02	80.37	79.79	80.18
Naïve Bayes Classifier	85.18	0.0031	0.0529	15.311	0.852	0.002	86.73	86.03	86.88
Multilayer Perceptron	83.13	0.0049	0.0509	24.682	0.831	0.002	85.62	83.96	84.79
Hoeffding Tree	83.87	0.0034	0.0529	16.787	0.839	0.002	86.38	84.62	85.54
J48	50.5	0.0104	0.0951	51.952	0.505	0.005	52.01	51.37	51.51
Random Forest	82.5	0.0132	0.0716	65.873	0.825	0.002	84.97	83.32	84.15
CNN	81.72	0.0057	0.0537	22.601	0.818	0.002	84.17	82.53	83.35
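
The source does not state how the TP, FP, Precision, F1, and Recall columns were averaged over classes. As a minimal sketch, assuming support-weighted averaging of per-class values, the quantities can be derived from a confusion matrix as below. The function name `weighted_metrics` and the toy 3-class matrix are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def weighted_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus support-weighted per-class rates from a confusion matrix.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    Support weighting is an assumption; the source does not state its averaging.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))
    out = {"tp_rate": 0.0, "fp_rate": 0.0, "precision": 0.0, "f1": 0.0}

    for k in range(n_classes):
        tp = confusion[k][k]
        support = sum(confusion[k])                   # samples truly in class k
        predicted = sum(row[k] for row in confusion)  # samples labelled as class k
        fp = predicted - tp
        negatives = total - support                   # samples not in class k

        recall = tp / support if support else 0.0     # per-class TP rate
        precision = tp / predicted if predicted else 0.0
        fp_rate = fp / negatives if negatives else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        weight = support / total                      # class share of the data
        out["tp_rate"] += weight * recall
        out["fp_rate"] += weight * fp_rate
        out["precision"] += weight * precision
        out["f1"] += weight * f1

    out["accuracy"] = correct / total
    return out


# Hypothetical 3-class confusion matrix, 10 samples per class.
toy = [[8, 1, 1], [2, 7, 1], [0, 1, 9]]
metrics = weighted_metrics(toy)
for name in ("accuracy", "tp_rate", "fp_rate", "precision", "f1"):
    print(f"{name}: {metrics[name]:.4f}")
```

Under support weighting, the averaged TP rate always equals the overall accuracy, which agrees with the TP column in the tables (for example, 85.18 and 0.852 for the Naïve Bayes Classifier in Table 2). The Precision, F1, and Recall columns may follow a different averaging scheme that the source does not specify.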

Table 3. Accuracy and error values for plant classification with texture feature set.

Algorithm	Accuracy (%)	MAE	RMSE	RAE (%)	TP Rate	FP Rate	Precision (%)	F1 (%)	Recall (%)
Bayes Net	63.28	0.0078	0.0778	38.787	0.633	0.004	65.17	63.91	64.54
Naïve Bayes Classifier	66.97	0.0067	0.0802	33.599	0.670	0.003	68.97	67.63	68.49
Multilayer Perceptron	73.13	0.0034	0.0502	44.614	0.731	0.003	75.32	73.86	74.59
Hoeffding Tree	70.73	0.006	0.0747	29.772	0.707	0.003	72.85	71.43	71.89
J48	70.25	0.0144	0.0777	71.786	0.703	0.003	71.49	70.36	72.04
Random Forest	72.5	0.014	0.0757	69.786	0.725	0.003	71.13	72.59	72.47
CNN	72.82	0.0042	0.0526	41.148	0.728	0.003	73.92	73.11	72.91

Table 4. Accuracy and error values for plant classification (2 class).

Algorithm	Accuracy (%)	MAE	RMSE	RAE (%)	TP Rate	FP Rate	Precision (%)	F1 (%)	Recall (%)
Bayes Net	100	0	0	0	1	0	100	100	100
Naïve Bayes Classifier	100	0	0	0	1	0	100	100	100
Multilayer Perceptron	100	0.0076	0.0086	1.514	1	0	100	100	100
Hoeffding Tree	100	0.0076	0.0086	1.514	1	0	100	100	100
J48	93.75	0.0625	0.25	12.328	0.938	0.049	94.12	94.01	93.82
Random Forest	100	0.0609	0.0833	12.09	1	0	100	100	100
CNN	100	0.0063	0.0081	1.428	1	0	100	100	100
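
Several rows in Table 4 report 100% accuracy together with a nonzero MAE and RMSE. One reading, stated here as an assumption because the source does not define these errors, is that they are measured between the predicted class-probability vector and the one-hot true label, averaged over samples and classes, with RAE expressed relative to a baseline that always predicts the class priors. The sketch below follows that definition; the function names and sample data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def probabilistic_errors(
    probs: Sequence[Sequence[float]], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (MAE, RMSE) between predicted probabilities and one-hot labels.

    Assumed definition: errors are averaged over every (sample, class) cell.
    """
    n_classes = len(probs[0])
    abs_sum = 0.0
    sq_sum = 0.0
    for row, label in zip(probs, labels):
        for k in range(n_classes):
            target = 1.0 if k == label else 0.0
            diff = row[k] - target
            abs_sum += abs(diff)
            sq_sum += diff * diff
    cells = len(labels) * n_classes
    return abs_sum / cells, math.sqrt(sq_sum / cells)


def relative_absolute_error(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    priors: List[float],
) -> float:
    """MAE as a percentage of the MAE of a predictor that outputs the priors."""
    mae, _ = probabilistic_errors(probs, labels)
    baseline, _ = probabilistic_errors([priors] * len(labels), labels)
    return 100.0 * mae / baseline


# Case 1: every sample is classified correctly, but with 0.99 confidence.
labels = [0, 1, 0, 1]
soft = [[0.99, 0.01], [0.01, 0.99], [0.99, 0.01], [0.01, 0.99]]
print(probabilistic_errors(soft, labels))
print(relative_absolute_error(soft, labels, [0.5, 0.5]))

# Case 2: hard 0/1 predictions with 15 of 16 samples correct (93.75%).
labels16 = [0] * 8 + [1] * 8
hard = [[1.0, 0.0]] * 8 + [[0.0, 1.0]] * 7 + [[1.0, 0.0]]
print(probabilistic_errors(hard, labels16))  # (0.0625, 0.25)
```

Case 1 shows how a classifier can be fully accurate yet have a small positive MAE. Case 2 reproduces the MAE of 0.0625 and RMSE of 0.25 reported for J48 in Table 4 at 93.75% accuracy, which is consistent with, though not proof of, this definition. Because the sum is divided by the number of classes, the same definition would also yield very small MAE values when there are many classes.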

Table 5. Accuracy and error values for plant classification (5 class).

Algorithm	Accuracy (%)	MAE	RMSE	RAE (%)	TP Rate	FP Rate	Precision (%)	F1 (%)	Recall (%)
Bayes Net	96.25	0.0145	0.0952	4.5204	0.963	0.009	97.43	97.27	98.01
Naïve Bayes Classifier	100	0.0001	0.0015	0.0382	1	0	100	100	100
Multilayer Perceptron	98.75	0.0177	0.0733	5.5091	0.998	0.003	98.67	98.42	99.03
Hoeffding Tree	96.15	0.0135	0.1087	4.1312	0.962	0.009	97.62	97.34	98.04
J48	77.5	0.0911	0.2929	28.4124	0.775	0.056	78.92	78.19	77.61
Random Forest	97.5	0.0847	0.1476	26.4045	0.975	0.006	98.91	98.96	97.16
CNN	99.26	0.0162	0.0649	5.2344	0.992	0.002	99.74	99.61	99.17

Table 6. Accuracy and error values for plant classification (10 class).

Algorithm	Accuracy (%)	MAE	RMSE	RAE (%)	TP Rate	FP Rate	Precision (%)	F1 (%)	Recall (%)
Bayes Net	98.11	0.0039	0.0578	2.1841	0.981	0.001	99.17	98.71	99.64
Naïve Bayes Classifier	100	0.0001	0.0012	0.0409	1	0	100	100	100
Multilayer Perceptron	96.22	0.0165	0.077	9.109	0.962	0.004	96.17	97.84	96.27
Hoeffding Tree	98.11	0.0038	0.0608	2.1115	0.981	0.001	99.72	99.35	99.47
J48	81.13	0.0432	0.1869	23.9149	0.811	0.025	82.67	82.46	81.37
Random Forest	96.22	0.0557	0.1183	30.7785	0.962	0.002	97.34	97.37	98.04
CNN	97.83	0.0041	0.0517	2.379	0.978	0.001	98.82	98.26	98.91

Table 7. Accuracy and error values for plant classification (25 class).

Algorithm	Accuracy (%)	MAE	RMSE	RAE (%)	TP Rate	FP Rate	Precision (%)	F1 (%)	Recall (%)
Bayes Net	92.42	0.0059	0.067	7.6407	0.924	0.003	93.84	92.57	93.91
Naïve Bayes Classifier	96.21	0.0031	0.055	3.9692	0.962	0.001	96.17	96.71	96.53
Multilayer Perceptron	94.70	0.0091	0.056	11.8256	0.947	0.001	95.83	95.04	94.68
Hoeffding Tree	93.93	0.0045	0.064	5.8138	0.939	0.001	95.61	94.13	95.82
J48	72.72	0.0233	0.144	30.2762	0.727	0.01	72.59	72.83	72.64
Random Forest	96.21	0.0358	0.105	46.5241	0.962	0.001	96.82	96.73	95.08
CNN	96.35	0.0391	0.121	41.3724	0.963	0.001	96.82	97.03	97.17

Table 8. Accuracy and error values for plant classification (50 class).

Algorithm	Accuracy (%)	MAE	RMSE	RAE (%)	TP Rate	FP Rate	Precision (%)	F1 (%)	Recall (%)
Bayes Net	87.13	0.0057	0.0644	14.46	0.871	0.003	88.53	87.86	87.37
Naïve Bayes Classifier	89.38	0.0042	0.0614	10.67	0.894	0.002	90.82	90.27	89.67
Multilayer Perceptron	87.70	0.0051	0.0601	11.03	0.871	0.002	88.91	88.12	88.53
Hoeffding Tree	88.88	0.0045	0.0606	11.60	0.889	0.002	89.91	89.12	89.82
J48	64.50	0.0151	0.1150	38.58	0.645	0.007	66.19	65.92	67.01
Random Forest	88.38	0.0206	0.0839	52.41	0.884	0.002	89.95	89.81	90.04
CNN	88.14	0.0062	0.0561	10.59	0.881	0.002	90.09	89.27	89.83

Table 9. Accuracy and error values for plant classification (75 class).

Algorithm	Accuracy (%)	MAE	RMSE	RAE (%)	TP Rate	FP Rate	Precision (%)	F1 (%)	Recall (%)
Bayes Net	80.59	0.0056	0.0633	21.2627	0.806	0.003	82.41	82.14	81.96
Naïve Bayes Classifier	85.93	0.0038	0.0582	14.3344	0.859	0.002	85.79	86.17	86.61
Multilayer Perceptron	83.12	0.0051	0.0601	11.0300	0.832	0.003	84.92	84.27	85.06
Hoeffding Tree	85.85	0.0040	0.0572	15.1881	0.859	0.002	87.28	88.14	86.47
J48	52.79	0.0129	0.1057	49.0242	0.528	0.007	54.61	55.07	54.62
Random Forest	85.39	0.0161	0.0776	61.2607	0.854	0.002	85.67	85.94	84.37
CNN	83.64	0.0047	0.0571	10.3729	0.836	0.003	84.93	85.07	84.83

Table 10. Accuracy and error values after feature selection and structural data.

Algorithm	Accuracy (%)	MAE	RMSE	RAE (%)	TP Rate	FP Rate	Precision (%)	F1 (%)	Recall (%)
Bayes Net	81.75	0.0041	0.0513	21.2208	0.815	0.02	83.47	82.39	83.51
Naïve Bayes Classifier	89.63	0.0023	0.042	14.492	0.896	0.01	89.17	88.24	88.67
Multilayer Perceptron	89.48	0.004	0.0621	10.12	0.897	0.001	87.91	88.34	88.09
Hoeffding Tree	89.92	0.0030	0.0424	15.7	0.899	0.01	91.27	91.35	90.82
J48	57.87	0.0152	0.1251	51.75	0.581	0.008	60.18	59.67	60.36
Random Forest	86.81	0.0037	0.0584	14.33	0.859	0.002	87.26	86.54	86.19
CNN	88.72	0.0034	0.0526	13.42	0.887	0.02	89.92	89.97	88.72

Table 11. Comparison of the proposed algorithm with state-of-the-art models.

Algorithm	Accuracy (%)	MAE	TP Rate	FP Rate	Precision (%)	F1 (%)	Recall (%)
Proposed Model	89.92	0.0030	0.899	0.01	91.27	91.35	90.82
CNN	88.72	0.0034	0.887	0.002	89.92	89.97	88.72
ResNet-50	87.24	0.0042	0.869	0.03	88.17	88.39	89.02
DenseNet-121	87.38	0.0051	0.872	0.02	87.62	88.36	86.57
Random Forest	86.81	0.0037	0.887	0.02	87.26	86.54	86.19
Support Vector Machine	88.27	0.0031	0.885	0.002	87.64	88.02	86.61

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
