1. Introduction
Mycotoxins are a group of naturally occurring toxic chemical compounds produced by certain species of moulds (fungi) during growth on various crops and foodstuffs, including cereals, nuts, spices, and dairy products [
1]. The ingestion of certain mycotoxins has been linked to a range of harmful health impacts on both humans and animals, from short-term poisoning to long-term consequences such as liver cancer and, in some cases, death [
2,
3,
4]. Mycotoxins are secondary metabolites (that is, compounds produced by an organism that are not essential for its primary life processes) and are often produced during the pre-harvest, harvest, and storage phases under favourable conditions of humidity and temperature [
3,
The most prevalent mycotoxins include aflatoxins, trichothecenes, fumonisins, zearalenones, ochratoxins, and patulin; they are produced by certain plant-pathogenic species of
Aspergillus,
Fusarium, and
Penicillium [
6]. Mycotoxin contamination in crop products has been found to vary significantly across different geographical locations and is influenced by annual weather conditions [
7,
8]. However, since 2012, there has been a noted increase in the occurrence of mycotoxins in Europe, with climate change most likely being a contributing factor [
9,
10]. An estimated 60–80% of the world’s crop supply is contaminated by mycotoxins, and an estimated 20% of those crops surpass the legally mandated food safety thresholds set by the European Union (EU) [
11].
With the world’s food supply chain being highly interconnected, the presence of mycotoxins not only endangers human health but also has an impact on the stability of agricultural markets and trade [
3,
12]. The economic impact of mycotoxin contamination is substantial, with a global estimate in the billions of euros for detection, regulation enforcement, and mitigation efforts to manage mycotoxin presence in food and feeds annually [
13]. It is estimated that, between 2010 and 2019, approximately 75 million tonnes of wheat in Europe, which constitutes 5% of the wheat intended for human consumption, surpassed the maximum threshold for deoxynivalenol (DON) contamination. This excess led to the reclassification of the contaminated wheat grain as ‘animal feed’, resulting in an economic loss of around EUR 3 billion [
14]. Additionally, Ref. [
15] shows that, between 2010 and 2020, aflatoxins were responsible for the demotion of 4.2% of wheat intended for food, which potentially represented an additional economic loss of EUR 2.5 billion. As a result, the detection and management of mycotoxins in crops and food products is crucial for ensuring food safety and safeguarding consumer health worldwide as well as contributing to economic stability.
According to [
16], the standard methodology for mycotoxin detection comprises three main steps: sampling, sample preparation, and analytical determination. Chromatographic techniques, such as liquid chromatography mass spectrometry (LC–MS), high-performance liquid chromatography (HPLC), and gas chromatography mass spectrometry (GC–MS), along with immunoassay-based methods like enzyme-linked immunosorbent assays (ELISAs), are widely recognised as the most prevalent analytical approaches for the detection of mycotoxins [
17,
18]. The mycotoxin level in a bulk load is determined by measuring a sample taken from the food source. From this, the concentration of mycotoxins in the entire load is assumed to be the same as the concentration of the sample. However, these techniques often require extensive sample preparation, sophisticated equipment, and highly trained personnel, leading to significant costs and time delays in the analytical process. Furthermore, the varied and intricate nature of different foods requires customised detection methods, which can add complexity to the screening process [
19,
20].
While traditional detection methods such as LC–MS, HPLC, GC–MS, and ELISA generate reliable data, they often result in large, complex datasets that require extensive interpretation and analysis. Machine learning (ML) approaches for the detection and prediction of mycotoxins have risen in popularity in recent years as an alternative to traditional detection methods (see
Figure 1). At its core, ML employs statistical methods to create algorithms that allow computers to learn from data and make decisions based on identified patterns and inferences, without being explicitly programmed for each specific task. ML methods are adept at processing and analysing large datasets and at extracting meaningful patterns that are not immediately apparent. By leveraging ML algorithms, researchers can gain deeper insights into the data and obtain significant advantages over traditional lab analysis in terms of efficiency, cost, and scalability, while maintaining or improving the accuracy of mycotoxin detection [
21].
ML methods can be broadly divided into three categories: supervised learning (SL), unsupervised learning (UL), and reinforcement learning (RL). In SL, an algorithm is trained using a dataset that includes both inputs and the corresponding outputs. The model learns to associate the inputs with the outputs. After training, the model can apply this learned relationship to predict the outputs for new, unseen inputs [
22]. In UL, an algorithm is presented with only the input data and identifies patterns and structures in the data based only on the inputs. After training, it can classify new inputs based on the patterns it has found. In RL, an algorithm learns to make decisions by performing actions to achieve a goal. It processes feedback through rewards or penalties associated with its actions, using this information to develop a decision-making framework that aims to maximise rewards [
23].
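As a minimal illustration of the difference between SL and UL (a hypothetical sketch using scikit-learn and synthetic data, not any of the datasets discussed later), a supervised classifier is fitted on labelled inputs and then predicts labels for new inputs, whereas a clustering algorithm is given only the inputs:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression   # supervised learning
from sklearn.cluster import KMeans                     # unsupervised learning

# Synthetic stand-in for a labelled mycotoxin dataset (inputs X, labels y).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression().fit(X, y)     # learns the input-output relationship
print(clf.predict(X[:3]))                # predicts labels for new, unseen inputs

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # sees only the inputs
print(km.labels_[:3])                    # groupings found without any labels
```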
Within these categories, many different types of ML models exist and are used based on the specificity of the problem. The most popular of these models, as found by this research, are discussed in detail below. Although ML applications in food safety and mycotoxin detection are widespread, there appears to be a lack of comprehensive reviews that cover the broad spectrum of ML methodologies specifically tailored to mycotoxin analysis, as most studies tend to concentrate on individual techniques. For example, Ref. [
24] uses neural networks (NNs) for the prediction of contamination from the mycotoxin fumonisin in corn. Additionally, NNs have been used to forecast the accumulation of the trichothecene mycotoxin deoxynivalenol (DON) in barley seeds [
25] and to predict fungal growth [
26]. For a comprehensive review of the use of NNs in food science, see Ref. [
27]; for a review of ML methods in general in the field of food safety, see Ref. [
28]; and in agriculture, see Ref. [
21].
ML techniques can alleviate some of the current burdens of mycotoxin detection by providing an efficient and low-cost solution [
29]. Additionally, with the impact of climate change, the need for these models to provide reliable predictions at the farm level is increasingly crucial, especially in terms of food safety and health. In this work, we present a comprehensive systematic review of some of the more popular ML techniques used in the detection and prediction of mycotoxins in a range of foods and crops. Our review also identifies critical areas in the current body of work that warrant attention. A notable concern is the often insufficient discussion of the selection and tuning of hyperparameters in ML models, which is crucial for understanding and replicating study results. This lack of detail creates issues with the reproducibility of the reviewed methods and also hinders the advancement and application of these techniques.
The organisation of our article is as follows: In
Section 2, we provide details regarding our literature search methodology. This includes a description of the search criteria and keywords and a discussion of the prevalence of each ML method. In
Section 3, we provide a short introduction to the ML process and describe some of the common terms. In
Section 4, we give a brief introduction to the main ML algorithms used (and their hyperparameters) and discuss the outcomes of the articles reviewed based on the type of machine learning model used. Finally, in
Section 5, we provide some concluding remarks.
4. Application of Machine Learning to Mycotoxin Data
In this section, we first include a brief discussion on common data types in mycotoxin detection. We then discuss the most common ML algorithms (from
Figure 2) and review their application to mycotoxin data. Each subsection is dedicated to a single ML method in which we describe the basic algorithm, how it makes predictions/detections, some advantages and disadvantages of the algorithm, and finally a review of the literature using these methods. In cases where the reviewed studies employ multiple machine learning models, we categorise each paper based on the highest-performing model used in that particular work.
4.1. Types of Data Used in Mycotoxin Detection
In the context of ML applications for mycotoxin detection, the literature highlights the use of various data types, including weather parameters (temperature, rainfall, and relative humidity), crop phenology, agronomic data, and spectral imaging. Additionally, spatiotemporal data, which include information collected over time and across different spatial locations, play a vital role in understanding and predicting mycotoxin contamination by incorporating key environmental variables and temporal dynamics. Each type of data offers unique characteristics and applications. Understanding the context and conditions under which these data are collected is essential for interpreting the results and evaluating the effectiveness of different ML models.
4.1.1. Weather Data
Weather variables, including temperature, relative humidity, precipitation, and carbon dioxide levels, play a significant role in mycotoxigenic fungal growth and subsequent mycotoxin formation on agricultural commodities [
30,
31,
32]. ML models can leverage historical and real-time weather data to predict the likelihood of mycotoxin contamination. For example, continuous monitoring of these variables in the field can help create more dynamic and responsive models. Incorporating these factors allows for a more comprehensive understanding of the conditions that favour mycotoxin contamination and can improve the predictive power of ML models.
Ref. [
33] proposed a convolutional neural network model based on the CO2 respiration rate and the visual appearance of mould formation for classifying mycotoxin contamination in wheat grains stored in sealed containers, which achieved an accuracy of 83.3%. Ref. [
34] constructed a predictive model that incorporated multiple data sources, such as historical records of aflatoxin and fumonisin in corn, daily weather conditions, satellite imagery, dynamic geospatial soil characteristics, and land usage information. Using both a gradient boosting machine and a neural network, the study found that the NN models exhibited high class-specific accuracy for predicting mycotoxin levels over a 1-year period, with accuracies of 73% for aflatoxin and 85% for fumonisin, demonstrating their efficacy in forecasting annual mycotoxin levels.
4.1.2. Agronomic Data
The impact of agronomic factors on mycotoxin occurrence has been studied extensively. These factors include previous crop details, the use of fungicides, cropping patterns, and cultivar selection, all of which have been found to significantly affect mycotoxin levels [
35,
36,
37]. In a study by Ref. [
38], data on cropping system factors were used as input variables to predict aflatoxins and fumonisins in corn. Additionally, soil properties, when combined with meteorological data and historical aflatoxin content, have been used in gradient boosting machine models to distinguish aflatoxin-contaminated corn [
39].
4.1.3. Crop Phenology and Cultivar-Specific Data
Another important aspect of spatiotemporal data is the inclusion of specific cultivars. Different crop varieties can exhibit varying levels of susceptibility to fungal colonisation and mycotoxin contamination [
37,
40]. Including data on specific cultivars in ML models can help tailor predictions and interventions to the particular characteristics of each crop variety. Certain wheat varieties may be more resistant to Fusarium head blight, while others might be more prone to infection. By incorporating cultivar-specific data, ML models can provide more accurate risk assessments and suggest more effective mitigation strategies [
41]. This approach enhances the precision of mycotoxin contamination forecasts and supports targeted agricultural practices, such as selecting the most resistant varieties for planting in high-risk areas. Additionally, integrating crop phenology data, such as growth stages and development timelines, can improve the temporal accuracy of predictions [
42].
4.1.4. Spectral Data
Spectral data are one of the most common types used in mycotoxin detection, valued for their non-invasive nature. This data type involves capturing the reflectance or absorbance of light at various wavelengths from the material being analysed. Spectral data can be further categorised into multispectral and hyperspectral data, each offering different levels of detail and information.
Multispectral Imaging: This imaging technique captures data at a few specific wavelength bands, making it effective for distinguishing between different materials based on their spectral signatures. Unlike hyperspectral imaging, which captures continuous spectral information across a wide range of wavelengths, multispectral imaging focuses on discrete bands, making data collection and processing less complex while still providing valuable information for specific applications. For instance, multispectral images can be captured in controlled greenhouse environments, where conditions such as temperature, humidity, and lighting are regulated to optimise data quality. This controlled setting allows for consistent and repeatable measurements, crucial for precise analysis. An example of this application is a study [
43] that used hyperspectral data to detect Fusarium head blight in wheat under greenhouse conditions, demonstrating the potential of spectral imaging in plant pathology. Moreover, multispectral imaging can be integrated with advanced computational techniques for enhanced analysis. In another study, Ref. [
44] used ML combined with multispectral imaging and image processing techniques to detect aflatoxin contamination in figs.
Hyperspectral Imaging: Hyperspectral imaging is a technique that captures data across a continuous spectrum of wavelengths, providing significantly more detailed information compared with multispectral imaging. This method is particularly valuable for the precise identification of toxigenic fungal contaminants and mycotoxins [
45]. Hyperspectral images can be acquired using various platforms, including ground-based systems and unmanned aerial vehicles (UAVs). UAV hyperspectral imagery has been shown to effectively monitor Fusarium head blight in wheat fields, highlighting its potential for large-scale agricultural monitoring [
46]. In another study, Ref. [
47] used a visible and near-infrared hyperspectral imaging system operating in the range of 400–900 nm under ultraviolet excitation. They successfully differentiated spectral characteristics between corn kernels inoculated with aflatoxigenic A. flavus strains and naturally infected kernels from the same field. Furthermore, Ref. [
48] explored the combination of fluorescence and reflectance visible and near-infrared hyperspectral images for detecting aflatoxin contamination in inoculated corn kernels in the field.
Ground vs. Intact Material: The context in which spectral data are collected can also vary. In some cases, imaging occurs on ground material, where samples are collected and analysed in a laboratory setting. This approach allows for controlled conditions and high-resolution data. In other instances, imaging is performed on intact material, such as whole peanut grains [
49], to assess contamination directly in the field or during processing.
4.1.5. Limitations in Image Analysis
While image analysis using spectral data is a powerful tool for detecting mycotoxins, there are notable limitations and challenges. One significant factor is that visual features of an image, such as plant damage or fungal presence, may not always directly correlate with the presence of specific mycotoxins [
50]. This is particularly relevant when different species of fungi, capable of producing various mycotoxins, are involved [
51]. For example, certain fungi can cause visible damage or contamination on crops, which may be detected by ML models. However, these visual features might not indicate the presence of the specific mycotoxin of interest [
50]. As a result, models focusing on plant damage or fungal contamination might not accurately reflect the levels of regulated mycotoxins. This discrepancy underscores the importance of integrating spectral imaging features that are more closely associated with the specific mycotoxins being regulated. Addressing this challenge requires combining image analysis with other data types, such as chemical analysis or molecular techniques, to improve the specificity and accuracy of mycotoxin detection. By doing so, ML models can better distinguish between general fungal contamination and the presence of specific harmful mycotoxins.
4.2. Neural Networks
Neural networks (NNs), first introduced by [
52], are a class of machine learning algorithms modelled loosely after the human brain [
53]. They are designed to identify patterns and make predictions by learning from data and can be used for supervised or unsupervised problems. NNs are made up of interconnected nodes and edges, where the nodes represent the
neurons and the edges are the links between the neurons. The nodes are organised into layers, where the first layer is called the input layer, the last layer is the output layer, and all intermediate layers are called hidden layers. Typically, in an NN, the data are fed to the input layer; then one or more hidden layers perform computations and learn from the data, and finally, predictions (or classifications) are provided by the output layer. A simple diagram of an NN can be seen in
Figure 4.
Every neuron in a hidden layer computes a weighted sum of its inputs to transform the data. This is followed by a function, referred to as an
activation function [
53]. The network fine-tunes the weights associated with each neuron by employing optimisation algorithms throughout the training phase. There are numerous hyperparameters associated with NNs. Some of the main hyperparameters include (i) the learning rate, which determines how much the weights are changed at each iteration; (ii) the number of epochs, which refers to how many times the entire training dataset is passed forward and backward through the neural network; (iii) the batch size, which controls the number of training examples used in one iteration; and (iv) activation functions like ReLU (Rectified Linear Unit), sigmoid, and tanh, which determine the output value of a node given an input or a set of inputs. After training, the NN is capable of generating predictions for new, unseen data by passing the input across the layers to produce an output.
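To make the hyperparameters listed above concrete, the following minimal sketch (a hypothetical example using scikit-learn and synthetic data, not drawn from any of the reviewed studies) shows where the learning rate, number of epochs, batch size, and activation function are specified when training a small feed-forward network:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a contaminated / non-contaminated dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nn = MLPClassifier(
    hidden_layer_sizes=(32, 16),   # two hidden layers
    activation="relu",             # activation function (ReLU)
    learning_rate_init=0.001,      # learning rate
    batch_size=32,                 # batch size
    max_iter=300,                  # maximum number of epochs
    random_state=0,
)
nn.fit(X_train, y_train)                             # weights tuned during training
print("test accuracy:", nn.score(X_test, y_test))    # predictions on unseen data
```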
Like all machine learning models, NNs come with their own set of advantages and disadvantages. For example, NNs excel at identifying and modelling non-linear interactions present in data, which are common in biological processes. They are also flexible and can handle a wide range of data types, such as numerical and categorical, text, and image data. Despite their advantages, neural networks also have limitations. One of the major limitations is interpretability. NNs are considered
black-box algorithms, meaning that it is difficult to understand why specific predictions are being made [
54]. Second, like many of the other ML approaches we cover, they are not probabilistic models, making it hard to accurately quantify the uncertainty in the predictions. Overfitting can also be an issue for NNs. Without appropriate regularisation, NNs can become too complex, capturing the noise in the training data instead of generalising to the underlying pattern [
55]. Finally, training large NNs requires a significant amount of computing power. The computational cost of NNs will increase with the complexity of the model [
56]. In the following subsections, we review the use of NNs on different types of mycotoxin data.
4.2.1. NNs Applied to Spatiotemporal Data
NNs have been widely applied to spatiotemporal data, despite not forming part of the traditional suite of spatiotemporal analytics techniques. In the field of mycotoxin research, NNs have been used for a variety of tasks and data types. For example, Ref. [
38] used data from several sites in Northern Italy over the years 2005 to 2018. Their goal was to use NNs to predict the presence of mycotoxins (specifically, aflatoxins and fumonisins) in corn. In their work, they trained two NNs to predict if the contamination levels were above legal thresholds at the time of harvest. Both models performed well, achieving an accuracy of greater than 75% on the test data. However, they recommend that future modelling take into account the co-occurrence of aflatoxins and fumonisins in corn and their complex interaction, which may be influenced by the effects of climate change.
Ref. [
57] applied NNs to analyse the concentration of mycotoxins in winter wheat grain. They examined 23 winter wheat genotypes with different Fusarium resistances from three different sites in Poland during the years 2011 to 2013. They developed three NN models; however, only two of these are concerned with the detection of mycotoxins, that is, the DONANN model, which is used to detect DON, and the NIVANN model, which examines the nivalenol content. The DONANN and NIVANN models were designed with the automatic network designer in Statistica v7.1 software [
58], and were evaluated among a set of 10,000 generated networks. The performance of these models was assessed on several statistical metrics, but the primary focus was on the correlation coefficient (which, in this case, is the correlation between the predicted values from the model and the actual observed values) and the mean absolute error (MAE), which is the mean of the absolute differences between the predicted and actual values. For the best-performing DONANN model, a low MAE of 0.37 was reported, and the correlation coefficient was exceptionally high at 0.99, indicating an almost perfect linear relationship between the predicted and actual values. The best-performing NIVANN model, while exhibiting a slightly lower correlation coefficient of 0.81 and an MAE of 0.02, still performed within acceptable ranges. The architecture of the created models was designed as a multi-layer perceptron (MLP) type of NN, with two hidden layers. Despite reporting training, validation, and test errors, the authors did not specify the dataset on which the correlation and MAE metrics were based.
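For readers wishing to reproduce such evaluations, the two metrics can be computed directly; the snippet below is a generic illustration with hypothetical predicted and observed values, not data from the study above:

```python
import numpy as np

y_true = np.array([1.2, 0.8, 2.5, 0.3, 1.9])   # hypothetical observed concentrations
y_pred = np.array([1.0, 0.9, 2.2, 0.5, 2.1])   # hypothetical model predictions

mae = np.mean(np.abs(y_pred - y_true))         # mean absolute error
r = np.corrcoef(y_true, y_pred)[0, 1]          # correlation between predicted and observed
print(f"MAE = {mae:.2f}, r = {r:.2f}")
```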
In a novel application of NNs, Ref. [
59] used a transformer-based deep learning method, called
GPTransformer. A transformer-based deep learning algorithm refers to a type of NN architecture that relies on a mechanism called
attention to boost the performance of the model [
60]. In their work, the authors proposed a transformer-based genomic prediction model for predicting Fusarium head blight disease levels and associated DON concentration in barley data collected in three locations in Canada over the years 2014 to 2015. One of their goals was to compare the accuracy of the GPTransformer model to existing genomic prediction methods such as decision tree algorithms (DT), linear regression (LReg), and traditional statistical algorithms like best linear unbiased prediction (BLUP). The authors used the Pearson correlation coefficient (PCC) as a measure of performance, which calculates the linear relationship between the true output and the predicted output. They showed that the GPTransformer model (and indeed all of the ML models used) did not significantly outperform the statistical method of BLUP in terms of predictive accuracy. However, GPTransformer did perform better than both the DT and LReg methods. The authors note that the ML methods used are able to capture non-additive genetic elements, and as such, the predictions provided might include some of these interactions in their estimations.
4.2.2. NNs Applied to Spectral Data
Hyperspectral (or just spectral) data refer to the capture and processing of information from across the electromagnetic spectrum [
61]. Refs. [
43,
62,
63] applied NN classification algorithms to pixels of hyperspectral image data to examine wheat for Fusarium head blight infection. Each study used a convolutional NN (CNN), which captures spatial patterns or motifs by learning filter weights that respond to these motifs wherever they appear in an image.
In Ref. [
43], the authors investigated four distinct methods for converting hyperspectral imaging data. They then evaluated the performance of eight different CNN models in classifying pixels as either healthy or infected with Fusarium head blight. The effectiveness of these models was compared based on their classification accuracy. They found that a particular type of CNN called
DarkNet 19 [
64] performed the best, with an accuracy of close to 100% across all data conversion methods, on both the validation and test data. For Ref. [
63], tests showed that the CNN model is effective in detecting images that contain the blight, achieving a score of 0.80, and the mean average accuracy for the testing dataset was 92%. In Ref. [
62], the authors compared the accuracies of the different NNs to determine which is the best at identifying diseased regions of the wheat kernel. They showed that a two-dimensional convolutional bidirectional gated recurrent unit NN performed the best, with an accuracy of 84.6% on the validation dataset and an F1 score and accuracy of 0.75 and 74.3%, respectively, on the test data.
Ref. [
49] used a combination of hyperspectral data and NNs to detect aflatoxin in peanuts. They showed the CNN’s efficacy in classifying infected peanuts and achieved a test set accuracy of 95%. They later expanded their work and used a one-dimensional CNN (1D-CNN) to classify aflatoxin infection in corn and peanuts. This time, they achieved accuracies of 96.4% for peanuts and 92.1% for corn [
65].
In a study conducted by [
66], infrared (IR) spectroscopy and ML algorithms were used to detect fungal contamination in corn. In their study, 183 naturally infected samples (contaminated with different DON-producing Fusarium species and at different concentrations) were obtained from Saatbau Linz (SBL) in Austria and from the Cereal Research Centre (CRC) in Hungary. The authors assessed several classification ML models, including multi-layer perceptron (MLP) neural networks, random forests, support vector machines, and adaptive boosting, for their accuracy in correctly distinguishing contaminated from non-contaminated samples. Their results showed that the MLP approach correctly classified 94% of the non-contaminated samples and 91% of the contaminated samples. The authors note that while this approach yields promising results, these findings are specific to a contamination threshold of 1250 µg/kg, which is the EU regulatory limit, and that subsequent research will aim to evaluate the performance of the classification methods across various contamination levels.
4.2.3. NNs with an Electronic Nose
An electronic nose (e-nose) is a device intended to detect chemical compounds in gases. E-noses have been extensively used in the detection of aflatoxins [
67,
68], fumonisins [
69], and DON [
70] in corn. Building on this, Ref. [
71] used an e-nose supported by NNs for the detection of aflatoxins and fumonisins in corn. In their work, they compared three different approaches, that is, NN, logistic regression (LR), and discriminant analysis (DA), to examine the e-nose’s ability to discriminate between samples with contamination levels either exceeding or falling below legal thresholds, using data spanning 5 years. They showed that all methodologies achieved an accuracy of above 70%, with the NN performing the best with an accuracy of 78% for aflatoxin detection and 77% for fumonisin detection. They went on to suggest that the e-nose, when supported by an NN, can provide a fast screening tool for classifying samples.
4.2.4. NN Summary
Neural networks have been widely adopted as the ML algorithm of choice for analysing mycotoxin data, especially in the field of hyperspectral imaging. However, as yet, there appears to be a gap between research applications and wider use in industry. The application of NNs to hyperspectral data for mycotoxin detection (and food safety in general) is relatively new, and the implementation of an NN approach to hyperspectral data in industrial quality control faces various challenges, mainly due to hardware limitations, such as the cost of operating imaging equipment [
72]. However, in research, NNs for use in hyperspectral imaging have seen an increase in popularity with many of the reviewed works being widely cited, for example, Refs. [
62,
63].
4.3. Random Forests
A random forest (RF) [
73] is an ensemble learning method used for classification and regression. The RF algorithm creates a
forest of decision trees, where each tree in the forest is built from a sample drawn with replacement (that is, a bootstrap sample) from the training set and selects splits from a random subset of features.
While
Section 4.6.1 provides a comprehensive examination of decision trees, this section offers a concise introduction to familiarise readers with the basic concepts and terminologies associated with decision trees.
Figure 5 shows an example of a single decision tree. In constructing each decision tree, the root node is the starting point, and it represents the entire dataset, which is split based on the feature that provides the best separation according to a certain criterion (such as Gini impurity [74]). The decision nodes are the points where the data are split further. Each decision node represents a decision rule on a specific feature. The process continues recursively until a stopping criterion is met, such as reaching the tree’s maximum depth, attaining a minimum sample count in a leaf, or achieving adequate purity within the leaf nodes. The leaf/terminal nodes represent the final output of the decision process. Each branch/sub-tree represents a possible outcome of the decision made at the decision node, leading to further sub-trees or leaf nodes.
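As a concrete illustration of such a split criterion, the short sketch below (hypothetical class labels, not data from any reviewed study) computes the Gini impurity of a parent node and the weighted impurity of a candidate split; the split is favoured when the weighted impurity falls:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

parent = np.array([0, 0, 0, 1, 1, 1, 1, 1])   # e.g., 3 "low" and 5 "high" samples
left, right = parent[:4], parent[4:]          # a candidate split of the node

weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
print(gini(parent), weighted)                 # ~0.469 before vs ~0.188 after the split
```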
For RF classification tasks, each tree in the forest votes for a class, and the class receiving the majority of votes becomes the model’s prediction. For regression tasks, the forest takes the average of the outputs by individual trees.
Figure 6 shows a summary of the RF algorithm.
One of the main advantages of using RFs is their versatility. They are capable of performing both regression and classification tasks, as well as handling large datasets. Additionally, they require very little tuning and can perform well without much hyperparameter optimisation. Some of the main hyperparameters associated with RF include the following: (i) Number of trees: this is the number of trees in the forest. Generally, more trees increase performance but also increase the computational cost. (ii) Maximum depth of trees: the maximum depth of each tree. Deeper trees can model more complex patterns but might lead to overfitting. (iii) Minimum samples split: the smallest number of samples needed to split an internal node. Setting higher values helps prevent the model from learning overly specific patterns, which can lead to overfitting. As with NNs, RFs are a black-box algorithm, and so interpretability can be an issue. Each decision tree upon which the RF is built can be easy to interpret, but since RFs consist of a large number of decision trees averaged together, the decision process by which a prediction is made can be somewhat opaque.
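A minimal sketch of how these hyperparameters are typically set (a hypothetical example using scikit-learn and synthetic data, not any of the datasets reviewed below) is as follows:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for, e.g., spectral features with a contamination label.
X, y = make_classification(n_samples=500, n_features=30, random_state=0)

rf = RandomForestClassifier(
    n_estimators=500,       # number of trees in the forest
    max_depth=None,         # grow each tree until its leaves are pure
    min_samples_split=5,    # minimum samples required to split an internal node
    oob_score=True,         # out-of-bag estimate of generalisation accuracy
    random_state=0,
)
rf.fit(X, y)
print("out-of-bag accuracy:", rf.oob_score_)
```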
4.3.1. RFs for Spectral Data
As with NNs, RFs have been applied to hyperspectral data. For example, Ref. [
75] used an RF classification model to classify corn silage for high or low mycotoxin contamination using near-infrared spectroscopy (NIR). In their study, 155 samples were collected from several sites in the Po Valley (Italy) and from Sardinia over the years 2017 to 2019. Their aim was to develop qualitative models capable of distinguishing corn silage based on either the total concentrations or the total counts of various groups of mycotoxins (in this case,
Fusarium and
Penicillium toxins). To evaluate various classification strategies, distinct threshold levels were established for each mycotoxin group. These thresholds were used to categorise each sample as having either a high or low contamination level in relation to these specified values. To predict the contamination level, an RF classification model was fitted, using the light wavelengths as predictors, and achieved an out-of-sample accuracy of above 90% for the classification of both Fusarium and Penicillium toxins.
In a 2023 study, Ref. [
76] utilised NIR spectroscopy for detecting DON in oat samples from Spain and Sweden collected over the years 2021–2022. The authors applied two different transformation techniques to the spectral data and examined which allowed for better classification of the data using four different ML algorithms (k-nearest neighbours, naïve Bayes, NN, and RF). Both preprocessing transformation methods achieved similar results for all ML methods, with RFs performing the best with an accuracy of 77.8% and an area under the curve (AUC) of around 0.77. However, they noted that other similar studies have been conducted that achieved a higher classification accuracy, such as [
77].
In a similar study, Ref. [
78] constructed a biosensor array for identifying mycotoxins produced by Aspergillus flavus in peanuts and corn, using six ML models: sparse partial least squares discriminant analysis (sPLS-DA), linear support vector machine (svmLinear), radial support vector machine (svmRadial), RF, NN, and high-dimensional discriminant analysis (HDDA). The authors used the classification models for three separate purposes: to distinguish healthy from infected samples, to distinguish the pre-mould status in infected samples, and to distinguish between infected peanut and corn samples. To distinguish the pre-mould status, the aim was to create a three-class model to predict either the control or 1 or 2 days after inoculation. Their approach achieved a reported 100% accuracy in distinguishing healthy from infected samples and RF accuracies of 95% and 98% in identifying pre-mould status in peanuts and corn, respectively. However, such high levels of accuracy warrant further investigation, as they can often be indicative of issues in the experimental design, such as the creation of non-representative test sets or overfitting, especially if the test sets are not properly randomised.
4.3.2. RFs for Mycotoxin Treatment
ML models in mycotoxin treatment can be used to predict mycotoxin contamination risk and optimise mitigation strategies. This application can boost accuracy in prediction and effectiveness in deploying targeted anti-fungal treatments. In a study conducted by [
79], the authors employed machine learning techniques to predict the growth of
Fusarium culmorum and
Fusarium proliferatum, as well as their production of mycotoxins, in environments where ethylene vinyl alcohol copolymer films are used. These films contain pure components of essential oils, which are used to inhibit the growth of the fungi and their mycotoxin production. In their work, they studied fungal growth on corn in vitro and modelled the fungal growth and toxin production under different environmental scenarios and with different treatments applied. The ML models used were NNs, RF, extreme gradient boosted trees (XGB), and multiple linear regression (MLR). The performance of the ML methods was assessed using the root mean square error (RMSE). It was found that RF performed the best in predicting the growth rates of
Fusarium culmorum and
Fusarium proliferatum and mycotoxin production, having consistently the lowest RMSE value.
Ref. [
80] evaluated the anti-fungal properties of specific lactic acid bacteria strains against
Fusarium species found in cereals. To achieve this, various machine learning algorithms, including NN, RF, XGB, and MLR, were employed to predict the extent of fungal growth inhibition resulting from the application of the tested lactic acid bacteria strains. As with the previous study, the RMSE was the metric used to assess the performance of the model, in conjunction with the R² value. In this work, both RF and XGB showed comparable performances, reporting similar RMSE values (0.0604 and 0.0581, respectively) and R² values (0.992 and 0.992, respectively) on the test data when predicting the percentage of growth inhibition.
Several other studies exist on the topic of using ML models (and specifically RF) to predict fungal growth and mycotoxin production in the presence of treatments. In the interest of brevity, we name them here but do not provide additional details. In each of these studies, the authors used multiple ML models, with a general consensus that RF models performed the best at their given tasks. See Refs. [
81,
82,
83] for more details.
4.3.3. Random Forest Summary
RFs have emerged as a robust and versatile tool in the field of mycotoxin detection and treatment and have gained popularity due to their ease of use, computational speed, and predictive performance. These studies collectively underline the significant potential of RF in enhancing food safety measures, although it is crucial to acknowledge the necessity for rigorous validation and testing to ensure the reliability of these models.
4.4. Gradient Boosting
Gradient boosting (GB) [
84] builds on the concept of boosting, where weak learners are converted into strong ones through an iterative process. The GB framework builds boosted regression models by sequentially training a weak learner (such as a linear regression or simple decision tree) on the data using the residuals from previous model fits (as shown in
Figure 7). This process ensures that each new weak classifier addresses the inaccuracies of its predecessors, thereby enhancing the prediction accuracy. The final model aggregates the outputs from all these weak classifiers to form a robust, ‘strong’ classifier through an ensemble approach. The term
gradient in gradient boosting refers to the method’s use of gradient descent, a numerical optimisation algorithm, to minimise the loss, that is, the difference between the actual and predicted values.
In gradient boosting, when the weak learners are decision trees, each tree is grown in a greedy manner, but unlike random forests, trees are grown sequentially. After the first tree is built and predictions are made, the errors (residuals) from those predictions are used to build the next tree. The subsequent tree aims to predict the residuals from the previous tree. This process is continued, with each new tree correcting the residuals of the ensemble of all previous trees. The final prediction is made by summing the predictions from all trees, which can be thought of as a weighted vote where trees that reduce the error the most have more influence.
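The sequential residual-fitting process described above can be written out explicitly in a few lines; the sketch below is a schematic illustration on synthetic data, assuming squared-error loss, rather than the implementation used in any reviewed study. Shallow regression trees are built one after another, each fitted to the residuals of the current ensemble:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)   # synthetic target

learning_rate, n_stages = 0.1, 100
prediction = np.full_like(y, y.mean())            # stage 0: predict the mean
trees = []
for _ in range(n_stages):
    residuals = y - prediction                    # errors of the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    trees.append(tree)
    prediction += learning_rate * tree.predict(X) # each tree corrects its predecessors

print("final RMSE:", np.sqrt(np.mean((y - prediction) ** 2)))
```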
An advantage of GB models is their strong predictive capability and adaptability, especially in dealing with complex non-linear relationships between independent variables and the dependent variable. They adapt to various prediction problems by supporting different loss functions, making them suitable for both regression and classification tasks. However, these models have their challenges. Without careful tuning and regularisation, there is a risk of overfitting, a problem exacerbated by noisy data [
85]. Additionally, their sequential boosting process is computationally intensive and time-consuming compared with methods like random forests that build trees in parallel. This complexity can be a significant drawback in scenarios where computational resources or time are limited. Some of the main hyperparameters associated with GB are as follows: (i) Number of weak learners: this defines the number of boosting stages or learners to be created. More learners can lead to a more powerful model, but also increase the risk of overfitting and raise computational cost. (ii) Learning rate: this parameter scales the contribution of each learner. A smaller learning rate requires more weak learners but can yield a more generalised model. When the weak learners are trees, (iii) the maximum depth of trees determines the maximum depth of each individual tree. Deeper trees can model more complex patterns but can also lead to overfitting. An extension of the gradient boosting machine (GBM) model is eXtreme Gradient Boosting (XGB) [
86], with the key difference between the two being performance. In general, XGB models are faster and have better optimisation. Additionally, XGB models have the ability to deal with missing values.
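In practice, these hyperparameters are exposed directly by the fitting routine; the following is a hypothetical scikit-learn sketch on synthetic data (an equivalent XGBoost model would use xgboost.XGBClassifier with similarly named arguments), not the configuration of any study reviewed below:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for a contaminated / non-contaminated dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

gb = GradientBoostingClassifier(
    n_estimators=200,     # number of weak learners (boosting stages)
    learning_rate=0.05,   # contribution of each learner
    max_depth=3,          # depth of each individual tree
    random_state=0,
)
gb.fit(X, y)
print("training accuracy:", gb.score(X, y))
```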
4.4.1. GB for Spatiotemporal Data
In a study by [
87], the authors designed a program for aflatoxin monitoring in feed products (peanuts and soybeans), while considering both the performance of the model and the cost of monitoring. In the study, they applied four different ML algorithms (namely, GB, LR, SVM, and DT) to historical data concerning monitoring for the presence of aflatoxins in feed products. The data were collected from several sites around the world, including China, Brazil, and Argentina, over the years 2005 to 2018. The ML algorithms were compared on their ability to predict which feed batches are high risk and which should be considered for further aflatoxin analysis. In their work, they found that all the ML models performed well and used several error metrics to assess their models. They obtained an accuracy of over 90% for all models and an AUC and recall of over 0.8 and 0.6, respectively. However, the XGB model performed better than all other models, and the authors proposed a reduction in the monitoring cost of up to 96% for the years 2016 to 2018.
In Ref. [
88], the authors proposed using untargeted metabolomics and ML techniques to mine biomarkers of the species
Aspergillus on peanut data collected from several sites in China over the years 2013 to 2018. They initially used an RF model to determine
Aspergillus species with 97.8% accuracy. They then went on to use XGB to create a decision rule to help regulators evaluate risk prioritisation, with a claimed accuracy of 87.2%. However, the authors noted that they built the XGB model using only a single tree and used this tree to create an operable decision workflow for risk assessment. Although using a single tree can reduce complexity, it also increases the likelihood of less robust predictions. Part of the strength of XGB (and GBM) models is that they iteratively correct the mistakes of previous trees, a process that is lost if only a single tree is used.
Ref. [
39] conducted a study with the objective of evaluating the performance of GBM models to predict the presence of aflatoxins in corn at two risk thresholds, that is, 20 ppb and 5 ppb. These cut-off values were chosen based on the U.S. Food and Drug Administration’s (FDA) action level for corn (20 ppb) [
89], whereas the lower cut-off is based on the European standard of 5 ppb [
90]. Additionally, the authors performed feature engineering, which is the process of transforming raw data into meaningful and informative features with the intention of enhancing the performance of ML algorithms [
91]. The data used were historical climate, soil, and aflatoxin data, collected at several sites in Iowa in the years 2010, 2011, 2012, and 2021. As the data had many missing values, the authors used an imputation method; however, they noted that data from the months of January, February, and December had to be excluded from the model as there were too many missing values to accurately impute the data. The authors reported that the GBM model performed well, achieving high accuracy rates of 96.8% for the 20 ppb threshold and 90.3% for the 5 ppb threshold. The study highlighted the significant influence of the vegetative index (which is a quantitative measure that uses satellite imagery to assess the amount and health of plant life in a specific area) in August on aflatoxin risk for both thresholds, indicating the critical environmental and ecological impact of drought conditions during this month. Additionally, predictors related to soil properties (such as hydraulic conductivity, pH, and bulk density) were found to potentially affect aflatoxin contamination levels before harvest.
4.4.2. GB for Spectral Data
Ref. [
92] conducted a study on aflatoxin and fumonisin contamination in single corn kernels. They argued that bulk sampling of the corn may not produce accurate results, and thus focused solely on single kernels. In their study, they performed measurements to show the skewness of the data and calculated weighted sums of toxin contamination. Additionally, they aimed to improve single-kernel classification performance through the use of different ML applications. Their methodology was to take corn kernels that were already contaminated and scan them using the NIR technique. The samples were then ground and measured for both toxins using the ELISA method (discussed in
Section 1). In their work, they used five different ML models to classify both mycotoxins. These were GBM, RF, least absolute shrinkage and selection operator (LASSO), elastic-net regularised generalised linear models (GLMNETs), and support vector machines (SVMs). They additionally applied ML algorithms for classifying each individual mycotoxin. For aflatoxin, they used bagged AdaBoost, linear discriminant analysis (LDA), and penalised logistic regression (PLR). For fumonisin classification, GBM and penalised discriminant analysis (PDA) were used. For aflatoxin, they found that GBM was the best-performing model, with an accuracy of 83%, on both the training and the test data. For fumonisin, the PDA model performed the best with an accuracy of 86% on the test data. However, the authors noted that, for future studies, opportunities for better classification exist, including increasing the proportion of samples so the algorithm can better learn the characteristics of contaminated corn kernels.
4.4.3. Gradient Boosting Summary
The application of GBM models across various datasets, from spatiotemporal to spectral data, demonstrates their versatility and potential in predicting mycotoxin contamination in agricultural products. While GBM models generally exhibit high accuracy, there are criticisms concerning the robustness of these models when applied with limited trees, as in the case of [
88], or when handling datasets with substantial missing values, as noted by [
39]. The high accuracy rates reported should be examined for potential overfitting or lack of generalisation to broader datasets. The approach of Ref. [
92] to single kernel analysis opens avenues for improved precision in toxin detection, but also indicates the need for larger sample sizes to enhance model learning.
4.5. Support Vector Machines
Support vector machines (SVMs) [
93] are a set of supervised learning methods used for classification, regression, and outlier detection. To make predictions, SVMs identify the optimal hyperplane that maximises the margin between the two classes (where the margin is defined as the distance between the nearest data points of each class and the dividing hyperplane). The data points that are closest to the hyperplane and that influence its position and orientation are known as support vectors, as they
support or define the hyperplane.
Figure 8 illustrates an SVM in action. One of the key advantages of SVMs is their versatility as they can be used on a variety of data types, and are particularly useful for image recognition [
94]. Additionally, they are memory efficient since they only use a subset of training points, called support vectors, in the decision function. However, SVMs require careful tuning of the hyperparameters and an appropriate kernel choice. A kernel is a function used to transform data into a higher-dimensional space. By projecting the data into a higher dimension, a kernel makes it possible to find a hyperplane that can effectively separate the classes. Some of the common kernels include [
95]:
Linear: No non-linear transformation, suitable for linearly separable data.
Polynomial: Suitable for non-linearly separable data, involves higher degree terms of the features.
Radial basis function: Good for non-linear data, uses a Gaussian distribution.
Sigmoid: Similar to the sigmoid function in logistic regression.
Additional hyperparameters include the following: (i) Gamma: This is needed for all kernels except linear. It determines the extent of the influence that a single training example has. Low values indicate a wide reach, and high values indicate a close reach. A high gamma value can cause the model to overfit. (ii) Degree: This is only relevant for a polynomial kernel. It defines the degree of the polynomial used in the kernel. A higher degree can model more complex relationships but increases the risk of overfitting. (iii) Coef0: This is a parameter for polynomial and sigmoid kernels that adjusts the independent term in the kernel function. It is often called the kernel bias.
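As a minimal illustration of how the kernel and these hyperparameters are specified (a hypothetical scikit-learn sketch on synthetic data, not the configuration of any reviewed study; features are standardised first, as SVMs generally require), consider the following:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for spectral features with a contamination label.
X, y = make_classification(n_samples=400, n_features=25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = make_pipeline(
    StandardScaler(),                      # scale features before fitting the SVM
    SVC(kernel="rbf", gamma=0.05, C=1.0),  # RBF kernel; gamma controls each point's reach
)                                          # a polynomial kernel would also take degree/coef0
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```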
4.5.1. SVMs for Spectral Data
In the review of the literature concerning the use of SVMs in mycotoxin detection, it was found that they were overwhelmingly used for image recognition and, as such, primarily used spectral data. For example, Ref. [
45] used several ML models (SVM, NN, and LR) for the classification of Fusarium head blight in wheat, using spectral data. The data were collected in the years 2020 to 2021 at a single site in Belgium, with the experiment using eight varieties of wheat. They found that the SVM model outperformed both the NN and LR methods in classifying contaminated wheat in every variety, with a classification accuracy of 96.5% on the test data (with NN and LR achieving accuracies of 82.9% and 82.5%, respectively).
In a similar study, Ref. [
96] used three different imaging methods alongside ML classification models to test ground corn samples for the presence of aflatoxin and fumonisin, both as individual contaminants and in combination. Two classification models were used, partial least squares-discriminant analysis (PLS-DA) and SVM, using specific threshold values for each mycotoxin. The naturally contaminated corn samples were obtained from the Office of Texas State Chemist, which in turn collected the samples from different feed companies located around Texas. They found that the SVM performed better than the PLS-DA with classification accuracies of 89.1%, 71.7%, and 95.7% for each imaging technique. The imaging method with the highest accuracy was the short-wave infrared (SWIR) method.
In a study concerning the detection of
Aspergillus parasiticus in corn kernels using NIR hyperspectral imaging, conducted by [
97], the authors used SVMs to compare the performances of multiple different preprocessing and imaging techniques. For their study, corn kernels were harvested from Hefei City, Anhui Province, China, in 2015. Each day (for a period of 7 days), 36 sterilised corn kernels were inoculated with
Aspergillus parasiticus and were divided into four groups depending on the day of inoculation. From this, an SVM was used to determine which groups were infected using different preprocessing techniques. Additionally, this study examined the orientation of the kernel in the image to determine if this property had an effect on predictive performance. They found that the best preprocessing method was a combination of the standard normal variate (SNV) and moving average smoothing (MAS) methods, with an accuracy of 91.67% for detecting contaminated kernels using the validation data. They also found that the performance of the classification models was influenced by orientation; however, the models built using data from a mix of kernels with their germs facing both up and down still achieved an accuracy of 84.38% on the validation data.
4.5.2. Support Vector Machine Summary
In the reviewed work, SVMs demonstrated considerable accuracy in mycotoxin detection through spectral data analysis. However, as with other ML methods reviewed, the consistently high classification accuracy reported raises questions about potential overfitting and the representativeness of the datasets used. Moreover, factors such as the physical orientation of the imaged kernels significantly influenced SVM performance, indicating that model robustness may be context dependent. In addition, the choice of the SVM kernel function and its hyperparameters, such as the kernel type, gamma, and degree, is critical in shaping the decision surface and, thus, the SVM’s ability to generalise from training to unseen data.
4.6. Other ML Methods
In this section, we cover the remaining ML methods. These include decision trees and Bayesian networks and have been grouped together as they make up a minority of the reviewed work. As such, they are not separated by the type of data used, and all data types are discussed together.
4.6.1. Decision Trees
Decision tree (DT) learning is a type of non-parametric supervised learning algorithm used for both classification and regression tasks [
74,
98]. A DT is a flowchart-like structure, resembling a tree structure with branches representing decision paths and leaves (or terminal nodes) representing predicted outcomes (see
Figure 5 in
Section 4.3). A DT splits the data into subsets based on the value of input features. Splits are chosen to maximise the separation of the classes based on measures like Gini impurity or information gain [
74]. This process continues recursively until a stopping criterion is met, resulting in a tree where each path represents a decision pathway that leads to a predicted outcome. The advantages of decision trees include their simplicity, interpretability, and ability to handle both numerical and categorical data. However, DTs have a tendency to overfit, especially when a tree is particularly deep [
74]. This can be mitigated by pruning the tree or setting a maximum depth of the tree via the use of hyperparameters. As this method is a tree-based approach, there is an overlap with RF and GB in terms of hyperparameters. Some of these include maximum depth, minimum samples split, and minimum samples leaf, that is, the minimum number of samples required at a leaf node; setting this parameter ensures that each leaf node represents a reasonable number of samples, which can smooth the model, particularly for regression tasks, and help prevent overfitting.
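A minimal sketch of these tree-specific hyperparameters (hypothetical, using scikit-learn and synthetic data rather than any reviewed dataset) is shown below; an unconstrained tree would typically overfit, whereas the depth and leaf-size limits regularise it:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a contaminated / non-contaminated dataset.
X, y = make_classification(n_samples=500, n_features=15, random_state=0)

dt = DecisionTreeClassifier(
    criterion="gini",        # split quality measure ("entropy" gives information gain)
    max_depth=4,             # limit tree depth to reduce overfitting
    min_samples_split=10,    # minimum samples required to split a node
    min_samples_leaf=5,      # minimum samples required at a leaf node
    random_state=0,
)
dt.fit(X, y)
print("tree depth:", dt.get_depth(), "leaves:", dt.get_n_leaves())
```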
The use of DTs in the field of mycotoxin detection is quite varied. For example, in a study conducted by [
99], in which they assessed the use of an electronic nose to identify DON contamination of wheat samples, a decision tree methodology called Classification and Regression Trees (CART) [
74] was used to classify samples based on four thresholds of DON contamination (1750, 1250, 750, and 500 µg/kg). For this study, 214 wheat samples were collected from Northern Italy during the years 2014–2015 and 2017–2018. For the threshold values of ≥1250 µg/kg, the accuracy of sample classification was the highest, ranging between 88% and 92%. The lower thresholds of ≤750 µg/kg were found to be the least accurate, with an accuracy of <83%. The authors proposed that the reduced sensitivity of the instrument at lower DON concentrations might explain this drop in accuracy.
Ref. [
99] examined the classification of mycotoxin-contaminated corn and peanuts at regulatory limits using spectral data. The spectral data were analysed using a bootstrap-aggregated (bagged) DT approach, focusing on the protein and carbohydrate absorption bands of the spectrum. The corn samples were obtained from Saatbau Linz (Linz, Austria) and the Cereal Research Centre (Szeged, Hungary). For the peanuts, 92 different infected samples were purchased from public markets in Tanzania, Mozambique, and Burkina Faso. The authors demonstrated that the DT method could classify corn samples at the 1750 and 500 µg/kg thresholds for DON with accuracies of 79% and 85%, respectively. Additionally, it was able to classify peanut samples for aflatoxin at 8 µg/kg with 77% accuracy.
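As a rough illustration of bootstrap aggregation, the sketch below fits a bagged ensemble of decision trees on synthetic stand-ins for absorbance features; the feature construction, labels, and parameter values are hypothetical and not drawn from the published spectra.

```python
# Hypothetical sketch: a bootstrap-aggregated (bagged) decision tree classifier
# evaluated with cross-validation on synthetic absorbance-like features.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(250, 30))                 # toy stand-ins for absorption-band features
y = (X[:, :3].mean(axis=1) > 0).astype(int)    # toy above/below-threshold label

bagged = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # older scikit-learn versions use `base_estimator`
    n_estimators=100,                    # number of bootstrap-resampled trees
    bootstrap=True,
    random_state=0,
)
scores = cross_val_score(bagged, X, y, cv=5, scoring="accuracy")
print("mean CV accuracy:", round(scores.mean(), 3))
```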
In a study on identifying and predicting risks associated with the presence of fumonisins in breakfast cereal products, Ref. [
100] developed a model to predict the risk of fumonisin contamination, with a particular emphasis on products made from a mixture of ingredients. In their research, fifty-eight distinct breakfast products were purchased from local grocery stores in Florence, Italy, during 2019. The selection criteria for purchasing breakfast products included (i) products with packaging sizes ranging from 200 to 500 g, including both plastic and non-plastic materials; (ii) items sourced from retail shops; and (iii) products primarily made of wheat, corn, dry fruits, rice, and oats. Principal component analysis (PCA) and k-means clustering were employed to explore the connection between cereal ingredients, their composition and packaging, and the concentration of fumonisins. The findings suggested that the fumonisin concentration might be linked to complex non-linear interactions among various factor variables. To explore this potential and identify the factors most closely linked with high concentrations, DTs were employed. Two decision trees were developed; the first indicated a relationship between high concentrations of fumonisins and cereal products rich in corn, particularly when combined with high levels of sodium or rice, while the second highlighted a link between corn and either high sodium or high fat concentrations. In both models, the presence of plastic packaging appeared to mitigate the concentration of fumonisins to a certain degree.
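The sketch below is a minimal, hypothetical illustration of this kind of exploratory pipeline: PCA to summarise product composition, k-means to group products, and a shallow decision tree to flag factors associated with high fumonisin levels. The column names, data, and risk threshold are invented for illustration and are not taken from the study.

```python
# Hypothetical sketch: PCA + k-means for exploration, then a shallow decision
# tree to surface factors associated with high fumonisin concentrations.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "corn_pct": rng.uniform(0, 100, 58),      # hypothetical composition features
    "rice_pct": rng.uniform(0, 50, 58),
    "sodium_mg": rng.uniform(0, 800, 58),
    "fat_g": rng.uniform(0, 20, 58),
    "plastic_pack": rng.integers(0, 2, 58),   # 1 = plastic packaging
    "fumonisin_ug_kg": rng.uniform(0, 400, 58),
})
features = df.drop(columns="fumonisin_ug_kg")
Xs = StandardScaler().fit_transform(features)

pca = PCA(n_components=2).fit(Xs)                        # low-dimensional summary
print("explained variance:", np.round(pca.explained_variance_ratio_, 2))
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xs)
print("cluster sizes:", np.bincount(clusters))

high_risk = (df["fumonisin_ug_kg"] > 200).astype(int)    # illustrative risk label
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(features, high_risk)
print(export_text(tree, feature_names=list(features.columns)))
```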
4.6.2. Bayesian Network
Bayesian networks (BN) are a type of probabilistic graphical model that uses Bayesian statistics to represent and infer the conditional dependencies between different variables in a dataset [
101]. The networks are structured as a directed acyclic graph (DAG), with feature nodes representing variables and edges indicating probabilistic relationships between them. Predictions in BNs are made through a process called probabilistic inference, which involves calculating the likelihood of certain outcomes based on known information and the network’s structure. In contrast with linear regression models, BN models excel at analysing variable dependencies, handling non-linear interactions, and incorporating diverse types of data [
102]. The strengths of BNs include the handling of uncertainty, the integration of prior knowledge with observed data (thereby enhancing the model’s predictive capabilities), and interpretability. However, BNs also have disadvantages: as the number of variables increases, the complexity of the network and the computational resources required for inference can grow exponentially.
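The sketch below is a toy example, unrelated to the reviewed studies, of how a small DAG with hand-specified conditional probability tables supports probabilistic inference by enumeration; the variables and probabilities are invented purely for illustration.

```python
# Toy Bayesian network sketch: Humidity -> FungalGrowth -> Contamination,
# with hand-specified conditional probability tables and inference by
# explicit enumeration over the DAG.
from itertools import product

p_humid = {1: 0.3, 0: 0.7}                               # P(Humidity)
p_growth = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.2, 0: 0.8}}    # P(Growth | Humidity)
p_contam = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.05, 0: 0.95}}  # P(Contamination | Growth)

def joint(h, g, c):
    """Joint probability factorised along the directed acyclic graph."""
    return p_humid[h] * p_growth[h][g] * p_contam[g][c]

# P(Contamination = 1 | Humidity = 1), summing out FungalGrowth
num = sum(joint(1, g, 1) for g in (0, 1))
den = sum(joint(1, g, c) for g, c in product((0, 1), repeat=2))
print("P(contaminated | humid) =", round(num / den, 3))
```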
In a study aimed at predicting DON contamination in wheat, ref. [
103] compared three different modelling approaches to predicting DON contamination: a mixed-effect LR method, a mechanistic model (which simulates the mechanisms of plant and fungus development stages and their interactions) adapted to the available data, and a BN. The data used were collected in the Netherlands over the years 2001 to 2013. The results of the experiments showed that all three models performed well, with the LR method performing the best, achieving an accuracy of 88% for detecting DON contamination. However, the authors noted that this model is heavily reliant on both the specific location and the available data, and it requires that all input data be present. The mechanistic model achieved an accuracy of 80%, while the BN achieved 86% accuracy. The authors also noted that the BN is easier to implement than the other methods when the data are incomplete.
Ref. [
104] constructed transcriptional regulatory networks (TRNs) using a BN algorithm called the
module network algorithm. TRNs are complex systems in biology that describe the relationships and interactions between various proteins and genes involved in the process of
transcription [
105], where transcription is the process by which the information encoded in a section of DNA is copied to produce a complementary RNA strand. The goal of their work was to understand how specific gene groups (modules) in the fungus
Fusarium graminearum regulate biological processes. The authors reported that their network inference was of high credibility, with 81.8% of the evaluable modules classified as high or moderate confidence based on validation against a variety of evidence sources. This suggests a robust alignment of the inferred network with the existing understanding of the biological processes within
Fusarium graminearum.
4.6.3. Summary of Other ML Methods
Decision trees have shown varying degrees of effectiveness in detecting mycotoxins, as evidenced by diverse research outcomes. The use of CART to classify contaminated wheat samples achieved higher accuracy at certain thresholds but showed diminished performance at lower contamination levels. A bagged DT approach showed moderate success, suggesting that while DTs are capable classifiers, their accuracy can vary significantly with mycotoxin level and sample type. Applying these methods also raises potential issues with model sensitivity, particularly at lower toxin concentrations, and a reliance on data quality. These factors underscore the need for careful calibration and validation of DTs in diverse settings for reliable mycotoxin detection.
BNs have shown effectiveness in mycotoxin detection, as demonstrated in various studies, but with some limitations. Ref. [
103] compared BNs with other models for predicting DON contamination in wheat, with the BN achieving a respectable 86% accuracy, slightly below the best-performing logistic regression model. The authors nevertheless highlighted the BN’s advantage in handling incomplete data, a significant benefit over methods like logistic regression. The reviewed applications show BNs’ flexibility and efficiency, though their performance can be contingent on data quality and the specific biological context, which may limit their broader applicability.
4.7. Summary and Comparison of Case Studies
To provide a comprehensive overview of the specific case studies discussed, we include a summary table in
Table 1 that highlights the key findings by describing the data types, ML models used, application contexts, and reported accuracies. In cases where more than one ML model is used, the highest-performing model is reported in the accuracy column.
Examining
Table 1, we can see that the most frequently used ML model in the reviewed studies is the neural network, with convolutional neural networks also being highly prevalent. The most common data type used is spatiotemporal data, followed by hyperspectral data. The research covers a range of crops, including corn, wheat, barley, peanuts, and oats, with corn being the most commonly studied, and focuses primarily on detecting contaminants such as aflatoxin, fumonisins, and Fusarium head blight. Many studies achieved high accuracy rates, often above 90%, showcasing the potential of ML models to enhance mycotoxin detection in agriculture. However, it is important to consider that these high accuracies may be influenced by the controlled environments of individual laboratories, which can lead to overfitting and potentially less reliable performance in real-world applications (see
Section 5 for a discussion on this).
5. Conclusions
Our research focuses on highlighting and evaluating different ML models for monitoring and predicting the presence of mycotoxins in common crops. We conducted an extensive literature review of over 30 studies published between 2013 and 2023. The number of publications in each field has grown significantly over the ten years reviewed; however, the application of ML to monitoring and predicting mycotoxins is still in its infancy. Despite the promise of ML methods in mycotoxin detection, their adoption in industry has been cautious. This is likely due to the high operational costs associated with advanced techniques like hyperspectral imaging rather than to the use of ML methods themselves. The prevalence of such data-intensive methods raises questions about the feasibility of widespread implementation, particularly in resource-constrained settings.
We found that the most common data type was spectral or image data, and as such, the most common ML method used was NNs, as they can be readily applied to image data. RFs were the second most popular ML method and have gained traction due to their robustness and ease of implementation. Additionally, most of the studies reviewed used classification ML techniques to distinguish contaminated from healthy crops. The high predictive accuracy reported in the reviewed studies suggests that these methods represent a promising approach for mycotoxin detection and for enhancing food safety in general. However, a point to note is that the reported high accuracy of the ML models’ predictions, often exceeding 90%, may not fully account for the homogeneity of training and test sets within individual laboratories. This homogeneity can result in overfitting, where models appear highly accurate in a controlled setting but may not perform as well under the variable conditions of real-world applications.
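One way to probe this concern, sketched below using purely hypothetical data and group labels, is to evaluate models with group-aware cross-validation so that samples from the same laboratory or batch never appear in both the training and test folds.

```python
# Hypothetical sketch: group-aware cross-validation, where folds are split by
# a laboratory/batch identifier rather than at random across all samples.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 40))                 # toy feature matrix
y = (X[:, 0] > 0).astype(int)                  # toy contaminated/clean label
labs = rng.integers(0, 5, size=300)            # hypothetical lab/batch identifier

cv = GroupKFold(n_splits=5)                    # no lab appears in both train and test
scores = cross_val_score(RandomForestClassifier(random_state=0),
                         X, y, groups=labs, cv=cv, scoring="accuracy")
print("per-fold accuracy:", np.round(scores, 3))
```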
Although this work focused on the application of the most popular ML methods, numerous other ML and statistical techniques have been applied to mycotoxin detection data. For example, in a study by [
106], classification models such as partial least squares-discriminant analysis (PLS-DA) and principal component-linear discriminant analysis (PC-LDA) were employed to distinguish between wheat samples with high and low contamination. Additionally, statistical techniques like PCA are often used as a dimension reduction method. Refs. [
107,
108,
109] used PCA when dealing with high-dimensional image data.
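As an illustration of this kind of dimension reduction, the sketch below (not taken from the cited studies) chains PCA with linear discriminant analysis on synthetic high-dimensional features, in the spirit of the PC-LDA approach; the data, label construction, and number of components are hypothetical.

```python
# Hypothetical sketch: PCA for dimension reduction followed by linear
# discriminant analysis (a PC-LDA-style pipeline) on synthetic spectra.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
X = rng.normal(size=(150, 500))                # 150 samples x 500 toy wavelengths
y = (X[:, :10].sum(axis=1) > 0).astype(int)    # toy high/low contamination label

pc_lda = make_pipeline(StandardScaler(),
                       PCA(n_components=10),   # reduce 500 wavelengths to 10 scores
                       LinearDiscriminantAnalysis())
scores = cross_val_score(pc_lda, X, y, cv=5, scoring="accuracy")
print("mean CV accuracy:", round(scores.mean(), 3))
```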
A critical bottleneck in the development of ML applications for food safety is the lack of detailed hyperparameter reporting, as these parameters are crucial for the replication and validation of ML models. Without clear reporting on hyperparameter tuning, reproducing results and validating findings becomes challenging, hindering the progression towards robust and reliable ML applications in food safety. The majority of the reviewed studies do not provide open access to their code, and many provide only limited access to their data, further impeding the reproducibility of the described methods.
Despite these challenges, the future prospects of ML in food safety are promising. As the field matures, there is a need for standardisation in reporting practices and for developing models that can reliably perform across diverse laboratory conditions and datasets. Further research could directly compare different ML models under a standardised set of hyperparameters, providing clearer insight into which techniques are most effective in specific contexts related to mycotoxin detection.
As the field grows, there are numerous avenues for future work. One such avenue is model interpretability: given the critical nature of food safety, future research could focus on improving the interpretability of ML models. Techniques like SHAP (SHapley Additive exPlanations) [
110] and LIME (Local Interpretable Model-Agnostic Explanations) [
111] can be used to make the models’ decisions more transparent and trustworthy. Furthermore, addressing the current bottlenecks, such as the high operational costs and the need for data standardisation, will be crucial. Future research should explore cost-effective techniques and advocate for open-access datasets and standardised reporting practices to enhance reproducibility and application in diverse settings.
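As a brief illustration of such interpretability tooling, the sketch below applies the third-party shap package to a tree-based model trained on synthetic data; the model, features, and data are hypothetical, and the exact return types of the shap API can vary between versions.

```python
# Hypothetical sketch: SHAP values for a tree ensemble fitted on toy data,
# indicating which features drive the model's predictions.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 8))                              # toy features
y = X[:, 0] - X[:, 3] + rng.normal(scale=0.1, size=200)    # toy contamination level

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)          # efficient explainer for tree ensembles
shap_values = explainer.shap_values(X)         # per-sample, per-feature contributions
shap.summary_plot(shap_values, X, show=False)  # global feature-importance overview
```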