1. Introduction
China was the first country to recognize and cultivate edible mushrooms, and it is now the world's largest producer, consumer, and exporter of this crop. In recent years, China's edible mushroom sector has maintained consistent growth, and edible mushrooms have become China's "fifth-largest crop" after grain, oil, vegetables, and fruits in terms of agricultural production [1].
Oudemansiella raphanipes is a well-known edible mushroom prized for its rich nutritional content, including proteins, amino acids, vitamins, and trace elements. It also has medicinal value, containing polyphenols, polysaccharides, flavonoids, and various other bioactive compounds that help enhance immunity, regulate body functions, repair damaged organs, and provide analgesic and anti-inflammatory effects [2,3]. Growing consumer awareness of quality has led to significant price variation among grades. However, traditional manual quality grading is time-consuming and labor-intensive, which severely limits economic benefits. There is therefore an urgent need for an algorithm capable of automatically assessing and grading the quality of Oudemansiella raphanipes.
With advancements in deep learning, convolutional neural networks (CNNs) have emerged as powerful tools for extracting high-dimensional features for image classification [4]. In the context of mushroom quality grading, several studies have demonstrated their effectiveness. Tongkai Li et al. developed a quality grading algorithm for Oudemansiella raphanipes using transfer learning and MobileNetV2, achieving a test accuracy of 98.75% [5]. Yanqiang Wu et al. proposed a size-grading method for antler mushrooms using YOLOv5 and PSPNet, achieving a detection accuracy of 94% [6]. Yinhua Zuo et al. introduced a quality grading model for Pleurotus cristatus based on an improved EfficientNet with a self-attention module, attaining an average recognition accuracy of 91.5% [7]. Li Wang et al. proposed a grade recognition method for dried shiitake mushrooms using an improved VGG network (D-VGG), achieving a classification accuracy of 96.21% [8]. Lei Shi et al. presented a lightweight grading detection method for oyster mushrooms using OMC-YOLO, an improvement on YOLOv8n, with an mAP50 of 94.95% [9]. Ziyuan Wei et al. employed a convolutional autoencoder–support vector machine model for the quality classification of Pleurotus eryngii, achieving an accuracy of 91.58% [10].
However, it is essential to develop a model that balances classification accuracy, resource occupation, and response speed, especially for resource-limited devices. Large-parameter models such as VGGNet [11] and ResNet [12] offer high accuracy but fall short in resource efficiency and response speed. Conversely, small-parameter models such as Xception [13] and ShuffleNet [14,15] occupy fewer resources and compute faster but may not achieve satisfactory classification accuracy. To address this trade-off, knowledge distillation (KD) can be employed to enhance the accuracy of small-parameter models while preserving their speed advantages [16,17].
KD is a deep-learning technique for transferring knowledge from a large-parameter model to a small-parameter model. The large-parameter model, typically high in accuracy but heavy in resource occupation and slow in response, serves as the teacher, whereas the small-parameter model, with low resource occupation and fast response but lower accuracy, serves as the student. Without KD, the student learns solely from the ground truth (referred to as hard labels). With KD, the student additionally learns from the teacher, whose softened softmax outputs (referred to as soft labels) are used during training. Consequently, the small-parameter student can approach the accuracy of the large-parameter teacher while retaining low resource occupation and a fast response speed. KD was first proposed with a single teacher model [16,17].
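For concreteness, the following is a minimal sketch of the single-teacher KD loss in the style of Hinton et al. [16], written in PyTorch; the temperature T and the weighting factor alpha are illustrative hyperparameters, not values from this paper:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Single-teacher KD loss (illustrative sketch): weighted sum of the
    hard-label cross-entropy and the soft-label KL divergence."""
    # Hard-label term: standard cross-entropy against the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between the temperature-softened
    # teacher and student distributions; the T^2 factor keeps gradient
    # magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # alpha balances soft-label and hard-label supervision.
    return alpha * soft + (1.0 - alpha) * hard
```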
Since multiple teachers can transfer a broader range of knowledge and mitigate the impact of a single low-quality teacher, multi-teacher KD has been proposed [18,19,20,21,22,23,24,25,26]. In this setting, several teacher models are combined with certain weights to form an ensemble, so the weight assigned to each teacher directly influences the performance of the distilled student. The simplest scheme adopts equal weights for all teachers [18]; however, equal weights fail to distinguish high-quality from low-quality teachers, potentially resulting in unsatisfactory performance. A variety of methods have therefore been developed to adjust the weights, such as manual tuning [19] and adaptive weighting [20,21,22,23,24].
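As an illustration of this ensemble step, the sketch below forms soft labels as a weighted combination of the teachers' softened outputs; the weights argument is a placeholder for whichever scheme is used, and equal weights (1/K for K teachers) recover the simple averaging of [18]:

```python
import torch
import torch.nn.functional as F

def ensemble_soft_labels(teacher_logits_list, weights, T=4.0):
    """Weighted combination of teacher soft labels (illustrative sketch).

    How the weights are chosen (equal, manually tuned, or adaptive) is
    exactly the design question discussed above.
    """
    soft = [w * F.softmax(logits / T, dim=1)
            for w, logits in zip(weights, teacher_logits_list)]
    # With weights summing to 1, the sum remains a valid probability
    # distribution for each sample.
    return torch.stack(soft, dim=0).sum(dim=0)
```

The resulting distribution can then serve as the target in a KD loss such as the single-teacher sketch above, in place of the softened output of one teacher.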
To eliminate the need to select weights, other methods instead select among the teacher models, including a voting strategy that follows the majority side [18], randomly selecting one teacher [25], and applying reinforcement learning to filter out inappropriate teachers [26]. In these schemes, however, some teacher models are discarded and thus wasted. Fully utilizing all available teacher models while mitigating the adverse effects of low-quality teachers therefore remains an unresolved issue.
In this paper, a three-teacher KD algorithm utilizing cascaded teacher models is proposed for the quality grading of Oudemansiella raphanipes, with the aim of improving grading accuracy while maintaining low resource occupation and a fast response speed. The main contributions of this paper are as follows:
(1) Three cascaded structures were investigated: the parallel model, the standard series model, and the series model modified with residual connections (hereafter denoted as the residual-series model).
(2) Compared with a student model distilled with a single teacher or with an ensemble three-teacher model using equal weights (hereafter denoted as the equal-weight three-teacher model), these cascaded structures exhibit improved performance indices. In particular, the residual-series model outperforms the others, achieving the highest grading accuracy.
(3) The superiority of the residual-series model is further demonstrated by comparison with recently published model compression techniques, by its transferability to edge devices with limited computing resources, and by its generalization ability on a public mushroom dataset.