Proto-DS: A Self-Supervised Learning-Based Nondestructive Testing Approach for Food Adulteration with Imbalanced Hyperspectral Data
Abstract
1. Introduction
- We are the first to address the challenge of imbalanced data distribution in hyperspectral imaging-based nondestructive testing by incorporating self-supervised learning and Dice loss.
- We evaluate our approach on three imbalanced datasets and find that it outperforms the alternatives even when minority samples are extremely scarce.
- Our study reveals that self-supervised learning is key to realizing improved performance on imbalanced datasets. Additionally, combining self-supervised learning with the Dice loss further enhances model robustness.
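As a rough illustration of why a Dice-style objective suits imbalanced data, the sketch below implements a common soft Dice loss for a binary task. It is not taken from the paper: the function name `soft_dice_loss`, the smoothing constant, and the toy 98:2 label split are all assumptions chosen for the example, and the paper's exact formulation may differ.

```python
def soft_dice_loss(probs, labels, smooth=1.0):
    """Soft Dice loss over one batch for the positive (minority) class.

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : ground-truth labels, each 0 or 1
    smooth : assumed smoothing constant that avoids division by zero
    """
    intersection = sum(p * y for p, y in zip(probs, labels))
    total = sum(probs) + sum(labels)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)


# Hypothetical batch: 98 majority samples (label 0) and 2 minority samples (label 1).
labels = [0] * 98 + [1] * 2

# A trivial model that assigns almost no probability to the minority class.
trivial = [0.01] * 100
# A model that separates the two minority samples from the rest.
good = [0.01] * 98 + [0.9] * 2

loss_trivial = soft_dice_loss(trivial, labels)
loss_good = soft_dice_loss(good, labels)
print(f"trivial: {loss_trivial:.3f}, good: {loss_good:.3f}")
```

Confidently rejected majority samples add almost nothing to either the numerator or the denominator, so the loss is driven by how well the predictions overlap with the minority class rather than by the sheer number of majority samples.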
2. Materials and Methods
2.1. Samples
2.2. Hyperspectral System and Acquisition of Spectra
2.3. Proposed Method
2.3.1. Prototypical Network Architecture
2.3.2. Spectral Prototypical Contrastive Learning
2.3.3. Fine-Tuning with Dice Loss
2.4. Implementation Details
2.5. Experiment Settings
2.6. Evaluation Metrics
2.7. Methods for Comparison
3. Results
3.1. Analysis of Spectra
3.2. Comparison with Baselines
4. Discussion
4.1. Contributions of the Proposed Components
4.2. Intuition of the Contributed Components
4.3. Two-Dimensional Visualization of the Proto-DS Learned Space
4.4. Limitations and Future Work
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Layer | Input Dimension | Output Dimension | Activation Function |
---|---|---|---|
Batchnorm Layer 1 | 192 | 192 | N/A |
Linear Layer 1 | 192 | 256 | LeakyReLU |
Batchnorm Layer 2 | 256 | 256 | N/A |
Linear Layer 2 | 256 | 256 | LeakyReLU |
Batchnorm Layer 3 | 256 | 256 | N/A |
Linear Layer 3 | 256 | 256 | LeakyReLU |
Batchnorm Layer 4 | 256 | 256 | N/A |
Linear Layer 4 | 256 | 256 | LeakyReLU |
Set Pooling Layer | N/A | N/A | N/A |
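The layer stack in the table can be read as a small encoder that embeds each 192-band spectrum and then pools a set of spectra into one 256-dimensional vector. The pure-Python sketch below follows that order (batch normalization, linear layer, LeakyReLU, repeated four times, then pooling). Several details are assumptions made for illustration and are not stated in the table: the first linear layer is taken to map 192 to 256 so that adjacent layers agree, batch normalization uses the statistics of the current set without learnable scale and shift, the LeakyReLU slope is 0.01, the weights are random, and set pooling is taken to be a mean. The names `LAYER_DIMS`, `init_params`, and `encode` are hypothetical.

```python
import math
import random

# Layer widths read from the table: 192 spectral bands in, 256 features out.
# Assumption: Linear Layer 1 maps 192 -> 256 so that adjacent layers agree.
LAYER_DIMS = [(192, 256), (256, 256), (256, 256), (256, 256)]


def batchnorm(rows, eps=1e-5):
    """Normalize each feature over the set (batch statistics, no affine terms)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n + eps)
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def linear(rows, weight, bias):
    """Apply y = W x + b to every row; weight has shape (out_dim, in_dim)."""
    return [[sum(w * v for w, v in zip(w_row, row)) + b
             for w_row, b in zip(weight, bias)] for row in rows]


def leaky_relu(rows, slope=0.01):
    """Element-wise LeakyReLU with an assumed negative slope of 0.01."""
    return [[v if v > 0 else slope * v for v in row] for row in rows]


def init_params(dims, seed=0):
    """Create random weights and zero biases for each (in_dim, out_dim) pair."""
    rng = random.Random(seed)
    params = []
    for in_dim, out_dim in dims:
        scale = 1.0 / math.sqrt(in_dim)
        weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
                  for _ in range(out_dim)]
        params.append((weight, [0.0] * out_dim))
    return params


def encode(spectra, params):
    """Embed a set of spectra, then pool them into a single vector."""
    h = spectra
    for weight, bias in params:
        h = leaky_relu(linear(batchnorm(h), weight, bias))
    n = len(h)
    return [sum(col) / n for col in zip(*h)]  # set pooling as a mean (assumed)


# Hypothetical input: a set of 5 random spectra with 192 bands each.
rng = random.Random(1)
spectra = [[rng.random() for _ in range(192)] for _ in range(5)]
embedding = encode(spectra, init_params(LAYER_DIMS))
print(len(embedding))
```

In practice such an encoder would be written with a deep learning framework and trained weights; the plain-list version here only makes the data flow and the dimensions explicit.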
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pang, K.; Liu, Y.; Zhou, S.; Liao, Y.; Yin, Z.; Zhao, L.; Chen, H. Proto-DS: A Self-Supervised Learning-Based Nondestructive Testing Approach for Food Adulteration with Imbalanced Hyperspectral Data. Foods 2024, 13, 3598. https://doi.org/10.3390/foods13223598