Anomaly Score-Based Risk Early Warning System for Rapidly Controlling Food Safety Risk
Abstract
1. Introduction
- 1. We propose an end-to-end unsupervised risk early warning model that greatly improves warning efficiency (running time) and is closer to realistic operating conditions. Following the principles of the hazard analysis critical control point (HACCP) system, our work integrates neural network modeling into food distribution: a comprehensive hazard analysis of each testing index identifies the key control points for risk warning, which in turn allows the risk to be controlled.
- 2. We introduce anomaly detection models for food safety risk early warning. This is the first time the food quality and safety warning problem has been approached as an anomaly detection problem; the approach handles unbalanced data samples quickly and efficiently and opens a new possibility for food risk analysis.
- 3. We verified the proposed early warning model on milk product safety detection data from a Chinese province, and extensive experiments confirm the validity of the method. Notably, we mainly consider the current Chinese standard GB 25190-2010 (National Food Safety Standard for Sterilized Milk).
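The anomaly-detection view in contribution 2 can be illustrated with a minimal sketch. It is not the model used in this work: it swaps in a linear auto-encoder whose optimal weights are obtained in closed form by SVD (equivalent to PCA) and scores each sample by its reconstruction error. The function name `reconstruction_scores`, the parameter `n_hidden`, and the synthetic six-indicator data are all assumptions made for illustration.

```python
import numpy as np

def reconstruction_scores(X_train, X_test, n_hidden=2):
    """Score samples by the reconstruction error of a linear auto-encoder.

    Swapped-in technique: the optimal linear encoder/decoder is computed in
    closed form with an SVD (equivalent to PCA), not trained by backpropagation.
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    Z_train = (X_train - mean) / std
    _, _, Vt = np.linalg.svd(Z_train, full_matrices=False)
    W = Vt[:n_hidden].T            # encoder weights, shape (n_features, n_hidden)
    Z_test = (X_test - mean) / std
    Z_hat = (Z_test @ W) @ W.T     # encode, then decode with tied weights
    return np.sum((Z_test - Z_hat) ** 2, axis=1)

# Hypothetical data: six indicators driven by two hidden factors plus noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
normal = factors @ mixing + 0.01 * rng.normal(size=(200, 6))

# Each value is individually plausible, but the combination breaks the
# correlation structure learned from the normal samples.
suspect = np.array([[2.0, -2.0, 0.0, 0.0, 0.0, 0.0]])

# Score five normal samples followed by the suspect sample (index 5).
scores = reconstruction_scores(normal, np.vstack([normal[:5], suspect]))
print(scores.round(4))
```

Because the model is fitted without labels, the scarcity of risky samples does not affect training; the suspect sample stands out only because it is reconstructed poorly.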
2. Related Work
2.1. Food Quality and Safety Risk Analysis Model Based on Machine Learning
2.2. Application of Anomaly Detection
3. Materials and Methods
3.1. Problem Statement
3.2. ASRWS: Anomaly Score-Based Risk Early Warning System
3.2.1. Data Preprocessing
3.2.2. Feature Extraction
Vanilla Auto-Encoder
Denoising Auto-Encoder
4. Experiments and Analysis of Results
4.1. Evaluation Index
4.2. Baseline Models
4.2.1. K-Nearest Neighbor (KNN)
4.2.2. Local Outlier Factor (LOF)
4.2.3. Connectivity-Based Outlier Factor (COF)
4.2.4. Isolation Forest (iForest)
4.2.5. Single-Objective Generative Adversarial Active Learning (SO-GAAL)
4.2.6. K-Means
4.3. Results Analysis
4.3.1. Main Results Analysis
- (1) The AUC and Acc values of all anomaly detection models were high (AUC ≥ 0.9879, Acc ≥ 0.9851), which shows that anomaly detection algorithms correctly predict the majority of samples. These results suggest that anomaly detection is well suited to food safety risk analysis.
- (2) AE achieved the best detection results on every metric except running time, where it was slower than the KNN model. In particular, its FDR of 0.9024 exceeds the best baseline value of 0.8048 (KNN) by 0.0976. The main reason is its ability to capture the hidden representation among the detection values of each sample, which allows it to screen out risky samples clustered within the safe samples.
- (3) Among the baseline models, the ensemble-based iForest performs worse than the distance-based KNN, LOF, and COF, probably because certain risky food samples are risk-free on most indicators, which makes them difficult to isolate in a high-dimensional space where they sit among clustered normal samples.
- (4) AE performed far better on the FAR metric than the other models: its FAR of 0.1889 is 0.189 lower than that of the next-best model, KNN (0.3779), which roughly halves the false alarm rate. This finding indicates that AE is effective in preventing risk-free samples from being incorrectly predicted as risky samples.
- (5) The GAN-based anomaly detection model SO-GAAL performs worst on every detection metric. One possible reason is that the dairy data are subject to standard constraints on each detection index, so the pseudo data produced by the generator are of poor quality. In terms of running time, the clustering-based K-means (0.62 s) is third fastest, behind only KNN (0.11 s) and AE (0.58 s).
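The comparisons in points (2), (4), and (5) can be checked directly from the reported metrics. The sketch below copies the FDR, FAR, and running-time values from the model comparison table and recomputes the margins. It assumes, as the text implies, that a higher FDR and a lower FAR are better; the names `RESULTS` and `compare_ae_to_baselines` are hypothetical.

```python
# FDR, FAR and running time (s) per model, copied from the results table.
RESULTS = {
    "KNN":     {"FDR": 0.8048, "FAR": 0.3779, "time_s": 0.11},
    "LOF":     {"FDR": 0.7073, "FAR": 0.5668, "time_s": 9.33},
    "COF":     {"FDR": 0.7317, "FAR": 0.5196, "time_s": 48.78},
    "iForest": {"FDR": 0.6829, "FAR": 0.6141, "time_s": 17.22},
    "SO-GAAL": {"FDR": 0.6097, "FAR": 0.7557, "time_s": 1.43},
    "K-means": {"FDR": 0.7073, "FAR": 0.4723, "time_s": 0.62},
    "AE":      {"FDR": 0.9024, "FAR": 0.1889, "time_s": 0.58},
}

def compare_ae_to_baselines(results):
    """Recompute AE's margins over the best baseline for FDR and FAR."""
    baselines = {name: m for name, m in results.items() if name != "AE"}
    ae = results["AE"]
    # Assumption: higher FDR is better, lower FAR is better.
    best_fdr = max(baselines, key=lambda n: baselines[n]["FDR"])
    best_far = min(baselines, key=lambda n: baselines[n]["FAR"])
    far_drop = baselines[best_far]["FAR"] - ae["FAR"]
    return {
        "best_fdr_baseline": best_fdr,
        "fdr_gain": round(ae["FDR"] - baselines[best_fdr]["FDR"], 4),
        "best_far_baseline": best_far,
        "far_drop": round(far_drop, 4),
        "far_relative_drop": round(far_drop / baselines[best_far]["FAR"], 4),
        "fastest_three": sorted(results, key=lambda n: results[n]["time_s"])[:3],
    }

summary = compare_ae_to_baselines(RESULTS)
print(summary)
```

Under these assumptions the FDR margin is 0.0976, and the FAR margin of 0.189 corresponds to a relative reduction of about 50% with respect to KNN.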
4.3.2. Experimental Comparison Analysis
4.3.3. Visualization
- 0: indicates safe, with no obvious food safety risk. The qualified product risk score is lower than the lowest score of the unqualified products.
- 1: indicates low risk; a food safety risk exists but is not apparent. The qualified product risk score is higher than the total number of products in the unqualified product sample but lower than the lowest score of the unqualified products.
- 2: indicates medium risk, with certain food safety risks. The qualified product risk score is higher than the total number of products in the unqualified product sample.
- 3: indicates high food safety risk. The unqualified product belongs to the set of all unqualified products.
4.4. Effectiveness Analysis
4.5. Response Measures
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Categories | Requirements | Inspection Standard
---|---|---
Protein (g/100 g) | ≥3.1 | GB 5009.5-2010
Fat (g/100 g) | ≥3.7 | GB 5413.3-2010
NMS (g/100 g) | ≥8.5 | GB 5413.39-2010
Lactose (g/100 g) | ≤2.0 | GB 5009.8-2016
AM1 (μg/kg) | ≤0.5 | GB 2761-2017
Acidity (°T) | 11–16 | GB 5413.34-2010

Sample ID | Date of Inspection | Lactose | Acidity | NMS | Fat | Protein | AM1
---|---|---|---|---|---|---|---
20210913-761 | 13 September 2021 | 1.74 | 12 | 8.79 | 4.16 | 3.42 | 0.2
20180528-1284 | 28 May 2018 | 1.79 | 12.01 | 8.96 | 4.17 | 3.36 | 0.5
20210812-719 | 12 April 2021 | 1.73 | 12.2 | 8.8 | 4.1 | 3.42 | 0.2
20200409-469 | 9 April 2020 | 1.73 | 12.13 | 8.61 | 4.37 | 3.34 | 0.5
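The requirements table and the sample inspection records above can be combined into a simple qualification check. The following is a minimal sketch, assuming that every bound is inclusive (including both ends of the acidity range); the names `LIMITS`, `SAMPLES`, and `violations` are hypothetical and not part of the proposed system.

```python
# (lower, upper) limits per inspection item, taken from the requirements table.
# None means that side is unbounded. Bounds are assumed to be inclusive.
LIMITS = {
    "Protein": (3.1, None),
    "Fat": (3.7, None),
    "NMS": (8.5, None),
    "Lactose": (None, 2.0),
    "AM1": (None, 0.5),
    "Acidity": (11.0, 16.0),
}

# Inspection values copied from the sample records table.
SAMPLES = {
    "20210913-761": {"Lactose": 1.74, "Acidity": 12, "NMS": 8.79,
                     "Fat": 4.16, "Protein": 3.42, "AM1": 0.2},
    "20180528-1284": {"Lactose": 1.79, "Acidity": 12.01, "NMS": 8.96,
                      "Fat": 4.17, "Protein": 3.36, "AM1": 0.5},
    "20210812-719": {"Lactose": 1.73, "Acidity": 12.2, "NMS": 8.8,
                     "Fat": 4.1, "Protein": 3.42, "AM1": 0.2},
    "20200409-469": {"Lactose": 1.73, "Acidity": 12.13, "NMS": 8.61,
                     "Fat": 4.37, "Protein": 3.34, "AM1": 0.5},
}

def violations(sample, limits=LIMITS):
    """Return the inspection items whose value falls outside its limits."""
    failed = []
    for item, (lower, upper) in limits.items():
        value = sample[item]
        below = lower is not None and value < lower
        above = upper is not None and value > upper
        if below or above:
            failed.append(item)
    return failed

for sample_id, sample in SAMPLES.items():
    print(sample_id, violations(sample) or "all items within limits")
```

All four listed samples satisfy every limit, which reflects the early warning setting: a sample can pass each individual check and still warrant a risk score.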

Models | FDR | FAR | AUC | Acc | Time (s)
---|---|---|---|---|---|
KNN | 0.8048 | 0.3779 | 0.9951 | 0.9925 | 0.11 |
LOF | 0.7073 | 0.5668 | 0.9959 | 0.9889 | 9.33 |
COF | 0.7317 | 0.5196 | 0.9956 | 0.9898 | 48.78 |
iForest | 0.6829 | 0.6141 | 0.9931 | 0.9879 | 17.22 |
SO-GAAL | 0.6097 | 0.7557 | 0.9879 | 0.9851 | 1.43 |
K-means | 0.7073 | 0.4723 | 0.9947 | 0.9887 | 0.62 |
AE | 0.9024 | 0.1889 | 0.9963 | 0.9954 | 0.58 |
T-Test Sets | {0,3} | {1,3} | {2,3} | {0,1} | {1,2} |
---|---|---|---|---|---|
p-value | 0.0381 | 0.0397 | 0.0401 | 1.3497 | 1.0639 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zuo, E.; Du, X.; Aysa, A.; Lv, X.; Muhammat, M.; Zhao, Y.; Ubul, K. Anomaly Score-Based Risk Early Warning System for Rapidly Controlling Food Safety Risk. Foods 2022, 11, 2076. https://doi.org/10.3390/foods11142076