Prediction of Complex Odor from Pig Barn Using Machine Learning and Identifying the Influence of Variables Using Explainable Artificial Intelligence
Abstract
1. Introduction
2. Materials and Methods
2.1. Materials
2.1.1. Study Area
2.1.2. Data Sampling
2.2. Methods
2.2.1. Preprocessing
Multiple Imputation
Analysis of Variance and Correlation Analysis
2.2.2. Classification Model
K-Nearest Neighbor
Support Vector Machine
Random Forest
Extremely Randomized Trees
eXtreme Gradient Boosting
Light Gradient Boosting Machine
K-Fold Cross-Validation
2.2.3. Performance Evaluation Metrics
2.2.4. Explainable Artificial Intelligence
3. Results
3.1. Preprocessing
3.1.1. Missing Imputation
3.1.2. Correlation Analysis
3.1.3. Analysis of Variance
3.2. Predictive Classification Model
3.3. Explainable Artificial Intelligence
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References

Variable | Sampling Method | Analytical Instrument | Analytical Conditions |
---|---|---|---|
Response variable (complex odor) | Lung sampler and polyester aluminum bag (10 L) | Air dilution method, Korea | |
Explanatory variable (ammonia) | Solution absorption | UV/Vis (Shimadzu) | Wavelength: 640 nm |
Explanatory variable (four sulfur compounds) | Lung sampler and polyester aluminum bag (10 L) | GC/PFPD (456-GC, Scion Instruments) | Column: CP-Sil 5CB (60 m × 0.32 mm × 5 μm); oven condition: 60 °C (3 min) → (8 °C/min) → 160 °C (9 min) |
Explanatory variable (ten VOCs) | Tenax TA tube adsorption | GC/FID (CP-3800, Varian) | Column: DB-WAX (30 m × 0.25 mm × 0.25 μm); oven condition: 40 °C → (8 °C/min) → 150 °C → (10 °C/min) → 230 °C |
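
Class | Variable Name (Abbreviation) | Unit | MDL * | Type |
---|---|---|---|---|
Response variable | Complex odor | | | 0: discharge; 1: no discharge |
Explanatory variables | Ammonia (NH3) | ppm | 0.08 | Float64 |
Explanatory variables (sulfur compounds) | Hydrogen sulfide (H2S) | ppb | 0.06 | Float64 |
Explanatory variables (sulfur compounds) | Methyl mercaptan (MM) | ppb | 0.07 | Float64 |
Explanatory variables (sulfur compounds) | Dimethyl sulfide (DMS) | ppb | 0.08 | Float64 |
Explanatory variables (sulfur compounds) | Dimethyl disulfide (DMDS) | ppb | 0.05 | Float64 |
Explanatory variables (volatile organic compounds, VOCs) | Acetic acid (ACA) | ppb | 0.07 | Float64 |
Explanatory variables (volatile organic compounds, VOCs) | Propionic acid (PPA) | ppb | 0.34 | Float64 |
Explanatory variables (volatile organic compounds, VOCs) | Isobutyric acid (IBA) | ppb | 0.52 | Float64 |
Explanatory variables (volatile organic compounds, VOCs) | n-Butyric acid (BTA) | ppb | 0.96 | Float64 |
Explanatory variables (volatile organic compounds, VOCs) | Isovaleric acid (IVA) | ppb | 0.49 | Float64 |
Explanatory variables (volatile organic compounds, VOCs) | n-Valeric acid (VLA) | ppb | 0.53 | Float64 |
Explanatory variables (volatile organic compounds, VOCs) | Phenol (Ph) | ppb | 0.09 | Float64 |
Explanatory variables (volatile organic compounds, VOCs) | p-Cresol (p-C) | ppb | 0.06 | Float64 |
Explanatory variables (volatile organic compounds, VOCs) | Indole (ID) | ppb | 0.40 | Float64 |
Explanatory variables (volatile organic compounds, VOCs) | Skatole (SK) | ppb | 0.38 | Float64 |
Explanatory variables | Location | | | In: inside of the pig barn; Out: outside of the pig barn; Boundary: site boundaries |
Explanatory variables | Season | | | Spring: March–May; Summer: June–August; Fall: September–November; Winter: December–February |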

Confusion Matrix | True: Class 0 (Discharge) | True: Class 1 (No Discharge) |
---|---|---|
Predicted: Class 0 (discharge) | True Positive (TP) | False Positive (FP) |
Predicted: Class 1 (no discharge) | False Negative (FN) | True Negative (TN) |
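
The six metrics reported for each model can all be derived from the four cells of the confusion matrix above, with class 0 (discharge) treated as the positive class. The following is a minimal sketch using the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true class-0 samples predicted as class 0
    specificity = tn / (tn + fp)   # share of true class-1 samples predicted as class 1
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "f1": f1,
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
    }


# Hypothetical counts, for illustration only (not data from the study).
metrics = classification_metrics(tp=30, fp=5, fn=10, tn=55)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because sensitivity and PPV both depend on which class is called positive, swapping the class labels would exchange sensitivity with specificity and PPV with NPV.
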

Statistic | Complex Odor | Ammonia | Hydrogen Sulfide | Methyl Mercaptan | Dimethyl Sulfide | Dimethyl Disulfide | Acetic Acid | Propionic Acid |
---|---|---|---|---|---|---|---|---|
Mean | 800 | 2.72 | 208.51 | 6.04 | 5.65 | 0.12 | 240.52 | 184.06 |
STD | 1467 | 3.99 | 386.96 | 15.53 | 40.51 | 0.44 | 407.90 | 291.47 |
Min | 3 | 0.00 | 0.04 | 0.06 | 0.00 | 0.02 | 0.15 | 0.13 |
Median | 300 | 1.06 | 56.25 | 0.07 | 0.08 | 0.05 | 27.81 | 17.70 |
Max | 10000 | 22.24 | 2484.00 | 120.00 | 462.00 | 4.28 | 2446.00 | 2109.69 |

Statistic | Iso-Butyric Acid | Butyric Acid | Iso-Valeric Acid | Valeric Acid | Phenol | p-Cresol | Indole | Skatole |
---|---|---|---|---|---|---|---|---|
Mean | 19.28 | 159.94 | 46.071 | 85.78 | 6.78 | 34.27 | 2.18 | 3.15 |
STD | 37.08 | 262.03 | 83.60 | 192.73 | 12.70 | 60.15 | 7.08 | 9.15 |
Min | 0.04 | 0.52 | 0.08 | 0.28 | 0.06 | 0.00 | 0.02 | 0.04 |
Median | 2.01 | 7.93 | 4.42 | 5.52 | 1.85 | 2.89 | 0.86 | 1.52 |
Max | 380.00 | 1455.52 | 743.69 | 1869.40 | 125.72 | 481.20 | 95.28 | 127.13 |

Variables | Correlation | VIF | Variables | Correlation | VIF |
---|---|---|---|---|---|
Ammonia | 0.50 | 2.47 | Butyric acid | 0.35 | 20.40 |
Hydrogen sulfide | 0.27 | 1.42 | Iso-valeric acid | 0.28 | 34.93 |
Methyl mercaptan | 0.39 | 1.53 | Valeric acid | 0.24 | 26.34 |
Dimethyl sulfide | 0.02 | 1.22 | Phenol | 0.29 | 6.52 |
Dimethyl disulfide | 0.11 | 1.48 | p-Cresol | 0.42 | 4.86 |
Acetic acid | 0.50 | 8.82 | Indole | 0.09 | 1.07 |
Propionic acid | 0.46 | 58.70 | Skatole | 0.07 | 1.10 |
Iso-butyric acid | 0.58 | 6.35 | | | |
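
To make the two columns above easier to interpret, the sketch below computes a Pearson correlation coefficient and converts between the variance inflation factor and the R² it implies, using the standard definition VIF = 1 / (1 − R²), where R² comes from regressing one explanatory variable on the others. The helper names and the toy data are hypothetical; only the VIF of 58.70 for propionic acid is taken from the table.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_from_r2(r2):
    """Variance inflation factor implied by an R^2 value."""
    return 1.0 / (1.0 - r2)


def r2_from_vif(vif):
    """R^2 of a predictor regressed on the other predictors, implied by its VIF."""
    return 1.0 - 1.0 / vif


# Toy data (hypothetical): with a single other predictor, R^2 equals r squared.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)
toy_vif = vif_from_r2(r ** 2)

# Propionic acid has VIF 58.70 in the table; the R^2 this implies:
propionic_r2 = r2_from_vif(58.70)

print(round(r, 4), round(toy_vif, 2), round(propionic_r2, 4))
```

Read this way, a VIF near 1 (indole, skatole) means the variable is almost unrelated to the other predictors, while values in the tens (the volatile fatty acids) mean it is nearly a linear combination of them.
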

Variables | In | Out | Boundary | F-Value | p-Value | Variables | In | Out | Boundary | F-Value | p-Value |
---|---|---|---|---|---|---|---|---|---|---|---|
Complex odor | | | | 33.84 | <0.001 | Iso-butyric acid | | | | 33.16 | <0.001 |
Ammonia | | | | 38.52 | <0.001 | Butyric acid | | | | 30.48 | <0.001 |
Hydrogen sulfide | | | | 16.95 | <0.001 | Iso-valeric acid | 95.74 A | | | 23.63 | <0.001 |
Methyl mercaptan | | | | 13.90 | <0.001 | Valeric acid | | | | 16.31 | <0.001 |
Dimethyl sulfide | 11.87 | 6.08 | 0.11 | 1.40 | 0.2501 | Phenol | | | | 11.40 | <0.001 |
Dimethyl disulfide | 0.20 | 0.13 | 0.06 | 1.82 | 0.1639 | p-Cresol | | | | 24.49 | <0.001 |
Acetic acid | 218.76 A | | | 25.48 | <0.001 | Indole | 2.45 | 1.82 | 2.36 | 0.16 | 0.8489 |
Propionic acid | | | | 33.91 | <0.001 | Skatole | 4.35 | 2.28 | 3.13 | 0.85 | 0.4309 |

Variables | Spring | Summer | Fall | F-Value | p-Value | Variables | Spring | Summer | Fall | F-Value | p-Value |
---|---|---|---|---|---|---|---|---|---|---|---|
Complex odor | 641 | 681 | 768 | 0.19 | 0.8291 | Iso-butyric acid | | | | 7.43 | 0.0017 |
Ammonia | 1.44 | 2.60 | 3.20 | 2.15 | 0.1194 | Butyric acid | | | | 20.92 | <0.001 |
Hydrogen sulfide | 213.68 | 240.98 | 169.28 | 0.79 | 0.4552 | Iso-valeric acid | | | | 15.61 | <0.001 |
Methyl mercaptan | | | | 4.21 | 0.0162 | Valeric acid | | | | 8.52 | 0.0012 |
Dimethyl sulfide | | | | 10.67 | <0.001 | Phenol | | | | 4.63 | 0.0108 |
Dimethyl disulfide | 0.08 | 0.17 | 0.08 | 1.06 | 0.3485 | p-Cresol | 51.85 | 28.25 | 34.15 | 1.68 | 0.1882 |
Acetic acid | | | | 3.88 | 0.0221 | Indole | 1.28 | 1.25 | 3.42 | 2.43 | 0.0904 |
Propionic acid | | | | 7.62 | 0.0016 | Skatole | 5.01 | 2.69 | 3.07 | 0.69 | 0.5039 |
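
The F-values in the location and season tables above come from one-way analysis of variance, which compares the variability between group means with the variability within groups. As a minimal sketch of that calculation (the function name and the toy measurements are hypothetical, and the study's own software is not stated here), the statistic can be computed as follows:

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of samples (one list per group)."""
    k = len(groups)                              # number of groups, e.g. In / Out / Boundary
    n_total = sum(len(g) for g in groups)        # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Toy concentrations for three hypothetical groups (not data from the study).
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value = one_way_anova_f(toy_groups)
print(round(f_value, 2))
```

The p-value then follows from the F distribution with (k − 1, N − k) degrees of freedom; a library routine such as `scipy.stats.f_oneway` returns both the statistic and the p-value in one call.
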

Model | Parameter | Range | Optimal Parameter |
---|---|---|---|
KNN | n_neighbors | *Range (1, 15, 1) | 4 |
KNN | weights | ‘uniform’, ‘distance’ | ‘distance’ |
KNN | algorithm | ‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’ | ‘auto’ |
SVM | C | [0.05, 0.5, 1.0] | 0.5 |
SVM | kernel | ‘linear’, ‘rbf’ | ‘linear’ |
RF | n_estimators | Range (10, 100, 5) | 10 |
RF | max_depth | Range (1, 15, 1) | 10 |
RF | min_samples_leaf | Range (1, 10, 1) | 2 |
RF | min_samples_split | [2, 5, 8, 10, 12] | 2 |
RF | criterion | ‘gini’, ‘entropy’ | ‘entropy’ |
Extra-Trees | n_estimators | Range (1, 3, 1) | 3 |
Extra-Trees | max_depth | Range (1, 7, 1) | 5 |
LightGBM | num_leaves | [31, 127] | 31 |
LightGBM | max_depth | Range (3, 5, 1) | 3 |
LightGBM | min_child_weight | [, , , , , ] | |
LightGBM | min_data_in_leaf | [30, 50, 100] | 30 |
XGBoost | max_depth | Range (3, 10, 1) | 3 |
XGBoost | min_child_weight | Range (3, 5, 1) | 3 |
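
As an illustration of how a search space like the one tabulated above is enumerated, the sketch below expands the KNN rows into individual candidate settings. It assumes that "Range (1, 15, 1)" follows Python `range` semantics (start, stop excluded, step), which the table's footnote may define differently; the helper `expand_grid` is hypothetical and is not the authors' code.

```python
from itertools import product

# Search space copied from the KNN rows of the hyperparameter table.
# Assumption: "Range (1, 15, 1)" means Python's range(1, 15, 1), i.e. 1..14.
knn_space = {
    "n_neighbors": list(range(1, 15, 1)),
    "weights": ["uniform", "distance"],
    "algorithm": ["auto", "ball_tree", "kd_tree", "brute"],
}


def expand_grid(space):
    """Return every parameter combination in the search space as a dict."""
    names = list(space)
    value_lists = [space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


knn_grid = expand_grid(knn_space)

# Optimal KNN setting as reported in the table.
reported_optimum = {"n_neighbors": 4, "weights": "distance", "algorithm": "auto"}

print(len(knn_grid))                 # number of candidate settings to evaluate
print(reported_optimum in knn_grid)  # whether the reported optimum lies in this grid
```

In a typical workflow each candidate is scored with K-fold cross-validation and the best-scoring setting is kept; scikit-learn's `GridSearchCV` is one tool that automates this loop.
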

Model | F1-Score | Accuracy | Sensitivity | Specificity | PPV | NPV |
---|---|---|---|---|---|---|
KNN | 0.74 (0.05) | 0.74 (0.03) | 0.77 (0.09) | 0.72 (0.08) | 0.72 (0.08) | 0.77 (0.09) |
SVM | 0.77 (0.05) | 0.81 (0.02) | 0.74 (0.08) | 0.87 (0.04) | 0.82 (0.06) | 0.80 (0.04) |
RF | 0.76 (0.05) | 0.78 (0.02) | 0.74 (0.08) | 0.82 (0.06) | 0.79 (0.05) | 0.78 (0.04) |
Extra-Trees | 0.72 (0.05) | 0.76 (0.04) | 0.70 (0.11) | 0.80 (0.11) | 0.77 (0.10) | 0.76 (0.08) |
LightGBM | 0.76 (0.05) | 0.78 (0.03) | 0.75 (0.09) | 0.81 (0.07) | 0.77 (0.07) | 0.79 (0.07) |
XGBoost | 0.74 (0.05) | 0.77 (0.03) | 0.72 (0.10) | 0.81 (0.07) | 0.78 (0.06) | 0.76 (0.08) |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lee, D.-H.; Lee, S.-H.; Woo, S.-E.; Jung, M.-W.; Kim, D.-y.; Heo, T.-Y. Prediction of Complex Odor from Pig Barn Using Machine Learning and Identifying the Influence of Variables Using Explainable Artificial Intelligence. Appl. Sci. 2022, 12, 12943. https://doi.org/10.3390/app122412943