Machine Learning Data Imputation and Prediction of Foraging Group Size in a Kleptoparasitic Spider
Abstract
1. Introduction
2. Materials and Methods
2.1. Field Site Selection and Surveys
2.2. Kleptoparasite Foraging Group Size and Occurrence
2.3. Data Imputation
2.4. Statistical Significance of Features Related to Group Size
2.4.1. Partial Least-Squares Path Modelling (PLS-PM) of Population Size Features
2.4.2. IFD Test: Group Size and Web Area
3. Results
3.1. Comparison of Data Imputation Strategies
3.2. Biological Significance of Features Related to Group Size
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References

Algorithms | Parameters | Values |
---|---|---|
MEAN | No parameter | - |
ZERO | No parameter | - |
KNN | k | 5 |
CART | max_depth | None |
CART | min_samples_split | 2 |
CART | min_samples_leaf | 1 |
RF | max_depth | None |
RF | min_samples_split | 2 |
RF | min_samples_leaf | 1 |
RF | Number of trees | 100 |
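
The three simplest strategies in the table above (ZERO, MEAN, and KNN with k = 5) can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names, the `MISSING` marker, the neighbour-averaging rule, and the rescaled Euclidean distance over jointly observed features are all assumptions made for illustration. CART and RF are omitted because they require fitted tree models; the parameter values listed for them coincide with the defaults of scikit-learn's tree and forest estimators.

```python
import math

MISSING = None  # hypothetical marker for a missing entry


def impute_zero(rows):
    """ZERO: replace every missing entry with 0."""
    return [[0.0 if x is MISSING else x for x in row] for row in rows]


def impute_mean(rows):
    """MEAN: replace a missing entry with the mean of its column's observed values.

    Assumes every column has at least one observed value.
    """
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not MISSING]
        means.append(sum(observed) / len(observed))
    return [[means[j] if x is MISSING else x for j, x in enumerate(row)]
            for row in rows]


def _distance(a, b):
    """Euclidean distance over features observed in both rows,
    rescaled by the share of usable features (an assumed convention)."""
    shared = [(x, y) for x, y in zip(a, b)
              if x is not MISSING and y is not MISSING]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def impute_knn(rows, k=5):
    """KNN: replace a missing entry with the mean of that feature over the
    k nearest rows in which the feature is observed."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, x in enumerate(row):
            if x is not MISSING:
                continue
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not MISSING
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:  # otherwise the entry stays missing
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Toy data (invented for illustration, not from the study).
toy = [
    [1.0, 2.0, MISSING],
    [2.0, MISSING, 6.0],
    [3.0, 6.0, 9.0],
    [4.0, 8.0, 12.0],
]
zero_filled = impute_zero(toy)
mean_filled = impute_mean(toy)
knn_filled = impute_knn(toy, k=2)
```

With so few rows, the toy call uses k = 2 rather than the tabulated k = 5; on a realistically sized data set the default `k=5` matches the table.
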

Method | Rank | Number | Mean Rank | Sum Rank | p-Value |
---|---|---|---|---|---|
RAW-ZERO | R+ | 51 | 34.25 | 1746.5 | <0.001 |
RAW-ZERO | R− | 9 | 9.28 | 83.5 | |
RAW-ZERO | R= | 6 | | | |
RAW-MEAN | R+ | 34 | 29.19 | 992.5 | 0.030 |
RAW-MEAN | R− | 20 | 24.63 | 492.5 | |
RAW-MEAN | R= | 12 | | | |
RAW-KNN | R+ | 32 | 27.8 | 889.5 | 0.014 |
RAW-KNN | R− | 18 | 21.42 | 385.5 | |
RAW-KNN | R= | 16 | | | |
RAW-CART | R+ | 33 | 25.18 | 831 | 0.012 |
RAW-CART | R− | 15 | 23 | 345 | |
RAW-CART | R= | 18 | | | |
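
The table above has the layout of a Wilcoxon signed-rank summary: positive ranks (R+), negative ranks (R−), ties (R=), and their mean and summed ranks. In every block the two rank sums add up to n(n+1)/2 for the non-tied pairs (for RAW-ZERO, 1746.5 + 83.5 = 1830 = 60·61/2), which is consistent with tied pairs being dropped before ranking. The sketch below shows how such a summary can be computed under that convention. The function names, the sign convention (first sample minus second), and the plain normal approximation without tie or continuity correction are assumptions, so its p-values need not match the reported ones exactly.

```python
import math


def signed_rank_summary(x, y):
    """Summarise paired differences x - y as R+, R-, and R= counts and ranks."""
    diffs = [a - b for a, b in zip(x, y)]
    n_tie = sum(1 for d in diffs if d == 0)
    nonzero = [d for d in diffs if d != 0]  # zero differences are not ranked

    # Rank |d| in ascending order; equal magnitudes share their average rank.
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]])):
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1

    pos_ranks = [r for r, d in zip(ranks, nonzero) if d > 0]
    neg_ranks = [r for r, d in zip(ranks, nonzero) if d < 0]
    return {
        "n_pos": len(pos_ranks), "sum_pos": sum(pos_ranks),
        "n_neg": len(neg_ranks), "sum_neg": sum(neg_ranks),
        "n_tie": n_tie,
        "mean_pos": sum(pos_ranks) / len(pos_ranks) if pos_ranks else None,
        "mean_neg": sum(neg_ranks) / len(neg_ranks) if neg_ranks else None,
    }


def normal_approx_p(sum_pos, sum_neg, n):
    """Two-sided p-value from the rank sums of n non-tied pairs
    (normal approximation, no tie or continuity correction)."""
    w = min(sum_pos, sum_neg)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))


# Toy paired samples (invented for illustration).
summary = signed_rank_summary([5, 3, 8, 6, 7], [3, 3, 9, 2, 4])
tied = signed_rank_summary([1, 2, 3], [0, 3, 2])

# Rank sums taken from the RAW-CART and RAW-ZERO rows of the table.
p_cart = normal_approx_p(831, 345, 48)
p_zero = normal_approx_p(1746.5, 83.5, 60)
```

Applied to the tabulated rank sums, the approximation gives roughly 0.013 for RAW-CART and a value far below 0.001 for RAW-ZERO, in line with the reported 0.012 and <0.001.
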

Klepto NUM | | | | | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Inner Model | RAW | ZERO | MEAN | KNN | CART | RF | Outer Model | RAW | ZERO | MEAN | KNN | CART | RF |
Resource size | | | | | | | Formative sup-features (weights) | | | | | | |
Path coefficients | 0.494 | 0.485 | 0.499 | 0.483 | 0.479 | 0.500 | BLH | 0.562 | 0.195 | 0.304 | 0.338 | 0.293 | 0.319 |
Pr(>\|t\|) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | WA | 0.527 | 0.886 | 0.788 | 0.775 | 0.804 | 0.775 |
Resource density | | | | | | | Reflective sup-features (loadings) | | | | | | |
Path coefficients | −0.104 | −0.049 | −0.076 | −0.085 | −0.084 | −0.093 | Min D Klepto | 0.997 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Pr(>\|t\|) | 0.176 | 0.465 | 0.149 | 0.216 | 0.216 | 0.171 | Min D Host | 0.870 | 0.907 | 0.907 | 0.907 | 0.907 | 0.907 |
Microclimate | | | | | | | Formative sup-features (weights) | | | | | | |
Path coefficients | 0.044 | 0.134 | 0.106 | 0.093 | 0.128 | 0.096 | Lux | 0.742 | 0.858 | 0.875 | 0.885 | 0.919 | 0.901 |
Pr(>\|t\|) | 0.642 | 0.062 | 0.257 | 0.221 | 0.080 | 0.196 | H% | 0.407 | 0.108 | 0.189 | 0.141 | 0.137 | 0.143 |
| | | | | | | | Temp | 0.162 | −0.060 | 0.094 | 0.083 | 0.070 | 0.069 |
| | | | | | | | IWS | 0.016 | −0.016 | −0.004 | 0.011 | 0.003 | 0.003 |
| | | | | | | | HWG | 0.529 | 0.361 | 0.322 | 0.305 | 0.254 | 0.278 |
Klepto Y/N | | | | | | | | | | | | | |
Inner model | RAW | ZERO | MEAN | KNN | CART | RF | Outer model | RAW | ZERO | MEAN | KNN | CART | RF |
Resource size | | | | | | | Formative sup-features (weights) | | | | | | |
Path coefficients | 0.209 | 0.176 | 0.170 | 0.202 | 0.192 | 0.192 | BLH | 0.169 | −0.361 | 0.293 | 0.514 | 0.431 | 0.393 |
Pr(>\|t\|) | 0.013 | 0.019 | 0.024 | 0.007 | 0.011 | 0.010 | WA | 0.876 | 1.137 | 0.797 | 0.621 | 0.690 | 0.713 |
Resource density | | | | | | | Reflective sup-features (loadings) | | | | | | |
Path coefficients | 0.069 | 0.132 | 0.128 | 0.135 | 0.137 | 0.139 | Min D Klepto | 0.842 | 0.119 | 0.119 | 0.119 | 0.119 | 0.119 |
Pr(>\|t\|) | 0.414 | 0.083 | 0.092 | 0.076 | 0.072 | 0.066 | Min D Host | 0.991 | 0.518 | 0.518 | 0.518 | 0.518 | 0.518 |
Microclimate | | | | | | | Formative sup-features (weights) | | | | | | |
Path coefficients | 0.276 | 0.255 | 0.257 | 0.234 | 0.237 | 0.243 | Lux | 0.250 | 0.309 | 0.304 | 0.275 | 0.371 | 0.313 |
Pr(>\|t\|) | 0.002 | 0.001 | 0.001 | 0.002 | 0.002 | 0.001 | H% | 0.596 | 0.805 | 0.533 | 0.481 | 0.414 | 0.448 |
| | | | | | | | Temp | −0.322 | −0.671 | −0.416 | −0.454 | −0.509 | −0.512 |
| | | | | | | | IWS | 0.372 | 0.302 | 0.312 | 0.361 | 0.354 | 0.341 |
| | | | | | | | HWG | 0.741 | 0.684 | 0.738 | 0.759 | 0.718 | 0.720 |

Algorithms | Time Complexity | Space Complexity |
---|---|---|
MEAN | O(n) | O(n) |
ZERO | O(1) | O(1) |
KNN | O(nm) | O(nm) |
CART | O(m·nlogn) | O(p) |
RF | O(nlogn·mt) | O(pt) |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Su, Y.-C.; Wu, C.-Y.; Yang, C.-H.; Li, B.-S.; Moi, S.-H.; Lin, Y.-D. Machine Learning Data Imputation and Prediction of Foraging Group Size in a Kleptoparasitic Spider. Mathematics 2021, 9, 415. https://doi.org/10.3390/math9040415