Automated Fluid Intake Detection Using RGB Videos
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Collection
2.2. Data Processing
2.3. Neural Network Configurations
- Window size for 3D: We extracted fixed-length input windows from the video data, testing two window sizes: 3 and 10 s (see the windowing sketch after this list).
- Frame rate for 3D: Frame rates of 6 fps and 3 fps were chosen as a compromise between computational cost and accuracy.
- Imbalanced data: In all cases, the data were imbalanced, as non-drinking events outnumbered drinking events. We compared two common methods for handling the imbalance: (1) class weights and (2) random under-sampling. Class weighting assigns a larger weight to the minority (drinking) class so that misclassifying it is penalized more heavily during training. Under-sampling reduces the majority class to the size of the minority class by randomly selecting inputs and discarding the rest. Oversampling, which randomly duplicates samples from the minority class until the classes are balanced, was also attempted. A sketch of the first two strategies follows this list.
- Layers trained: Either all layers were trained, or only the top layer was trained while the pre-trained weights were kept frozen (known as feature extraction). After this, fine tuning was performed: the weights of all layers except the top two-thirds were frozen, and training continued to refine the predictions (see the two-stage Keras sketch after this list).
- Validation method: Both 10-fold cross-validation and leave-one-subject-out (LOSO) validation were tested and compared (a splitting sketch follows this list).
- Pre-trained models: For the 2D models, eight state-of-the-art pre-trained networks were tested on the image data to determine the best one: DenseNet169, DenseNet121, InceptionResNetV2, InceptionV3, Xception, MobileNetV2, NASNetLarge, and ResNet. Only one state-of-the-art model was considered for the video data: the Inflated 3D ConvNet (I3D) proposed by Carreira et al., shown in Figure 3 [18]. This model is based on Inception-V1, with its layers "inflated" to add a temporal dimension. It originally achieved 71.1% accuracy in classifying 400 human activities and outperformed other temporal classifiers on common benchmark datasets; it is a very deep spatiotemporal classifier that has also proven effective on other video data [19,20,21]. This paper used the RGB-I3D network, which is pre-trained on the ImageNet dataset (https://www.image-net.org/ (accessed on 30 August 2022)) and the Kinetics dataset (https://paperswithcode.com/dataset/kinetics (accessed on 30 August 2022)). We added a dropout layer and an early stopping mechanism to prevent overfitting; a loading sketch also appears after this list.
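As an illustration of the windowing and frame-rate choices above, the following minimal sketch (the function name, array layout, and 30 fps source rate are assumptions for illustration, not details from the paper) cuts a frame array into fixed-length windows and sub-samples each window to the target rate:

```python
import numpy as np

def extract_windows(frames, fps_in=30, window_s=3, fps_out=3):
    """Cut a video array (n_frames, H, W, 3) into non-overlapping windows
    of window_s seconds, sub-sampled from fps_in down to fps_out."""
    step = fps_in // fps_out               # keep every `step`-th frame
    win_len = window_s * fps_in            # window length in source frames
    windows = [frames[start:start + win_len:step]
               for start in range(0, len(frames) - win_len + 1, win_len)]
    # Each window contains window_s * fps_out frames.
    return np.stack(windows)
```

A 3 s window at 3 fps therefore yields 9 frames per input, while a 10 s window at the same rate yields 30.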
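The two imbalance-handling strategies can be sketched with scikit-learn and NumPy as follows; the toy label counts here are invented for illustration:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels: 0 = non-drinking (majority), 1 = drinking (minority).
y = np.array([0] * 900 + [1] * 100)

# (1) Class weights: the minority class receives a proportionally larger
#     weight, so its misclassifications cost more during training.
weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)
class_weight = dict(enumerate(weights))    # {0: ~0.56, 1: ~5.0}
# In Keras: model.fit(X, y, class_weight=class_weight, ...)

# (2) Random under-sampling: shrink the majority class to the minority size.
rng = np.random.default_rng(seed=0)
minority = np.flatnonzero(y == 1)
majority = rng.choice(np.flatnonzero(y == 0), size=minority.size, replace=False)
keep = rng.permutation(np.concatenate([minority, majority]))
y_balanced = y[keep]                       # now 50/50 drinking vs. non-drinking
```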
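The two validation schemes can be sketched with scikit-learn splitters; the array shapes and participant layout below are placeholders:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold

# Placeholder data: 10 participants, 20 windows each, imbalanced labels.
X = np.random.rand(200, 8)
y = np.tile([0] * 15 + [1] * 5, 10)
subjects = np.repeat(np.arange(10), 20)    # participant ID per window

# LOSO: each fold holds out every window from one participant, so the
# model is always evaluated on an unseen person.
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    pass  # fit on X[train_idx], evaluate on X[test_idx]

# 10-fold CV: windows are split regardless of participant, so data from
# the same person can appear in both the training and test folds.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    pass
```

This difference is why LOSO scores are typically lower: it measures generalization to new subjects rather than to new windows from already-seen subjects.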
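For the 2D pipeline, the two training regimes (feature extraction, then fine tuning with the bottom third of the backbone frozen), together with the dropout layer and early stopping mentioned above, can be sketched in Keras roughly as follows; the head architecture, dropout rate, optimizer, and learning rates are illustrative assumptions rather than the paper's exact settings:

```python
import tensorflow as tf

# Stage 1 -- feature extraction: freeze the pre-trained backbone and train
# only a new classification head (dropout guards against overfitting).
base = tf.keras.applications.DenseNet121(include_top=False,
                                         weights="imagenet", pooling="avg")
base.trainable = False
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.5),                    # assumed rate
    tf.keras.layers.Dense(1, activation="sigmoid"),  # drinking vs. non-drinking
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Stage 2 -- fine tuning: unfreeze the backbone, re-freeze its bottom third,
# and continue training the top two-thirds with early stopping.
base.trainable = True
for layer in base.layers[: len(base.layers) // 3]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)
# model.fit(train_ds, validation_data=val_ds, epochs=50,
#           callbacks=[early_stop], class_weight=class_weight)
```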
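The RGB-I3D backbone itself is publicly available; a minimal loading sketch, assuming the TensorFlow Hub release of the Kinetics-pretrained I3D (the clip shape, frame count, and [0, 1] scaling follow the TF Hub example, not necessarily the paper's exact pre-processing):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Kinetics-400 pre-trained RGB-I3D, as published on TensorFlow Hub.
i3d = hub.load("https://tfhub.dev/deepmind/i3d-kinetics-400/1").signatures["default"]

# A clip of shape (batch, frames, 224, 224, 3) with RGB values in [0, 1];
# the frame count here is arbitrary for illustration.
clip = tf.random.uniform((1, 16, 224, 224, 3), minval=0.0, maxval=1.0)
logits = i3d(clip)["default"]   # (1, 400) Kinetics class logits
```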
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Bennett, J.A. Dehydration: Hazards and Benefits. Geriatr. Nurs. 2000, 21, 84–88.
2. El-Sharkawy, A.M.; Sahota, O.; Maughan, R.J.; Lobo, D.N. The Pathophysiology of Fluid and Electrolyte Balance in the Older Adult Surgical Patient. Clin. Nutr. 2014, 33, 6–13.
3. Phillips, P.A.; Rolls, B.J.; Ledingham, J.G.; Forsling, M.L.; Morton, J.J.; Crowe, M.J.; Wollner, L. Reduced Thirst after Water Deprivation in Healthy Elderly Men. N. Engl. J. Med. 1984, 311, 753–759.
4. Birgersson, A.-M.B.; Hammar, V.; Widerfors, G.; Hallberg, I.R.; Athlin, E. Elderly Women’s Feelings about Being Urinary Incontinent, Using Napkins and Being Helped by Nurses to Change Napkins. J. Clin. Nurs. 1993, 2, 165–171.
5. Hooper, L.; Bunn, D.K.; Abdelhamid, A.; Gillings, R.; Jennings, A.; Maas, K.; Millar, S.; Twomlow, E.; Hunter, P.R.; Shepstone, L.; et al. Water-Loss (Intracellular) Dehydration Assessed Using Urinary Tests: How Well Do They Work? Diagnostic Accuracy in Older People. Am. J. Clin. Nutr. 2016, 104, 121–131.
6. Armstrong, L.E. Assessing Hydration Status: The Elusive Gold Standard. J. Am. Coll. Nutr. 2007, 26, 575S–584S.
7. Ferry, M. Strategies for Ensuring Good Hydration in the Elderly. Nutr. Rev. 2005, 63, S22–S29.
8. Cohen, R.; Fernie, G.; Roshan Fekr, A. Fluid Intake Monitoring Systems for the Elderly: A Review of the Literature. Nutrients 2021, 13, 2092.
9. Gemming, L.; Doherty, A.; Utter, J.; Shields, E.; Ni Mhurchu, C. The Use of a Wearable Camera to Capture and Categorise the Environmental and Social Context of Self-Identified Eating Episodes. Appetite 2015, 92, 118–125.
10. Davies, A.; Chan, V.; Bauman, A.; Signal, L.; Hosking, C.; Gemming, L.; Allman-Farinelli, M. Using Wearable Cameras to Monitor Eating and Drinking Behaviours during Transport Journeys. Eur. J. Nutr. 2020, 60, 1875–1885.
11. Doulah, A.B.M.S.U. A Wearable Sensor System for Automatic Food Intake Detection and Energy Intake Estimation in Humans. Ph.D. Thesis, University of Alabama, Tuscaloosa, AL, USA, 2018.
12. Raju, V.; Sazonov, E. Processing of Egocentric Camera Images from a Wearable Food Intake Sensor. In Proceedings of the 2019 SoutheastCon, Huntsville, AL, USA, 11–14 April 2019; pp. 1–6.
13. Rouast, P.V.; Adam, M.T.P. Learning Deep Representations for Video-Based Intake Gesture Detection. IEEE J. Biomed. Health Inform. 2020, 24, 1727–1737.
14. Heydarian, H.; Adam, M.T.P.; Burrows, T.; Rollo, M.E. Exploring Score-Level and Decision-Level Fusion of Inertial and Video Data for Intake Gesture Detection. IEEE Access 2021.
15. Iosifidis, A.; Marami, E.; Tefas, A.; Pitas, I. Eating and Drinking Activity Recognition Based on Discriminant Analysis of Fuzzy Distances and Activity Volumes. In Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 27 March 2012; pp. 2201–2204.
16. Bi, S.; Kotz, D. Eating Detection with a Head-Mounted Video Camera. Comput. Sci. Tech. Rep. 2021. Available online: https://digitalcommons.dartmouth.edu/cs_tr/384/ (accessed on 28 August 2022).
17. Chang, M.-J.; Hsieh, J.-T.; Fang, C.-Y.; Chen, S.-W. A Vision-Based Human Action Recognition System for Moving Cameras Through Deep Learning. In Proceedings of the 2019 2nd International Conference on Signal Processing and Machine Learning, New York, NY, USA, 27 November 2019; Association for Computing Machinery; pp. 85–91.
18. Carreira, J.; Zisserman, A. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4724–4733.
19. Kuehne, H.; Jhuang, H.; Garrote, E.; Poggio, T.; Serre, T. HMDB: A Large Video Database for Human Motion Recognition. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2556–2563.
20. Soomro, K.; Zamir, A.R.; Shah, M. UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild. arXiv 2012, arXiv:1212.0402.
21. Wu, K.; He, S.; Fernie, G.; Roshan Fekr, A. Deep Neural Network for Slip Detection on Ice Surface. Sensors 2020, 20, 6883.
22. Costa, L.; Trigueiros, P.; Cunha, A. Automatic Meal Intake Monitoring Using Hidden Markov Models. Procedia Comput. Sci. 2016, 100, 110–117.
23. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
Drinking Containers | ADL Activities
---|---
A teacup with hot liquid | Scratching their head and face
A coffee mug with hot liquid | Pointing the TV remote and watching TV
Two metal commercial water bottles | Doing their hair/touching their head
A plastic, disposable water bottle | Using a laptop
A can | Using a smartphone (calling and texting)
A glass tumbler with ice water | Pouring water from a kettle
A glass tumbler with colored liquid (pop or juice) | Stretching
A plastic, short-colored cup with water | Washing the counters
A wine glass with colored liquid (pop or juice) | Putting on and taking off a jacket
Two glass tumblers with a straw | Walking around the apartment
 | Talking to the researcher
 | Writing
 | Folding laundry
 | Eating with a fork, spoon, and hands (3× each)
Model | Validation | Model Description | Sampling | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%)
---|---|---|---|---|---|---|---
3D | 10-fold | Window size: 3 s; sampling rate: 3 fps; batch size: 32; with feature extraction | Class weights | 97.1 (0.6) | 98.3 (1.2) | 90.7 (2.3) | 94.3 (1.2)
 | | | Under-sampling | 94.6 (2.6) | 85.8 (8.0) | 94.8 (1.3) | 89.8 (4.3)
 | LOSO | Window size: 10 s; sampling rate: 3 fps; batch size: 16 | Class weights | 88.6 (9.4) | 88.6 (12.5) | 85.8 (20.7) | 84.2 (14.8)
 | | | Under-sampling | 86.3 (7.7) | 86.5 (12.9) | 80.3 (18.1) | 80.8 (11.1)
2D | 10-fold | DenseNet121; batch size: 32 | Class weights | 98.7 (0.64) | 91.4 (5.4) | 96.2 (3.5) | 93.7 (3.3)
 | | | Under-sampling | 93.2 (4.4) | 81.4 (16.2) | 74.0 (24.4) | 75.4 (20.6)
 | LOSO | Xception; batch size: 16; with feature extraction | Class weights | 95.0 (1.9) | 86.3 (14.8) | 60.7 (19.7) | 68.2 (17.0)
 | | | Under-sampling | 95.7 (1.03) | 83.2 (6.4) | 70.0 (20.7) | 73.6 (14.3)
Ref. | Videos (3D) | Accuracy (%) | F1 Score (%) | Cross-Validation | # Subjects | Camera Direction/Actions
---|---|---|---|---|---|---
Iosifidis [15] | Yes | 93 | - | LOSO | 4 | Frontal/mealtime only
Rouast [13] | Yes | - | 85.8 | Holdout | 102 | 360-degree camera/mealtime only
Proposed | Yes | 97.1 (10-fold); 88.6 (LOSO) | 93.7 (10-fold); 84.2 (LOSO) | 10-fold and LOSO | 9 | Multiple orientations in simulated home/eating, drinking, and ADLs