OpenEDS2020 Challenge on Gaze Tracking for VR: Dataset and Results
Abstract
1. Introduction
2. OpenEDS 2020 Challenge: Description
2.1. Gaze Prediction Challenge
2.2. Sparse Temporal Semantic Segmentation Challenge
3. OpenEDS2020 Dataset
3.1. Data Collection
3.2. Gaze Prediction
3.2.1. Dataset Curation
3.2.2. Annotations
3.3. Semantic Eye Segmentation
3.3.1. Dataset Curation
3.3.2. Annotations
4. Baseline Methodologies
4.1. Gaze Prediction
4.2. Semantic Eye Segmentation
5. Challenge Participation and Winning Solutions
5.1. Gaze Prediction Challenge Winners
5.2. Sparse Semantic Segmentation Challenge Winners
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Chita-Tegmark, M. Social attention in ASD: A review and meta-analysis of eye-tracking studies. Res. Dev. Disabil. 2016, 48, 79–93.
2. O’Driscoll, G.A.; Callahan, B.L. Smooth pursuit in schizophrenia: A meta-analytic review of research since 1993. Brain Cognit. 2008, 68, 359–370.
3. Pan, B.; Hembrooke, H.A.; Gay, G.K.; Granka, L.A.; Feusner, M.K.; Newman, J.K. The determinants of web page viewing behavior: An eye-tracking study. In Proceedings of the 2004 Symposium on Eye Tracking Research & Applications, San Antonio, TX, USA, 22–24 March 2004; pp. 147–154.
4. Fan, L.; Wang, W.; Huang, S.; Tang, X.; Zhu, S.C. Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 5724–5733.
5. Fernandez, M. Augmented virtual reality: How to improve education systems. High. Learn. Res. Commun. 2017, 7, 1–15.
6. Izard, S.G.; Juanes, J.A.; Peñalvo, F.J.G.; Estella, J.M.G.; Ledesma, M.J.S.; Ruisoto, P. Virtual reality as an educational and training tool for medicine. J. Med. Syst. 2018, 42, 50.
7. Li, L.; Yu, F.; Shi, D.; Shi, J.; Tian, Z.; Yang, J.; Wang, X.; Jiang, Q. Application of virtual reality technology in clinical medicine. Am. J. Transl. Res. 2017, 9, 3867.
8. Hartmann, T.; Fox, J. Entertainment in Virtual Reality and Beyond: The Influence of Embodiment, Co-Location, and Cognitive Distancing on Users’ Entertainment Experience. In The Oxford Handbook of Entertainment Theory; Oxford University Press: Oxford, UK, 2020.
9. Pucihar, K.Č.; Coulton, P. Exploring the evolution of mobile augmented reality for future entertainment systems. Comput. Entertain. (CIE) 2015, 11, 1–16.
10. Smith, H.J.; Neff, M. Communication behavior in embodied virtual reality. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, New York, NY, USA, 21–26 April 2018; pp. 1–12.
11. Kim, S.; Lee, G.; Sakata, N.; Billinghurst, M. Improving co-presence with augmented visual communication cues for sharing experience through video conference. In Proceedings of the 2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Munich, Germany, 10–12 September 2014; pp. 83–92.
12. Thomas, B.H. A survey of visual, mixed, and augmented reality gaming. Comput. Entertain. (CIE) 2012, 10, 1–33.
13. Miller, K.J.; Adair, B.S.; Pearce, A.J.; Said, C.M.; Ozanne, E.; Morris, M.M. Effectiveness and feasibility of virtual reality and gaming system use at home by older adults for enabling physical activity to improve health-related domains: A systematic review. Age Ageing 2014, 43, 188–195.
14. Patney, A.; Salvi, M.; Kim, J.; Kaplanyan, A.; Wyman, C.; Benty, N.; Luebke, D.; Lefohn, A. Towards foveated rendering for gaze-tracked virtual reality. ACM Trans. Graph. (TOG) 2016, 35, 179.
15. Hansen, D.W.; Ji, Q. In the eye of the beholder: A survey of models for eyes and gaze. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 478–500.
16. Guestrin, E.D.; Eizenman, M. General theory of remote gaze estimation using the pupil center and corneal reflections. IEEE Trans. Biomed. Eng. 2006, 53, 1124–1133.
17. Abdulin, E.; Friedman, L.; Komogortsev, O. Custom Video-Oculography Device and Its Application to Fourth Purkinje Image Detection during Saccades. arXiv 2019, arXiv:1904.07361.
18. Wang, K.; Ji, Q. Real time eye gaze tracking with 3d deformable eye-face model. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1003–1011.
19. Wood, E.; Baltrušaitis, T.; Morency, L.P.; Robinson, P.; Bulling, A. A 3d morphable eye region model for gaze estimation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Berlin/Heidelberg, Germany; pp. 297–313.
20. Zhang, X.; Sugano, Y.; Fritz, M.; Bulling, A. MPIIGaze: Real-world dataset and deep appearance-based gaze estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 41, 162–175.
21. Park, S.; Spurr, A.; Hilliges, O. Deep pictorial gaze estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 721–738.
22. Yiu, Y.H.; Aboulatta, M.; Raiser, T.; Ophey, L.; Flanagin, V.L.; zu Eulenburg, P.; Ahmadi, S.A. DeepVOG: Open-source pupil segmentation and gaze estimation in neuroscience using deep learning. J. Neurosci. Methods 2019, 324, 108307.
23. Palmero Cantarino, C.; Komogortsev, O.V.; Talathi, S.S. Benefits of temporal information for appearance-based gaze estimation. In Proceedings of the ACM Symposium on Eye Tracking Research and Applications, Stuttgart, Germany, 2–5 June 2020; pp. 1–5.
24. Palmero, C.; Selva, J.; Bagheri, M.A.; Escalera, S. Recurrent CNN for 3D Gaze Estimation using Appearance and Shape Cues. In Proceedings of the British Machine Vision Conference (BMVC), Newcastle, UK, 3–6 September 2018.
25. Wang, K.; Su, H.; Ji, Q. Neuro-inspired eye tracking with eye movement dynamics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9831–9840.
26. Funes Mora, K.A.; Monay, F.; Odobez, J.M. Eyediap: A database for the development and evaluation of gaze estimation algorithms from rgb and rgb-d cameras. In Proceedings of the Symposium on Eye Tracking Research and Applications, Safety Harbor, FL, USA, 26–28 March 2014; pp. 255–258.
27. Park, S.; Aksan, E.; Zhang, X.; Hilliges, O. Towards End-to-end Video-based Eye-Tracking. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany; pp. 747–763.
28. Leigh, R.J.; Zee, D.S. The Neurology of Eye Movements; Oxford University Press: Oxford, UK, 2015.
29. McMurrough, C.D.; Metsis, V.; Rich, J.; Makedon, F. An eye tracking dataset for point of gaze detection. In Proceedings of the Symposium on Eye Tracking Research and Applications, Santa Barbara, CA, USA, 28–30 March 2012; pp. 305–308.
30. Tonsen, M.; Zhang, X.; Sugano, Y.; Bulling, A. Labelled pupils in the wild: A dataset for studying pupil detection in unconstrained environments. In Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications, Charleston, SC, USA, 14–17 March 2016; pp. 139–142.
31. Fuhl, W.; Geisler, D.; Rosenstiel, W.; Kasneci, E. The applicability of Cycle GANs for pupil and eyelid segmentation, data generation and image refinement. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Korea, 27–28 October 2019.
32. Kim, J.; Stengel, M.; Majercik, A.; De Mello, S.; Dunn, D.; Laine, S.; McGuire, M.; Luebke, D. NVGaze: An anatomically-informed dataset for low-latency, near-eye gaze estimation. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, UK, 4–9 May 2019; pp. 1–12.
33. Kothari, R.; Yang, Z.; Kanan, C.; Bailey, R.; Pelz, J.B.; Diaz, G.J. Gaze-in-wild: A dataset for studying eye and head coordination in everyday activities. Sci. Rep. 2020, 10, 1–18.
34. Fuhl, W.; Kasneci, E. A Multimodal Eye Movement Dataset and a Multimodal Eye Movement Segmentation Analysis. arXiv 2021, arXiv:2101.04318.
35. Fuhl, W.; Rosenstiel, W.; Kasneci, E. 500,000 images closer to eyelid and pupil segmentation. In Proceedings of the International Conference on Computer Analysis of Images and Patterns, Salerno, Italy, 3–5 September 2019; Springer: Berlin/Heidelberg, Germany; pp. 336–347.
36. Fuhl, W.; Gao, H.; Kasneci, E. Neural networks for optical vector and eye ball parameter estimation. In Proceedings of the ACM Symposium on Eye Tracking Research and Applications, Stuttgart, Germany, 2–5 June 2020; pp. 1–5.
37. Tullis, T.; Albert, B. (Eds.) Chapter 7—Behavioral and Physiological Metrics. In Measuring the User Experience, 2nd ed.; Interactive Technologies, Morgan Kaufmann: Boston, MA, USA, 2013; pp. 163–186.
38. Fischer, B.; Ramsperger, E. Human express saccades: Extremely short reaction times of goal directed eye movements. Exp. Brain Res. 1984, 57, 191–195.
39. Purves, D.; Augustine, G.J.; Fitzpatrick, D.; Katz, L.C.; LaMantia, A.S.; McNamara, J.O.; Williams, S.M. Types of eye movements and their functions. Neuroscience 2001, 20, 361–390.
40. Albert, R.; Patney, A.; Luebke, D.; Kim, J. Latency requirements for foveated rendering in virtual reality. ACM Trans. Appl. Percept. (TAP) 2017, 14, 1–13.
41. Van der Stigchel, S.; Meeter, M.; Theeuwes, J. Eye movement trajectories and what they tell us. Neurosci. Biobehav. Rev. 2006, 30, 666–679.
42. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
43. Guo, M.; Du, Y. Classification of Thyroid Ultrasound Standard Plane Images using ResNet-18 Networks. In Proceedings of the IEEE 13th International Conference on Anti-Counterfeiting, Security, and Identification, Xiamen, China, 25–27 October 2019; pp. 324–328.
44. Barz, B.; Denzler, J. Deep Learning on Small Datasets without Pre-Training using Cosine Loss. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 1371–1380.
45. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; Volume 97, pp. 6105–6114.
46. Konovalov, D.; Swinhoe, N.; Efremova, D.; Birtles, R.; Kusetic, M.; Hillcoat, S.; Curnock, M.; Williams, G.; Sheaves, M. Automatic Sorting of Dwarf Minke Whale Underwater Images. Information 2020, 11, 200.
47. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A Search Space Odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2222–2232.
48. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015.
49. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
50. Li, X.; Sun, X.; Meng, Y.; Liang, J.; Wu, F.; Li, J. Dice Loss for Data-imbalanced NLP Tasks. arXiv 2020, arXiv:cs.CL/1911.02855.
51. Xu, Q.; Likhomanenko, T.; Kahn, J.; Hannun, A.; Synnaeve, G.; Collobert, R. Iterative Pseudo-Labeling for Speech Recognition. arXiv 2020, arXiv:cs.CL/2005.09267.
52. Li, G.; Yun, I.; Kim, J.; Kim, J. DABNet: Depth-wise Asymmetric Bottleneck for Real-time Semantic Segmentation. arXiv 2019, arXiv:cs.CV/1907.11357.
53. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
54. Chen, P.; Liu, S.; Zhao, H.; Jia, J. GridMask Data Augmentation. arXiv 2020, arXiv:cs.CV/2001.04086.
55. Harris, E.; Marcu, A.; Painter, M.; Niranjan, M.; Prügel-Bennett, A.; Hare, J. FMix: Enhancing Mixed Sample Data Augmentation. arXiv 2021, arXiv:cs.LG/2002.12047.
56. Emery, K.; Zannoli, M.; Xiao, L.; Warren, J.; Talathi, S. OpenNEEDS: A Dataset of Gaze, Head, Hand, and Scene Signals During Exploration in Open-Ended VR Environments. In Proceedings of the ACM Symposium on Eye Tracking Research and Applications, Stuttgart, Germany, 25–29 May 2021; pp. 1–7.
Name | Camera Viewpoint | Illum. Type | Freq. (Hz) | Res. W × H (pixels) | # Subj. | Head mov. | # Seqs. | Annotations | Tasks |
---|---|---|---|---|---|---|---|---|---|
PoG [29] (2012) | On-axis (1 eye) | 1 (indoors) | 29.97 | 720 × 480 | 20 | Y | 6 per subj. | Target pixel, 3D monitor and headset locations | Elicited (fixations and smooth pursuit) |
LPW [30] (2016) | On-/off-axis (1 eye) | Continuous (indoor/ outdoor) | 95 | 720 × 480 | 22 | ? | 3 × 20 s per subj. | 2D pupil position, segment. masks (provided by [31]) | Elicited |
NVGaze [32] (2019) | On-/off-axis (2 eyes) | Constant/ varying (indoors) | 120 | 640 × 480 | 30 | N | 56 | 2D gaze direction, pupil position, blink labels | Elicited (fixations) |
GW [33] (2020) | Off-axis (2 eyes) | Continuous (indoor/ outdoor) | 120 | 640 × 480 | 19 | Y | 9 min. per subj. | Eye mov. types, 3D gaze vector, head pose | 4 real-world tasks |
Multimodal Eye Movement [34] (2021) | Off-axis (1 eye) | Continuous (indoor/ outdoor) | 25 | 192 × 144 | 19 | Y | 30 min. per subj. | Eye mov. types, eye params., segment. masks ([35]), optical vectors ([36]) | Car ride (real and simulated) |
Ours (OpenEDS 2020) | On-axis (1 eye) | 1 (VR) | 100 | 640 × 400 | 87 | N | ∼9160 | 3D gaze vector, target, point of gaze, cornea center, segment. masks | Elicited (fixations, saccades, smooth pursuit) |
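Several of the datasets above, including OpenEDS2020, annotate gaze as a 3D direction vector. For modeling and error analysis such vectors are commonly converted to a pair of spherical angles (yaw/pitch); below is a minimal conversion sketch in Python, where the axis convention is an assumption for illustration rather than the dataset's documented one.

```python
import numpy as np

def gaze_vector_to_angles(v):
    """Convert a 3D gaze direction vector to (yaw, pitch) in radians.

    Assumed axis convention (not taken from the dataset documentation):
    x points right, y points up, z points from the eye toward the scene.
    """
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)        # normalize to a unit vector
    yaw = np.arctan2(v[0], v[2])     # horizontal rotation about the y axis
    pitch = np.arcsin(v[1])          # elevation above the x-z plane
    return yaw, pitch

# Example: a vector looking slightly up and to the right.
print(np.degrees(gaze_vector_to_angles([0.2, 0.1, 0.97])))
```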
Gaze Prediction track: participant counts per split by gender, ethnicity, age band, and accessories, together with the number of images and sequences (as shown in the sketch after the table).

Split | Female | Male | Asian | Caucasian | Other | 21–25 | 26–30 | 31–40 | 41+ | Glasses | Makeup | Images | Seqs. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Train | 9 | 23 | 10 | 15 | 7 | 6 | 7 | 13 | 6 | 8 | 5 | 128K | 1280 (×46) |
Val. | 3 | 5 | 1 | 4 | 3 | 2 | 1 | 3 | 2 | 2 | 0 | 70.4K | 1280 |
Test | 12 | 28 | 16 | 17 | 7 | 10 | 10 | 14 | 6 | 11 | 5 | 352K | 6400 |
Total | 24 | 56 | 27 | 36 | 17 | 18 | 18 | 30 | 14 | 21 | 10 | 550.4K | 8960 (66,560) |
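The image and sequence counts above are internally consistent: validation and test sequences are 55 frames long (70.4K/1280 = 352K/6400 = 55), i.e., 50 input frames plus the 5 predicted frames of the baseline table further below, while each 100-frame training sequence (1 s at 100 Hz) can be cut into 100 − 55 + 1 = 46 overlapping 55-frame windows, giving 1280 × 46 + 1280 + 6400 = 66,560 sequences overall, the parenthesized totals in the table. A minimal sliding-window sketch follows; the 50 + 5 split is inferred from this arithmetic rather than quoted from the challenge rules.

```python
import numpy as np

def sliding_windows(seq, window=55, stride=1):
    """Split a frame sequence into overlapping fixed-length windows."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]

# A 100-frame training sequence yields 46 windows of 55 frames each,
# matching the "(×46)" factor in the table above.
assert len(sliding_windows(np.arange(100))) == 46
assert 1280 * 46 + 1280 + 6400 == 66_560
```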
Sparse Temporal Semantic Segmentation track: participant counts by gender, ethnicity, eye color, age band, and accessories.

Female | Male | Asian | Caucasian | Other | Brown | Blue | Hazel | Green | 21–25 | 26–30 | 31–40 | 41+ | Glasses | Makeup |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
27 | 47 | 31 | 30 | 13 | 50 | 14 | 4 | 6 | 17 | 15 | 25 | 16 | 21 | 14 |
Baseline gaze-prediction results per predicted time step: PE is the mean angular prediction error and p50/p75/p95 are its percentiles, in degrees of visual angle.

Time Step | PE (°) | p50 (°) | p75 (°) | p95 (°) |
---|---|---|---|---|
1 (10 ms) | 5.28 | 4.56 | 6.73 | 11.89 |
2 (20 ms) | 5.32 | 4.57 | 6.79 | 11.99 |
3 (30 ms) | 5.37 | 4.61 | 6.83 | 12.13 |
4 (40 ms) | 5.41 | 4.63 | 6.87 | 12.30 |
5 (50 ms) | 5.46 | 4.65 | 6.92 | 12.48 |
Average | 5.37 | 4.60 | 6.83 | 12.16 |
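A minimal sketch of how such an angular error between predicted and ground-truth 3D gaze vectors is typically computed; the official challenge evaluation code may differ in detail, and the names below are illustrative.

```python
import numpy as np

def angular_error_deg(pred, gt):
    """Per-sample angle (degrees) between predicted and true gaze vectors.

    pred, gt: arrays of shape (N, 3); both are normalized here, so they
    do not need to be unit vectors on input.
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

# Summary statistics as reported above (errors is a 1-D array per time step):
# pe = errors.mean()
# p50, p75, p95 = np.percentile(errors, [50, 75, 95])
```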
Baseline semantic-segmentation results: per-class intersection-over-union (IoU) and their mean (mIoU).

Background | Sclera | Iris | Pupil | Average |
---|---|---|---|---|
0.971 | 0.674 | 0.882 | 0.835 | 0.841 |
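A minimal per-class IoU computation over integer label maps; the class ordering is assumed for illustration and the official scoring code may differ.

```python
import numpy as np

def per_class_iou(pred, gt, num_classes=4):
    """IoU of each class between two integer label maps of the same shape.

    Assumed label ordering for illustration: 0=background, 1=sclera,
    2=iris, 3=pupil.
    """
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        inter = np.logical_and(p, g).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return np.array(ious)

# mIoU averages the per-class IoUs; on a full test set the intersections
# and unions are typically accumulated over all images before dividing.
```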
Gaze Prediction Challenge final standings (mean prediction error in degrees; lower is better).

Model | PE (°) |
---|---|
1st place winner (team random_b) | 3.078 |
2nd place winner (team EyMazing) | 3.313 |
3rd place winner (team DmitryKonolov) | 3.347 |
Baseline | 5.368 |
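For context, the winning entry lowers the mean prediction error from 5.368° to 3.078°, a relative reduction of (5.368 − 3.078)/5.368 ≈ 0.43, i.e., roughly 43%, and the top three teams lie within about 0.27° of one another.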
Sparse Temporal Semantic Segmentation Challenge final standings (mIoU; higher is better).

Model | mIoU |
---|---|
1st place winner (team BTS Digital) | 0.9517 |
2nd place winner (team tetelias) | 0.9512 |
3rd place winner (team JHU) | 0.9502 |
Baseline | 0.841 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation
Palmero, C.; Sharma, A.; Behrendt, K.; Krishnakumar, K.; Komogortsev, O.V.; Talathi, S.S. OpenEDS2020 Challenge on Gaze Tracking for VR: Dataset and Results. Sensors 2021, 21, 4769. https://doi.org/10.3390/s21144769