Enhanced U-Net with GridMask (EUGNet): A Novel Approach for Robotic Surgical Tool Segmentation
Abstract
1. Introduction
2. Materials and Methods
2.1. Enhanced U-Net with GridMask (EUGNet) Architecture
- Deep Contextual Encoder: To capture long-range contextual information that the traditional U-Net might miss, our encoder is deepened and incorporates dilated convolutions. This broadens the receptive field without a significant increase in computational complexity (a sketch of such an encoder block appears after this list).
- Residual Connections: To mitigate the loss of fine-grained spatial details during the downsampling and upsampling processes, we have integrated residual connections between corresponding encoder and decoder layers. This preserves spatial information and aids more accurate reconstruction of the segmentation output.
- Class Balancing Loss: Because imbalanced class distributions are a frequent challenge in medical image analysis, our architecture employs a class-balancing loss function. This keeps the model from being biased towards the majority class, giving all classes equal importance during training (one plausible form of such a loss is sketched after this list).
- Adaptive Feature Fusion: To better handle objects of irregular shape, we introduce an adaptive feature fusion mechanism within the decoder. This mechanism adaptively weighs features from the encoder against the upsampled features from the preceding decoder layer, letting the model focus on the most pertinent features for segmentation (see the fusion sketch after this list).
- GridMask Augmentation Module: The GridMask technique (detailed in Section 2.2) is integrated directly into our training pipeline. Before images are fed to the encoder, they pass through the GridMask module, so the model consistently trains on augmented data, enhancing its robustness and reducing its tendency to overfit.
- Efficient Implementation: To address U-Net’s computational demands, our architecture employs depthwise separable convolutions where feasible. This approach reduces the parameter count without compromising the model’s learning capacity.
- Multi-Modal Fusion: For tasks that involve multiple data modalities, EUGNet introduces a fusion layer post-encoder. This layer is designed to effectively fuse features from different modalities before they are passed to the decoder. Figure 1 provides a visual overview of EUGNet.
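The paper does not include reference code, so the following PyTorch sketch shows, under our own assumptions about layer ordering and naming, how an encoder block could combine the dilated convolutions of the deep contextual encoder with the depthwise separable convolutions of the efficient-implementation item; the within-block shortcut is likewise an illustrative choice, not the authors' implementation.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """A 3x3 depthwise convolution followed by a 1x1 pointwise convolution.

    Stands in for a standard 3x3 convolution at a fraction of the
    parameter count, per the 'Efficient Implementation' item above.
    """

    def __init__(self, in_ch: int, out_ch: int, dilation: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=dilation, dilation=dilation,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


class EncoderBlock(nn.Module):
    """Two dilated depthwise-separable convolutions with a shortcut.

    The dilation enlarges the receptive field without extra pooling;
    the 1x1 shortcut keeps the residual sum shape-compatible.
    """

    def __init__(self, in_ch: int, out_ch: int, dilation: int = 2):
        super().__init__()
        self.conv1 = DepthwiseSeparableConv(in_ch, out_ch, dilation)
        self.conv2 = DepthwiseSeparableConv(out_ch, out_ch, dilation)
        self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.conv2(self.conv1(x)) + self.shortcut(x)


# A 256x256 RGB frame keeps its spatial size while the effective
# receptive field grows with the dilation rate.
x = torch.randn(1, 3, 256, 256)
print(EncoderBlock(3, 64)(x).shape)  # torch.Size([1, 64, 256, 256])
```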
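The adaptive feature fusion mechanism is not given in closed form above, so the sketch below shows one plausible realization: a learned per-pixel gate that blends the encoder skip features (the residual pathway between corresponding encoder and decoder layers) with the upsampled features from the preceding decoder stage. The gating design and all names are our assumptions.

```python
import torch
import torch.nn as nn


class AdaptiveFeatureFusion(nn.Module):
    """Learned per-pixel gate between encoder skip features and the
    upsampled features from the preceding decoder layer.

    A hypothetical realization of the 'Adaptive Feature Fusion' item:
    the gate decides, pixel by pixel, how much fine encoder detail
    versus coarse decoder context to keep.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, skip, up):
        # skip and up: (N, C, H, W) with matching shapes.
        a = self.gate(torch.cat([skip, up], dim=1))  # (N, 1, H, W) in [0, 1]
        return a * skip + (1.0 - a) * up             # convex combination


# Usage inside a decoder stage: fuse, then convolve as usual.
fuse = AdaptiveFeatureFusion(channels=64)
out = fuse(torch.randn(1, 64, 128, 128), torch.randn(1, 64, 128, 128))
```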
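For the class-balancing loss, a natural candidate is the generalized Dice loss of Sudre et al. [46], which weights each class by the inverse square of its pixel count so that rare foreground classes (e.g., thin instrument shafts) count as much as the dominant background. The paper does not spell out its exact loss, so the following is a sketch under that assumption.

```python
import torch


def generalized_dice_loss(probs, target_onehot, eps=1e-6):
    """Generalized Dice loss with inverse-square class weighting [46].

    One plausible form of the class-balancing loss described above;
    not necessarily the formula used in the paper.

    probs:         (N, C, H, W) softmax probabilities
    target_onehot: (N, C, H, W) one-hot ground-truth masks
    """
    dims = (0, 2, 3)  # sum over batch and pixels, keep the class axis
    # Rare classes get large weights: w_c = 1 / (number of class-c pixels)^2.
    w = 1.0 / (target_onehot.sum(dims) ** 2 + eps)
    intersect = (w * (probs * target_onehot).sum(dims)).sum()
    union = (w * (probs + target_onehot).sum(dims)).sum()
    return 1.0 - 2.0 * intersect / (union + eps)


# Toy check: a perfect binary prediction drives the loss toward zero.
t = torch.zeros(1, 2, 4, 4)
t[:, 0, :, :2], t[:, 1, :, 2:] = 1.0, 1.0
print(generalized_dice_loss(t.clone(), t))  # ~0
```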
2.2. GridMask Algorithm
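GridMask [53] deletes a regular grid of square regions from each training image, so information is removed uniformly across the image rather than in one large contiguous block as in Cutout [50]. The grid is controlled by a unit size d, a ratio that sets the side length of each deleted square, and random offsets that shift the grid from sample to sample. The sketch below illustrates this idea in PyTorch; the parameter defaults and the application probability p are our illustrative assumptions, not the training settings used in the paper.

```python
import torch


def gridmask(images, d_min=24, d_max=64, ratio=0.4, p=0.7):
    """Apply GridMask-style augmentation to a batch of shape (N, C, H, W).

    Minimal sketch of the idea in Chen et al. [53]: zero out a regular
    grid of square patches with a randomly chosen unit size and offset.
    """
    if torch.rand(1).item() > p:  # apply with probability p
        return images
    n, c, h, w = images.shape
    d = int(torch.randint(d_min, d_max + 1, (1,)))  # grid unit size
    l = max(1, int(d * ratio))                      # side of each dropped square
    dx = int(torch.randint(0, d, (1,)))             # random grid offsets
    dy = int(torch.randint(0, d, (1,)))
    mask = torch.ones(h, w, device=images.device)
    for y in range(-d + dy, h, d):
        for x in range(-d + dx, w, d):
            y0, y1 = max(y, 0), min(y + l, h)
            x0, x1 = max(x, 0), min(x + l, w)
            if y0 < y1 and x0 < x1:
                mask[y0:y1, x0:x1] = 0.0
    return images * mask  # mask broadcasts over the N and C dimensions


# Usage: augment a batch right before it enters the encoder.
batch = torch.randn(4, 3, 256, 256)
augmented = gridmask(batch)
```

Because the dropped squares never tile the whole image, enough instrument and tissue context survives in every augmented frame for the network to learn from, which is what makes this form of information dropping less destructive than removing one large region.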
2.3. Data Collection for Algorithm Evaluation
- (1) Da Vinci robotic (DVR) dataset [55]: The training set contains four 45 s ex vivo videos. The testing set consists of four 15 s videos and two 60 s videos; each test video contains two instruments, and articulated motions are present in all the videos. The ground-truth masks are generated automatically from joint encoder information and forward kinematics, with hand-eye calibration errors corrected manually. All videos were recorded with the da Vinci Research Kit (dVRK) open-source platform [56]. The frames have a resolution of 720 × 576, and the videos run at 25 frames per second, giving a total of (60 + 15) × 25 = 1875 frames (images).
- (2) Open-source surgical videos: We obtained recorded videos for testing our algorithm from open sources on the Internet, including the U.S. National Library of Medicine [57] (video links are available upon request). The videos show various surgical procedures, such as midline lobectomy, right superior line lobectomy, thoracotomy, thoracoscopic lung surgery, and prostatectomy, and each video features splash-like bleeding. The frames have a resolution of 720 × 576, and the videos run at 25 frames per second. The total duration of these videos is 2 min, i.e., 2 × 60 × 25 = 3000 frames (images).
- (3) EndoVis 17 binary segmentation dataset [58]: This dataset, comprising 600 images, was used for both training and testing. It consists of 10 sequences from abdominal porcine procedures recorded with da Vinci Xi systems. The dataset was curated by selecting active sequences with substantial instrument motion and visibility, sampling 300 frames at a 1 Hz rate from each procedure; frames in which instrument motion was absent for an extended period were manually excluded to maintain a consistent sequence of 300 frames. For training, the first 225 frames of 8 sequences were made available, while the remaining 75 frames of these sequences were reserved for testing. Additionally, 2 sequences, each with a complete set of 300 frames, were allocated exclusively for testing.
2.4. Baseline Method and Evaluation Protocol
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Gagner, M. Laparoscopic adrenalectomy in Cushing’s syndrome and pheochromocytoma. N. Engl. J. Med. 1992, 327, 1033.
2. Berger, R.A.; Jacobs, J.J.; Meneghini, R.M.; Della Valle, C.; Paprosky, W.; Rosenberg, A.G. Rapid rehabilitation and recovery with minimally invasive total hip arthroplasty. Clin. Orthop. Relat. Res. 2004, 429, 239–247.
3. Kehlet, H.; Wilmore, D.W. Evidence-based surgical care and the evolution of fast-track surgery. Ann. Surg. 2008, 248, 189–198.
4. Darzi, A.; Mackay, S. Recent advances in minimal access surgery. BMJ 2002, 324, 31–34.
5. Maurus, C.F.; Schäfer, M.; Müller, M.K.; Clavien, P.-A.; Weber, M. Laparoscopic versus open splenectomy for nontraumatic diseases. World J. Surg. 2008, 32, 2444–2449.
6. Khorgami, Z.; Haskins, I.N.; Aminian, A.; Andalib, A.; Rosen, M.J.; Brethauer, S.A.; Schauer, P.R. Concurrent ventral hernia repair in patients undergoing laparoscopic bariatric surgery: A case-matched study using the National Surgical Quality Improvement Program Database. Surg. Obes. Relat. Dis. 2017, 13, 997–1002.
7. Pollard, J.S.; Fung, A.K.-Y.; Ahmed, I. Are natural orifice transluminal endoscopic surgery and single-incision surgery viable techniques for cholecystectomy? J. Laparoendosc. Adv. Surg. Tech. 2012, 22, 1–14.
8. Stefanidis, D.; Fanelli, R.D.; Price, R.; Richardson, W.; SAGES Guidelines Committee. SAGES guidelines for the introduction of new technology and techniques. Surg. Endosc. 2014, 28, 2257–2271.
9. Gifari, M.W.; Naghibi, H.; Stramigioli, S.; Abayazid, M. A review on recent advances in soft surgical robots for endoscopic applications. Int. J. Med. Robot. Comput. Assist. Surg. 2019, 15, e2010.
10. Somashekhar, S.; Acharya, R.; Saklani, A.; Parikh, D.; Goud, J.; Dixit, J.; Gopinath, K.; Kumar, M.V.; Bhojwani, R.; Nayak, S. Adaptations and safety modifications to perform safe minimal access surgery (MIS: Laparoscopy and Robotic) during the COVID-19 pandemic: Practice modifications expert panel consensus guidelines from Academia of Minimal Access Surgical Oncology (AMASO). Indian J. Surg. Oncol. 2021, 12, 210–220.
11. Vitiello, V.; Lee, S.-L.; Cundy, T.P.; Yang, G.-Z. Emerging robotic platforms for minimally invasive surgery. IEEE Rev. Biomed. Eng. 2012, 6, 111–126.
12. Cohn, L.H.; Adams, D.H.; Couper, G.S.; Bichell, D.P.; Rosborough, D.M.; Sears, S.P.; Aranki, S.F. Minimally invasive cardiac valve surgery improves patient satisfaction while reducing costs of cardiac valve replacement and repair. Ann. Surg. 1997, 226, 421.
13. Link, R.E.; Bhayani, S.B.; Kavoussi, L.R. A prospective comparison of robotic and laparoscopic pyeloplasty. Ann. Surg. 2006, 243, 486.
14. Schijven, M.; Jakimowicz, J.; Broeders, I.; Tseng, L. The Eindhoven laparoscopic cholecystectomy training course—Improving operating room performance using virtual reality training: Results from the first EAES accredited virtual reality trainings curriculum. Surg. Endosc. Other Interv. Tech. 2005, 19, 1220–1226.
15. Blavier, A.; Gaudissart, Q.; Cadière, G.-B.; Nyssen, A.-S. Comparison of learning curves and skill transfer between classical and robotic laparoscopy according to the viewing conditions: Implications for training. Am. J. Surg. 2007, 194, 115–121.
16. Haidegger, T.; Speidel, S.; Stoyanov, D.; Satava, R.M. Robot-assisted minimally invasive surgery—Surgical robotics in the data age. Proc. IEEE 2022, 110, 835–846.
17. Maier-Hein, L.; Eisenmann, M.; Sarikaya, D.; März, K.; Collins, T.; Malpani, A.; Fallert, J.; Feussner, H.; Giannarou, S.; Mascagni, P. Surgical data science–from concepts toward clinical translation. Med. Image Anal. 2022, 76, 102306.
18. Bouarfa, L.; Akman, O.; Schneider, A.; Jonker, P.P.; Dankelman, J. In-vivo real-time tracking of surgical instruments in endoscopic video. Minim. Invasive Ther. Allied Technol. 2012, 21, 129–134.
19. Mamone, V.; Viglialoro, R.M.; Cutolo, F.; Cavallo, F.; Guadagni, S.; Ferrari, V. Robust Laparoscopic Instruments Tracking Using Colored Strips. In Proceedings of the Augmented Reality, Virtual Reality, and Computer Graphics: 4th International Conference, AVR 2017, Ugento, Italy, 12–15 June 2017; Proceedings, Part II; Springer: Berlin/Heidelberg, Germany, 2017.
20. Sorriento, A.; Porfido, M.B.; Mazzoleni, S.; Calvosa, G.; Tenucci, M.; Ciuti, G.; Dario, P. Optical and electromagnetic tracking systems for biomedical applications: A critical review on potentialities and limitations. IEEE Rev. Biomed. Eng. 2019, 13, 212–232.
21. Wang, Y.; Sun, Q.; Liu, Z.; Gu, L. Visual detection and tracking algorithms for minimally invasive surgical instruments: A comprehensive review of the state-of-the-art. Robot. Auton. Syst. 2022, 149, 103945.
22. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; Van Der Laak, J.A.; Van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88.
23. Gulshan, V.; Peng, L.; Coram, M.; Stumpe, M.C.; Wu, D.; Narayanaswamy, A.; Venugopalan, S.; Widner, K.; Madams, T.; Cuadros, J. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016, 316, 2402–2410.
24. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
25. Danelljan, M.; Häger, G.; Khan, F.; Felsberg, M. Accurate Scale Estimation for Robust Visual Tracking. In Proceedings of the British Machine Vision Conference, Nottingham, UK, 1–5 September 2014; BMVA Press: Durham, UK, 2014.
26. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III; Springer: Berlin/Heidelberg, Germany, 2015.
27. Milletari, F.; Navab, N.; Ahmadi, S.-A. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016.
28. Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, 17–21 October 2016; Proceedings, Part II; Springer: Berlin/Heidelberg, Germany, 2016.
29. Kamnitsas, K.; Ledig, C.; Newcombe, V.F.; Simpson, J.P.; Kane, A.D.; Menon, D.K.; Rueckert, D.; Glocker, B. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 2017, 36, 61–78.
30. Papp, D.; Elek, R.N.; Haidegger, T. Surgical Tool Segmentation on the JIGSAWS Dataset for Autonomous Image-Based Skill Assessment. In Proceedings of the 2022 IEEE 10th Jubilee International Conference on Computational Cybernetics and Cyber-Medical Systems (ICCC), Reykjavík, Iceland, 6–9 July 2022.
31. Ahmidi, N.; Tao, L.; Sefati, S.; Gao, Y.; Lea, C.; Haro, B.B.; Zappella, L.; Khudanpur, S.; Vidal, R.; Hager, G.D. A dataset and benchmarks for segmentation and recognition of gestures in robotic surgery. IEEE Trans. Biomed. Eng. 2017, 64, 2025–2041.
32. Funke, I.; Mees, S.T.; Weitz, J.; Speidel, S. Video-based surgical skill assessment using 3D convolutional neural networks. Int. J. Comput. Assist. Radiol. Surg. 2019, 14, 1217–1225.
33. Lajkó, G.; Nagyne Elek, R.; Haidegger, T. Endoscopic image-based skill assessment in robot-assisted minimally invasive surgery. Sensors 2021, 21, 5412.
34. Nema, S.; Vachhani, L. Surgical instrument detection and tracking technologies: Automating dataset labeling for surgical skill assessment. Front. Robot. AI 2022, 9, 1030846.
35. Jin, A.; Yeung, S.; Jopling, J.; Krause, J.; Azagury, D.; Milstein, A.; Fei-Fei, L. Tool Detection and Operative Skill Assessment in Surgical Videos Using Region-Based Convolutional Neural Networks. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018.
36. Attia, M.; Hossny, M.; Nahavandi, S.; Asadi, H. Surgical Tool Segmentation Using a Hybrid Deep CNN-RNN Auto Encoder-Decoder. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017.
37. Isensee, F.; Jaeger, P.F.; Kohl, S.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2021, 18, 203–211.
38. Falk, T.; Mai, D.; Bensch, R.; Çiçek, Ö.; Abdulkadir, A.; Marrakchi, Y.; Böhm, A.; Deubner, J.; Jäckel, Z.; Seiwald, K. U-Net: Deep learning for cell counting, detection, and morphometry. Nat. Methods 2019, 16, 67–70.
39. Iglovikov, V.; Shvets, A. TernausNet: U-Net with VGG11 encoder pre-trained on ImageNet for image segmentation. arXiv 2018, arXiv:1801.05746.
40. Jha, D.; Riegler, M.A.; Johansen, D.; Halvorsen, P.; Johansen, H.D. DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation. In Proceedings of the 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Rochester, MN, USA, 28–30 July 2020.
41. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 20 September 2018; Springer: Berlin/Heidelberg, Germany, 2018.
42. Hasan, S.K.; Linte, C.A. U-NetPlus: A Modified Encoder-Decoder U-Net Architecture for Semantic and Instance Segmentation of Surgical Instruments from Laparoscopic Images. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019.
43. Siddique, N. U-Net Based Deep Learning Architectures for Object Segmentation in Biomedical Images. Doctoral Dissertation, Purdue University Graduate School, West Lafayette, IN, USA, 2021.
44. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
45. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
46. Sudre, C.H.; Li, W.; Vercauteren, T.; Ourselin, S.; Jorge Cardoso, M. Generalised Dice Overlap as a Deep Learning Loss Function for Highly Unbalanced Segmentations. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Held in Conjunction with MICCAI 2017, Québec City, QC, Canada, 14 September 2017; Springer: Berlin/Heidelberg, Germany, 2017.
47. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
48. Havaei, M.; Davy, A.; Warde-Farley, D.; Biard, A.; Courville, A.; Bengio, Y.; Pal, C.; Jodoin, P.-M.; Larochelle, H. Brain tumor segmentation with deep neural networks. Med. Image Anal. 2017, 35, 18–31.
49. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
50. DeVries, T.; Taylor, G.W. Improved regularization of convolutional neural networks with Cutout. arXiv 2017, arXiv:1708.04552.
51. Singh, K.K.; Yu, H.; Sarmasi, A.; Pradeep, G.; Lee, Y.J. Hide-and-Seek: A data augmentation technique for weakly-supervised localization and beyond. arXiv 2018, arXiv:1811.02545.
52. Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random Erasing Data Augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020.
53. Chen, P.; Liu, S.; Zhao, H.; Jia, J. GridMask data augmentation. arXiv 2020, arXiv:2001.04086.
54. Li, P.; Li, X.; Long, X. FenceMask: A data augmentation approach for pre-extracted image features. arXiv 2020, arXiv:2006.07877.
55. Pakhomov, D.; Premachandran, V.; Allan, M.; Azizian, M.; Navab, N. Deep Residual Learning for Instrument Segmentation in Robotic Surgery. In Proceedings of the Machine Learning in Medical Imaging: 10th International Workshop, MLMI 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, 13 October 2019; Springer: Berlin/Heidelberg, Germany, 2019.
56. Kazanzides, P.; Chen, Z.; Deguet, A.; Fischer, G.S.; Taylor, R.H.; DiMaio, S.P. An Open-Source Research Kit for the da Vinci® Surgical System. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014.
57. Novellis, P.; Jadoon, M.; Cariboni, U.; Bottoni, E.; Pardolesi, A.; Veronesi, G. Management of robotic bleeding complications. Ann. Cardiothorac. Surg. 2019, 8, 292.
58. Allan, M.; Shvets, A.; Kurmann, T.; Zhang, Z.; Duggal, R.; Su, Y.-H.; Rieke, N.; Laina, I.; Kalavakonda, N.; Bodenstedt, S. 2017 robotic instrument segmentation challenge. arXiv 2019, arXiv:1902.06426.
Inference speed and segmentation accuracy of U-Net with and without GridMask augmentation:

| Network | Inference Time (ms/fps) | Balanced Accuracy (Foreground) | Mean IoU | Mean DSC |
|---|---|---|---|---|
| U-Net without GridMask | 62.1/16.1 | 82.5% | 78.2% | 84.2% |
| U-Net with GridMask | 34.2/29.2 | 86.3% | 80.6% | 89.5% |