Article

Intraoperative Augmented Reality for Vitreoretinal Surgery Using Edge Computing

Department of Ophthalmology, Mayo Clinic, Rochester, MN 55905, USA
* Author to whom correspondence should be addressed.
J. Pers. Med. 2025, 15(1), 20; https://doi.org/10.3390/jpm15010020
Submission received: 25 November 2024 / Revised: 25 December 2024 / Accepted: 27 December 2024 / Published: 6 January 2025
(This article belongs to the Section Methodology, Drug and Device Discovery)

Abstract

Purpose: Augmented reality (AR) may allow vitreoretinal surgeons to leverage microscope-integrated digital imaging systems to analyze and highlight key retinal anatomic features in real time, possibly improving safety and precision during surgery. By employing convolutional neural networks (CNNs) for retina vessel segmentation, a retinal coordinate system can be created that allows pre-operative images of capillary non-perfusion or retinal breaks to be digitally aligned and overlaid upon the surgical field in real time. Such technology may be useful in ensuring thorough laser treatment of capillary non-perfusion or in using pre-operative optical coherence tomography (OCT) to guide macular surgery when microscope-integrated OCT (MIOCT) is not available. Methods: This study is a retrospective analysis involving the development and testing of a novel image-registration algorithm for vitreoretinal surgery. Fifteen anonymized cases of pars plana vitrectomy with epiretinal membrane peeling, along with corresponding preoperative fundus photographs and optical coherence tomography (OCT) images, were retrospectively collected from the Mayo Clinic database. We developed a TPU (Tensor-Processing Unit)-accelerated CNN for semantic segmentation of retinal vessels from fundus photographs and subsequent real-time image registration in surgical video streams. An iterative patch-wise cross-correlation (IPCC) algorithm was developed for image registration, with a focus on optimizing processing speeds and maintaining high spatial accuracy. The primary outcomes measured were processing speed in frames per second (FPS) and the spatial accuracy of image registration, quantified by the Dice coefficient between registered and manually aligned images. Results: When deployed on an Edge TPU, the CNN model combined with our image-registration algorithm processed video streams at a rate of 14 FPS, which is superior to processing rates achieved on other standard hardware configurations. The IPCC algorithm efficiently aligned pre-operative and intraoperative images, showing high accuracy in comparison to manual registration. Conclusions: This study demonstrates the feasibility of using TPU-accelerated CNNs for enhanced AR in vitreoretinal surgery.

1. Introduction

The integration of machine learning and augmented reality (AR) into surgical practice represents a frontier in modern medicine, potentially enhancing the precision, efficiency, and outcomes of surgical procedures. AR in ophthalmic surgery, though still an emerging field with few studies [1,2,3], holds great potential for improving surgical visualization and navigation. Despite its potential, current AR research in ophthalmology predominantly focuses on surgical training [4,5,6] and therapy [7,8,9], with less emphasis on surgical navigation. Existing studies have explored applications such as OCT image augmentation [10,11], endoscopic image augmentation [12], and real-time image segmentation for deep anterior lamellar keratoplasty [13].
Key to effective AR in surgery is accurate, low-latency, real-time image registration, particularly when accelerometer and gyroscope sensor data are not available. Image-registration algorithms are broadly categorized into intensity-based methods [14,15,16], which optimize a similarity function based on pixel values but struggle under varying illumination, and feature-based methods [17,18,19,20,21,22], which are robust but computationally demanding.
In recent years, deep learning has shown significant promise in medical image analysis, including segmentation of anatomical structures and pathology in various imaging modalities [23,24,25,26,27,28,29]. Specifically, CNNs have emerged as a powerful tool for tasks such as segmentation of retinal vessels in fundus imaging [30,31,32,33,34]. However, deploying these networks in a real-time surgical context requires substantial computational efficiency to process live video feeds.
We present herein the implementation of a Tensor Processing Unit (TPU)-accelerated convolutional neural network (CNN) to produce real-time retina vessel segmentation maps, which were then used to perform image registration using a novel Iterative Patch-wise Cross-Correlation (IPCC) algorithm. We report on the processing speeds achieved, the accuracy of the retinal vessel segmentation and image registration, and the potential clinical applications of this technology in enhancing the surgical field visualization. Furthermore, we present a pipeline that combines the TPU-accelerated CNN with an iterative cross-correlation algorithm for semantic segmentation and image registration, capable of superimposing pre-operative diagnostic images onto intraoperative video streams in real time.

2. Materials and Methods

Surgical Video Data Set
Surgical video recordings of pars plana vitrectomy with epiretinal membrane peeling, as well as the associated preoperative fundus photographs and OCTs, were retrospectively collected. Fifteen anonymized cases were retrieved from the Mayo Clinic, Rochester database from 16 November 2022 to 9 February 2023. The average duration of the videos was 49 min. Some of these cases also included preoperative visual fields, fluorescein angiograms, and RNFL and GCL thickness maps. The study was performed in compliance with the Ethical Principles for Medical Research Involving Human Subjects and approved by the Mayo Clinic institutional review board. Approval Code: 22011643. Approval Date: 29 November 2022.
General pipeline for semantic segmentation with TPU-accelerated convolutional neural networks and real-time image registration using iterative cross-correlation (Figure 1).
The overall design of the proposed pipeline is illustrated in Figure 1. First, an unquantized (float16) convolutional neural network was trained to perform semantic segmentation of retina vessels from retinal color photographs (Figure 1A). This model was quantized to eight bits (int8) using the “tensorflow.lite.TFLiteConverter” function with the default “tensorflow.lite.Optimize.DEFAULT” optimizations, and the quantized model was compiled for the Edge TPU device; real-time vessel segmentation of surgical video frames was then performed using the quantized model running on the Edge TPU (Figure 1B).
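As a minimal sketch (not the authors' exact configuration), the quantization and Edge TPU compilation step could look as follows with the TensorFlow Lite API; the model object, file names, and the representative dataset used for full-integer quantization are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

# `model` is the trained Keras U-Net; `train_images` is a float32 array of fundus images (assumed names).
def representative_dataset():
    # A small sample of training images lets the converter calibrate int8 activation ranges.
    for img in train_images[:100]:
        yield [img[np.newaxis, ...].astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open("vessel_segmentation_int8.tflite", "wb") as f:
    f.write(tflite_model)

# The int8 model is then compiled for the Edge TPU with Coral's command-line compiler:
#   edgetpu_compiler vessel_segmentation_int8.tflite
```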
The Edge TPU (Coral Edge TPU, Google, LLC, Mountain View, California, USA) can be plugged into a standard laptop or desktop computer via a USB3 port to add 4 trillion operations per second (4 TOPS) of neural computation to the system. It works in parallel with the computer central processing unit (CPU) and consumes only two watts of power.
The iterative patch-wise cross-correlation (IPCC) algorithm running on an Intel i7-10750H CPU was then applied to the pre-operative vessel segmentation map and the intra-operative vessel segmentation map generated by the quantized model (Figure 1C) to yield a matrix T that describes the rotational/translational/scaling transformations between the two segmentation maps. This transformation matrix was applied to all pre-operative image data to register them onto the surgical video stream in real time (Figure 1D).
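A minimal sketch of this overlay step (Figure 1D), assuming T is the 3 × 3 transformation matrix produced by the IPCC algorithm and OpenCV is used for warping; the function and variable names below are illustrative, not the authors' implementation:

```python
import cv2
import numpy as np

def overlay_preoperative(frame, preop_image, T, alpha=0.4):
    """Warp a pre-operative image with the 3x3 transform T and alpha-blend it onto a surgical frame."""
    h, w = frame.shape[:2]
    warped = cv2.warpPerspective(preop_image, T, (w, h))
    # Blend only where the warped overlay has content (assumes 3-channel images).
    mask = (warped.sum(axis=2) > 0)[..., np.newaxis]
    blended = np.where(mask, (alpha * warped + (1 - alpha) * frame).astype(frame.dtype), frame)
    return blended
```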
Data collection and preparation for retinal vessel segmentation
To develop a model for accurate vessel segmentation in retinal imaging, this study utilized color fundus images from the DRIVE data set [35] with manual semantic segmentation of retinal vessels. The data set comprised 40 fundus images with a resolution of 256 × 256 pixels. Intraoperative instrument segmentation maps were created manually using 66 randomly selected frames from the vitrectomy videos.
Construction of TPU-accelerated convolutional neural networks for semantic segmentation of retinal vessels.
A U-Net architecture [36] with deep supervision was employed for vessel and instrument segmentation, with the model consisting of three U-Net layers and three channels at the first convolution (Supplementary Table S1). The U-Net model was created and trained for semantic segmentation using the DeepImageTranslator software framework as previously described [37,38].
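For orientation, a minimal Keras sketch of a shallow U-Net of this kind is shown below; the exact layer widths and the deep-supervision branches are specified in Supplementary Table S1 and are not reproduced here, so the filter counts, loss, and optimizer below are illustrative assumptions only.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(256, 256, 3), base_filters=16):
    inputs = tf.keras.Input(shape=input_shape)
    # Encoder
    c1 = conv_block(inputs, base_filters)
    p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, base_filters * 2)
    p2 = layers.MaxPooling2D()(c2)
    # Bottleneck
    c3 = conv_block(p2, base_filters * 4)
    # Decoder with skip connections
    u2 = layers.Conv2DTranspose(base_filters * 2, 2, strides=2, padding="same")(c3)
    c4 = conv_block(layers.Concatenate()([u2, c2]), base_filters * 2)
    u1 = layers.Conv2DTranspose(base_filters, 2, strides=2, padding="same")(c4)
    c5 = conv_block(layers.Concatenate()([u1, c1]), base_filters)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c5)  # binary vessel mask
    return tf.keras.Model(inputs, outputs)

model = build_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
```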
Training was conducted over 200 epochs with a batch size of one, and data-augmentation techniques were implemented to improve the model’s robustness. The augmentation techniques included random rotations, flips, and shears, as well as random changes in brightness and contrast, as described in [37]. To assess the accuracy of the retinal vessel segmentation model, the model was tested using two other retina vessel segmentation data sets: the CHASE_DB1 [39] and the STARE data sets [40,41].
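A paired image/mask augmentation step of this kind could be sketched as below; the rotation, brightness, and contrast ranges are placeholders rather than the values used in [37], and the shear transform is omitted for brevity.

```python
import cv2
import numpy as np

def augment_pair(image, mask):
    """Apply the same random rotation/flip plus brightness/contrast jitter to an image and its vessel mask."""
    h, w = image.shape[:2]
    # Random rotation about the image centre (range is an assumption)
    angle = np.random.uniform(-30, 30)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    image = cv2.warpAffine(image, M, (w, h))
    mask = cv2.warpAffine(mask, M, (w, h), flags=cv2.INTER_NEAREST)
    # Random horizontal flip
    if np.random.rand() < 0.5:
        image, mask = np.fliplr(image).copy(), np.fliplr(mask).copy()
    # Random brightness/contrast jitter applied to the image only
    contrast = np.random.uniform(0.8, 1.2)
    brightness = np.random.uniform(-20, 20)
    image = np.clip(contrast * image.astype(np.float32) + brightness, 0, 255).astype(np.uint8)
    return image, mask
```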
To increase inference speed, the final vessel/instrument segmentation model was quantized to int8, converted to the TFLite format, and compiled for inference on the Google Coral Edge Tensor Processing Unit (TPU). To assess model inference speed on different hardware, we computed the average number of surgical video frames processed per second over the course of 5 min on the Coral TPU, the GeForce RTX 2060, the GeForce GTX 1060, and the Intel Core i7-10750H.
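A frame-rate measurement of this kind could be scripted as below with the tflite_runtime interpreter and the Edge TPU delegate; the model file name is hypothetical and a synthetic input stands in for decoded video frames.

```python
import time
import numpy as np
import tflite_runtime.interpreter as tflite

# Load the Edge-TPU-compiled model through the libedgetpu delegate (file name is hypothetical).
interpreter = tflite.Interpreter(
    model_path="vessel_segmentation_int8_edgetpu.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

frame = np.zeros(inp["shape"], dtype=inp["dtype"])  # stand-in for a preprocessed video frame
n, t0 = 0, time.time()
while time.time() - t0 < 300:  # average over 5 minutes, as in the study
    interpreter.set_tensor(inp["index"], frame)
    interpreter.invoke()
    _ = interpreter.get_tensor(out["index"])
    n += 1
print(f"Average FPS: {n / (time.time() - t0):.1f}")
```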
Algorithm design for iterative patch-wise cross-correlation (Figure 2).
The intuition of our image-registration algorithm derives from the way humans naturally perform the task of aligning images. In practice, a person would typically break down an image into several key areas to focus on. They would then locate these same key areas on another image that serves as a reference. Once a pair of corresponding areas is identified, they would proceed to adjust the first image by moving, rotating, and scaling it to fit over the reference image. This process of adjusting and fine-tuning is repeated with different regions of interest until the images are perfectly superimposed. Our algorithm automates this intuitive process, iteratively refining the image alignment to achieve precise registration.
Let image A represent a grayscale image of width $w_A$ and height $h_A$. We first divide image A into $n \times n$ patches of width $w_A/n$ and height $h_A/n$. Image B is another grayscale image of width $w_B$ and height $h_B$ onto which image A is to be overlaid.
Cross-correlation of each of the n × n patches P is first performed with image B as:
$$R_{P,B}(x,y) = \sum_{x',y'} \left( P(x',y') - \frac{1}{w \cdot h} \sum_{x'',y''} P(x'',y'') \right) \cdot \left( B(x+x',\,y+y') - \frac{1}{w \cdot h} \sum_{x'',y''} B(x+x'',\,y+y'') \right)$$

where $w$ and $h$ denote the patch width and height.
Two patches $p_1$ and $p_2$ with the highest correlation coefficients, both higher than a threshold $t$, are selected. The top-left corner $C_{1A} = (C_{1A_x}, C_{1A_y})$ of $p_1$ on image A and its matched location $C_{1B} = (C_{1B_x}, C_{1B_y})$ on image B, as well as the top-left corner $C_{2A} = (C_{2A_x}, C_{2A_y})$ of $p_2$ on image A and its matched location $C_{2B} = (C_{2B_x}, C_{2B_y})$ on image B, are then used to compute the first estimate of the rotation and scaling matrix ($T_{rs,1}$) of image A around the point $(C_{1A_x}, C_{1A_y})$, as well as the translation matrix ($T_{t,1}$):
$$T_{rs,1} = \begin{bmatrix} s\cos\theta & s\sin\theta & (1 - s\cos\theta)\,C_{1A_x} - s\sin\theta\,C_{1A_y} \\ -s\sin\theta & s\cos\theta & s\sin\theta\,C_{1A_x} + (1 - s\cos\theta)\,C_{1A_y} \\ 0 & 0 & 1 \end{bmatrix}$$
$$T_{t,1} = \begin{bmatrix} 1 & 0 & C_{1B_x} - C_{1A_x} \\ 0 & 1 & C_{1B_y} - C_{1A_y} \\ 0 & 0 & 1 \end{bmatrix}$$
where:
$$\theta = \tan^{-1}\!\left( \frac{(C_{2A} - C_{1A}) \times (C_{2B} - C_{1B})}{(C_{2A} - C_{1A}) \cdot (C_{2B} - C_{1B})} \right)$$
Subsequently, the $T_{rs}$ matrices are applied to all patches to perform further rounds of cross-correlation, such that at iteration $k$, matrix $T_{rs,k-1}$ is applied to patches $p_1$ and $p_2$ before cross-correlation to obtain matrices $T_{rs,k}$ and $T_{t,k}$. At the end of the final iteration $K$, the matrix $T_K = T_{t,K} \times T_{rs,K}$ can then be applied to image A (the pre-operative image) in order to register it onto image B (the intraoperative frame; see Figure 2 and Figure 3).
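To make the procedure concrete, a simplified Python/OpenCV sketch of the IPCC loop is shown below. It follows the structure described above but is not the authors' implementation: the patch grid size, the correlation threshold, the use of cv2.matchTemplate for the patch-wise cross-correlation, and the accumulation of the transform by re-warping the full pre-operative map at each iteration are all simplifying assumptions.

```python
import cv2
import numpy as np

def ipcc_register(preop_map, intraop_map, n=4, threshold=0.3, iterations=3):
    """Simplified sketch of iterative patch-wise cross-correlation (IPCC) registration.
    preop_map, intraop_map: single-channel uint8 vessel-segmentation maps.
    Returns a 3x3 matrix mapping the pre-operative map onto the intra-operative frame."""
    hB, wB = intraop_map.shape
    T = np.eye(3, dtype=np.float32)
    for _ in range(iterations):
        # Apply the current estimate before the next round of patch-wise correlation
        warped = cv2.warpPerspective(preop_map, T, (wB, hB))
        h, w = warped.shape
        ph, pw = h // n, w // n
        matches = []
        for i in range(n):
            for j in range(n):
                patch = warped[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
                if cv2.countNonZero(patch) == 0:
                    continue  # skip patches with no vessel pixels
                res = cv2.matchTemplate(intraop_map, patch, cv2.TM_CCOEFF_NORMED)
                _, score, _, loc = cv2.minMaxLoc(res)
                if score > threshold:
                    matches.append((score, np.float32([j * pw, i * ph]), np.float32(loc)))
        if len(matches) < 2:
            break
        matches.sort(key=lambda m: m[0], reverse=True)
        (_, a1, b1), (_, a2, b2) = matches[:2]  # the two best-correlated patches
        vA, vB = a2 - a1, b2 - b1
        theta = np.arctan2(float(np.cross(vA, vB)), float(np.dot(vA, vB)))
        s = float(np.linalg.norm(vB) / (np.linalg.norm(vA) + 1e-6))
        # Rotation/scaling about the first patch corner, then translation onto its matched location.
        # The angle sign may need adjustment for OpenCV's y-down image coordinates.
        Trs = np.vstack([cv2.getRotationMatrix2D((float(a1[0]), float(a1[1])),
                                                 -np.degrees(theta), s),
                         [0.0, 0.0, 1.0]]).astype(np.float32)
        Tt = np.array([[1, 0, b1[0] - a1[0]],
                       [0, 1, b1[1] - a1[1]],
                       [0, 0, 1]], dtype=np.float32)
        T = Tt @ Trs @ T  # accumulate the k-th update into the running transform
    return T
```

In the full pipeline, the matrix returned here plays the role of $T_K$ and would drive the overlay step sketched earlier.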
Image registration algorithm testing
To test our algorithm, we implemented it in Python 3.9 using the OpenCV library. Source codes are available in the Supplemental Materials. Spatial accuracy was assessed by calculating the Dice coefficient between the source and target vessel segmentation maps after registration. Since the retinal vessels have varying width, we skeletonized the vessel-segmentation maps to single-pixel lines before inputting them into the IPCC algorithm to maximize the spatial accuracy of image registration.
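For illustration, the skeletonization and Dice computation could be written as below using scikit-image's skeletonize and a standard Dice formula; the variable names are placeholders.

```python
import numpy as np
from skimage.morphology import skeletonize

def dice(a, b, eps=1e-7):
    """Dice coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + eps)

# Thin the vessel maps to single-pixel-wide centrelines before registration
preop_skel = skeletonize(preop_mask > 0).astype(np.uint8) * 255
intraop_skel = skeletonize(intraop_mask > 0).astype(np.uint8) * 255
```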
To assess the point of convergence of the iterative registration algorithm, we computed the average absolute relative change ($\Delta$) across all elements of the matrix $T_k$ as a function of $k$, for $k$ values ranging from 0 to 7 (where $t_{k,ij}$ are the elements of the matrix $T_k$):

$$\Delta = \frac{1}{9} \sum_{j=1}^{3} \sum_{i=1}^{3} \left| \frac{t_{k-1,ij} - t_{k,ij}}{t_{k,ij}} \right|$$
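A direct translation of this convergence criterion is sketched below; the small epsilon guarding against division by the zero entries of $T_k$ is an implementation assumption not stated in the text.

```python
import numpy as np

def mean_relative_change(T_prev, T_curr, eps=1e-9):
    """Average absolute relative change across all nine elements of the transform matrix."""
    return float(np.mean(np.abs((T_prev - T_curr) / (T_curr + eps))))
```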
The processing speed of the algorithm was assessed as the average number of surgery video frames processed per second on a consumer-level Intel Core i7-10750H CPU over the course of 5 min.

3. Results

Retina vessel segmentation
Following 200 epochs of training using the DRIVE data set [35], the retina vessel segmentation U-Net model achieved a Dice coefficient of 0.796 on the CHASE_DB1 [39] and the STARE data sets [40,41] (Table 1). Figure 4A shows randomly chosen images from the CHASE_DB1 and STARE data sets with corresponding ground truth vessel segmentation and the model predicted vessel segmentation. Running the unquantized model on the Intel Core i7-10750H, GeForce GTX 1060, and GeForce RTX 2060 along with the IPCC image-registration algorithm resulted in processing speeds of 8.4, 10.4, and 11.1 frames per second (FPS), respectively.
TPU acceleration of semantic segmentation of retinal vessels with convolutional neural network
After eight-bit quantization, the vessel-segmentation model running on the Edge TPU showed minimal change in accuracy metrics for semantic segmentation (Table 1). Figure 4B shows the same images from the CHASE_DB1 and STARE data sets with corresponding ground truth vessel segmentation and the model predicted vessel segmentation by the quantized model. The processing speed of the quantized model increased to 14.4 FPS when running on the Edge TPU processor while the IPCC image-registration algorithm ran concurrently.
Figure 5 shows representative frames from surgical recordings processed by the CNN on the Edge TPU and the corresponding vessel-segmentation maps produced by the model in real time (see Supplemental Materials for the complete video).
Registration of pre-operative retinal vessel-segmentation map to intra-operative retinal vessel-segmentation maps
Image registration using retinal vessel-segmentation maps stabilized over multiple iterations of cross-correlation (Figure 6). The transformation matrix from the IPCC algorithm stabilized after three iterations with minimal adjustments thereafter (Supplemental Figure S1). Thus, for algorithm testing, a maximum of three iterations per frame was used.
To assess the spatial accuracy of our algorithm, we computed accuracy metrics for different numbers of iterations for 50 randomly chosen video frames and compared them to manually registered images (Table 2). Our results showed that the accuracy metrics did not improve significantly after three iterations. Spatial accuracy using the IPCC algorithm was similar or superior to the accuracy with manual image registration.
We found that the SIFT algorithm failed to identify sufficient key-point pairs for homography estimation (Supplemental Figure S2); this was the case both when using the original grayscale images and when using the vessel-segmentation maps. Furthermore, the SIFT algorithm was computationally more expensive to run, resulting in an average frame rate of 7.0 FPS.
Real-time registration of pre-operative image data onto intraoperative surgical videos for augmented reality
Using the image-transformation matrices generated in real time through our retinal vessel segmentation and registration pipeline, we showed that it is possible to overlay any pre-operative image data onto the surgical video stream (see Figure 7 and Video S1). These include pre-operative microperimetry images (Figure 7A,G), Spectralis multi-spectral fundus images (Figure 7B,H), retina thickness maps (Figure 7C,I), and cross-sectional OCT images (Figure 7D,J).
As shown in Supplemental Video S1, our algorithm required few image features to accurately register the pre-operative images, produced nearly no incorrectly registered frames, and was resistant to partial occlusion of the retinal vessels by surgical instruments.

4. Discussion

This study presents a pipeline for real-time semantic segmentation and image registration of retinal vessels in surgical videos, leveraging the capabilities of TPU-accelerated CNNs and our novel IPCC image registration algorithm. Our findings demonstrate the potential of this technology to enhance the precision and safety of vitreoretinal surgery by providing surgeons with accurate, augmented visual information.
Augmented reality (AR) in ophthalmic surgery is an emerging area with relatively few studies to date [1,2,3]. AR technology aims to enhance surgical visualization by overlaying computer-generated images onto the surgeon’s real-world view. This overlay can include preoperative diagnostic data, real-time imaging, and navigation cues, potentially increasing the accuracy and safety of surgical procedures. Most AR research in ophthalmology tends to focus on surgical training [4,5,6] and on therapeutic or diagnostic applications [7,8,9] rather than on surgical navigation. Existing works include OCT image augmentation [10,11], endoscopic image augmentation [12], and real-time image segmentation for deep anterior lamellar keratoplasty [13].
Our approach’s capacity to integrate diverse pre-operative imaging modalities, such as microperimetry, multi-spectral imaging, fluorescein angiography, color fundus photography, OCT, and any pre-operative retinal image annotations, into the surgical view without misregistration artifacts offers the potential to enrich the surgeon’s perception and decision-making. This integration could pave the way for advanced augmented reality applications in surgery, where multiple streams of information are blended into the operative field in real time.
This study lays the foundation for integrating real-time augmented reality into vitreoretinal surgery, with potential applications extending beyond the current scope. One significant advantage of this technology is its ability to address limitations of intraoperative OCT, such as the inability to visualize retinal structures occluded by instruments. By registering preoperative imaging data to the surgical field, these occluded structures can be reconstructed and visualized, aiding in surgical precision. Furthermore, in cases such as retinal vein occlusions, overlaying pre-occlusion imaging data could provide valuable references for guiding interventions.
Augmented reality often relies on the use of either accelerometer sensor data or image registration, or a combination thereof. Without positional data of tracked objects from accelerometer and gyroscope sensors, image registration that is both fast and accurate therefore becomes crucial. There are two main types of image-registration algorithms: intensity-based and feature-based registration methods. The former compute and optimize a similarity function (such as cross-correlation or phase correlation) based on pixel intensity values [14,15,16]. These algorithms are typically not robust to changes in illumination intensity or to cases where there is limited overlap between images. In contrast, feature-based methods work by first extracting local features (such as retinal vessels or bifurcation points), assigning them feature descriptors, and matching all descriptors between two images. One of the most commonly used feature detectors in medical image registration is the SIFT algorithm [17,18,19]; other techniques include the detection of vessel structures [20], vessel corner points [21], and vessel bifurcations [22]. With the advancement of deep learning, newer methods have employed machine learning to generate local features [42,43,44,45,46]. Nonetheless, approaches based on feature detection are often computationally expensive to run and require image preprocessing for robust detection and matching. Furthermore, previous works on retina image registration have emphasized the use of registration algorithms for image mosaicking [47,48,49,50,51], which does not always require real-time processing speed.
Our image-registration algorithm is modeled after the human approach to matching images by identifying key areas across images and iteratively aligning them through scaling, rotating, and translating adjustments. This process is computationally optimized by our TPU-accelerated CNN, which prepares vessel-segmentation maps as inputs, minimizing the effects of illumination changes and reducing the required overlap for matching using the IPCC algorithm. This method achieves a balance between computational load and precision, handling occlusions and varying surgical conditions effectively, which is particularly relevant for real-time applications in surgery.
The successful implementation of the U-Net architecture for vessel segmentation in retinal imaging, as demonstrated by the high Dice coefficients on independent data sets, underscores the model’s generalizability and accuracy. Notably, training on the DRIVE [35] data set and validation on the CHASE_DB1 [39] and STARE [40,41] data sets support the model’s broad applicability across different imaging conditions. In order to maximize processing speed, we lowered the input image resolution to 256 × 256 pixels. Despite stopping training after 200 epochs, our vessel-segmentation model achieved only slightly lower segmentation accuracy compared to previous studies using CNNs [52,53,54,55,56].
Our quantized CNN model’s minimal loss in accuracy post-quantization and its subsequent performance gain on the Edge TPU highlight the practicality of deploying machine learning models in a real-time surgical setting. The significant increase in processing speed to 14 frames per second (FPS) using the Edge TPU, as opposed to slower speeds on consumer-level CPUs and GPUs, represents a substantial improvement in delivering augmented reality (AR) applications for surgery. In addition, the IPCC algorithm’s lower computational demand compared to feature-detection and matching algorithms like SIFT, which only reached 7.0 FPS, suggests that our method could provide a more fluid and less obstructive AR experience. It is notable that the edge-computing paradigm, facilitated by TPUs, provides not only speed but also the potential for enhanced data security, as sensitive patient data processing can be contained on-site without relying on cloud services.
There are several avenues for advancing AR technology in ophthalmic surgery. One such direction is the integration of micro-electromechanical systems (MEMS), such as accelerometers and gyroscopes, into both surgical tools and sclerotomy ports. These could provide real-time feedback on tool positioning in relation to the motions of the eyeball, which, when synchronized with the visual overlay, could greatly enhance the surgeon’s spatial awareness. However, due to the size constraints of sclerotomy ports, this will necessitate innovative design and miniaturization of MEMS devices. Another promising direction is the use of stereoscopic imaging to capture depth information. By accurately determining the depth of surgical instruments relative to the retina, it would be possible to project the instrument’s tip onto cross-sectional OCT images. Furthermore, the application of our approach for intraoperative guidance extends to live annotations made by surgeons. By digitally marking critical areas or points of interest, such as suspected retinal breaks, with instrument tips directly within the surgical field, and tracking these annotations throughout the procedure, surgeons can maintain spatial references and operative context.

5. Conclusions

This study presents an edge-computing approach to real-time image registration in vitreoretinal surgery, highlighting the use of TPU-accelerated algorithms and a novel iterative patch-wise cross-correlation for semantic segmentation of retinal images. Our results indicate that this method achieves real-time performance, processing at 14 FPS, which is superior to conventional CPU and GPU methods. The research indicates that the combination of TPU acceleration and the IPCC algorithm can effectively address the challenge of integrating real-time augmented information into the surgical workflow. While our study focuses on vitreoretinal procedures, the implications of this technology may extend to other surgical areas in ophthalmology that could benefit from real-time image guidance.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jpm15010020/s1, Figure S1: Absolute relative change in all elements of the matrix Tk from k to k+1; Figure S2: Application of the SIFT algorithm for retina image registration; Table S1: Model architecture of the vessel and instrument segmentation U-Net with deep-supervision; Video S1: Real-time registration of pre-operative image data onto the surgical video frames.

Author Contributions

Conceptualization, R.Z.Y. and R.I.; Methodology, R.Z.Y. and R.I.; Software, R.Z.Y. and R.I.; Validation, R.Z.Y. and R.I.; Formal analysis, R.Z.Y. and R.I.; Investigation, R.Z.Y. and R.I.; Resources, R.Z.Y. and R.I.; Data curation, R.Z.Y. and R.I.; Writing—original draft, R.Z.Y. and R.I.; Writing—review & editing, R.Z.Y. and R.I.; Visualization, R.Z.Y. and R.I.; Supervision, R.I.; Project administration, R.I. All authors have read and agreed to the published version of the manuscript.

Funding

Funded by the Mayo Foundation for Medical Education and Research.

Institutional Review Board Statement

The study was performed in compliance with the Ethical Principles for Medical Research Involving Human Subjects and approved by the Mayo Clinic institutional review board. Approval Code: 22011643. Approval Date: 29 November 2022.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Sample data generated during and/or analyzed during the current study is available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Iskander, M.; Ogunsola, T.; Ramachandran, R.; McGowan, R.; Al-Aswad, L.A. Virtual reality and augmented reality in ophthalmology: A contemporary prospective. Asia-Pac. J. Ophthalmol. 2021, 10, 244. [Google Scholar] [CrossRef] [PubMed]
  2. Li, T.; Li, C.; Zhang, X.; Liang, W.; Chen, Y.; Ye, Y.; Lin, H. Augmented reality in ophthalmology: Applications and challenges. Front. Med. 2021, 8, 733241. [Google Scholar] [CrossRef]
  3. Yoon, J.W.; Chen, R.E.; Kim, E.J.; Akinduro, O.O.; Kerezoudis, P.; Han, P.K.; Si, P.; Freeman, W.D.; Diaz, R.J.; Komotar, R.J. Augmented reality for the surgeon: Systematic review. Int. J. Med. Robot. Comput. Assist. Surg. 2018, 14, e1914. [Google Scholar] [CrossRef] [PubMed]
  4. Leitritz, M.A.; Ziemssen, F.; Suesskind, D.; Partsch, M.; Voykov, B.; Bartz-Schmidt, K.U.; Szurman, G.B. Critical evaluation of the usability of augmented reality ophthalmoscopy for the training of inexperienced examiners. Retina 2014, 34, 785–791. [Google Scholar] [CrossRef]
  5. Ropelato, S.; Menozzi, M.; Michel, D.; Siegrist, M. Augmented reality microsurgery: A tool for training micromanipulations in ophthalmic surgery using augmented reality. Simul. Healthc. 2020, 15, 122–127. [Google Scholar] [CrossRef]
  6. Chou, J.; Kosowsky, T.; Payal, A.R.; Gonzalez, L.A.G.; Daly, M.K. Construct and face validity of the Eyesi indirect ophthalmoscope simulator. Retina 2017, 37, 1967–1976. [Google Scholar] [CrossRef]
  7. Huang, J.; Kinateder, M.; Dunn, M.J.; Jarosz, W.; Yang, X.-D.; Cooper, E.A. An augmented reality sign-reading assistant for users with reduced vision. PLoS ONE 2019, 14, e0210630. [Google Scholar] [CrossRef] [PubMed]
  8. Chung, S.A.; Choi, J.; Jeong, S.; Ko, J. Block-building performance test using a virtual reality head-mounted display in children with intermittent exotropia. Eye 2021, 35, 1758–1765. [Google Scholar] [CrossRef] [PubMed]
  9. Jones, P.R.; Somoskeöy, T.; Chow-Wing-Bom, H.; Crabb, D.P. Seeing other perspectives: Evaluating the use of virtual and augmented reality to simulate visual impairments (OpenVisSim). NPJ Digit. Med. 2020, 3, 32. [Google Scholar] [CrossRef]
  10. Roodaki, H.; Filippatos, K.; Eslami, A.; Navab, N. Introducing augmented reality to optical coherence tomography in ophthalmic microsurgery. In Proceedings of the 2015 IEEE International Symposium on Mixed and Augmented Reality, Fukuoka, Japan, 29 September–3 October 2015; pp. 1–6. [Google Scholar]
  11. Tang, N.; Fan, J.; Wang, P.; Shi, G. Microscope integrated optical coherence tomography system combined with augmented reality. Opt. Express 2021, 29, 9407–9418. [Google Scholar] [CrossRef]
  12. DeLisi, M.P.; Mawn, L.A.; Galloway Jr, R.L. Image-guided transorbital procedures with endoscopic video augmentation. Med. Phys. 2014, 41, 091901. [Google Scholar] [CrossRef]
  13. Pan, J.; Liu, W.; Ge, P.; Li, F.; Shi, W.; Jia, L.; Qin, H. Real-time segmentation and tracking of excised corneal contour by deep neural networks for DALK surgical navigation. Comput. Methods Programs Biomed. 2020, 197, 105679. [Google Scholar] [CrossRef] [PubMed]
  14. Saha, S.K.; Xiao, D.; Bhuiyan, A.; Wong, T.Y.; Kanagasingam, Y. Color fundus image registration techniques and applications for automated analysis of diabetic retinopathy progression: A review. Biomed. Signal Process. Control 2019, 47, 288–302. [Google Scholar] [CrossRef]
  15. Pluim, J.P.; Maintz, J.A.; Viergever, M.A. Mutual-information-based registration of medical images: A survey. IEEE Trans. Med. Imaging 2003, 22, 986–1004. [Google Scholar] [CrossRef]
  16. Cideciyan, A.V. Registration of ocular fundus images: An algorithm using cross-correlation of triple invariant image descriptors. IEEE Eng. Med. Biol. Mag. 1995, 14, 52–58. [Google Scholar] [CrossRef]
  17. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  18. Ghassabi, Z.; Shanbehzadeh, J.; Mohammadzadeh, A.; Ostadzadeh, S.S. Colour retinal fundus image registration by selecting stable extremum points in the scale-invariant feature transform detector. IET Image Process. 2015, 9, 889–900. [Google Scholar] [CrossRef]
  19. Saha, S.K.; Xiao, D.; Frost, S.; Kanagasingam, Y. A two-step approach for longitudinal registration of retinal images. J. Med. Syst. 2016, 40, 277. [Google Scholar] [CrossRef] [PubMed]
  20. Guo, X.; Hsu, W.; Lee, M.L.; Wong, T.Y. A tree matching approach for the temporal registration of retinal images. In Proceedings of the 2006 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI’06), Arlington, VA, USA, 13–15 November 2006; pp. 632–642. [Google Scholar]
  21. Chen, J.; Smith, R.T.; Tian, J.; Laine, A.F. A novel registration method for retinal images based on local features. In Proceedings of the 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vancouver, BC, Canada, 20–25 August 2008; pp. 2242–2245. [Google Scholar]
  22. Chen, L.; Huang, X.; Tian, J. Retinal image registration using topological vascular tree segmentation and bifurcation structures. Biomed. Signal Process. Control 2015, 16, 22–31. [Google Scholar] [CrossRef]
  23. Pham, D.L.; Xu, C.; Prince, J.L. Current methods in medical image segmentation. Annu. Rev. Biomed. Eng. 2000, 2, 315–337. [Google Scholar] [CrossRef]
  24. Forouzanfar, M.; Forghani, N.; Teshnehlab, M. Parameter optimization of improved fuzzy c-means clustering algorithm for brain MR image segmentation. Eng. Appl. Artif. Intell. 2010, 23, 160–168. [Google Scholar] [CrossRef]
  25. Wu, W.; Chen, A.Y.; Zhao, L.; Corso, J.J. Brain tumor detection and segmentation in a CRF (conditional random fields) framework with pixel-pairwise affinity and superpixel-level features. Int. J. Comput. Assist. Radiol. Surg. 2014, 9, 241–253. [Google Scholar] [CrossRef]
  26. Montastier, É.; Ye, R.Z.; Noll, C.; Bouffard, L.; Fortin, M.; Frisch, F.; Phoenix, S.; Guérin, B.; Turcotte, É.E.; Lewis, G.F. Increased postprandial nonesterified fatty acid efflux from adipose tissue in prediabetes is offset by enhanced dietary fatty acid adipose trapping. Am. J. Physiol.-Endocrinol. Metab. 2021, 320, E1093–E1106. [Google Scholar] [CrossRef] [PubMed]
  27. Hesamian, M.H.; Jia, W.; He, X.; Kennedy, P. Deep learning techniques for medical image segmentation: Achievements and challenges. J. Digit. Imaging 2019, 32, 582–596. [Google Scholar] [CrossRef]
  28. Wang, R.; Lei, T.; Cui, R.; Zhang, B.; Meng, H.; Nandi, A.K. Medical image segmentation using deep learning: A survey. IET Image Process. 2022, 16, 1243–1267. [Google Scholar] [CrossRef]
  29. Qamar, S.; Jin, H.; Zheng, R.; Ahmad, P.; Usama, M. A variant form of 3D-UNet for infant brain segmentation. Future Gener. Comput. Syst. 2020, 108, 613–623. [Google Scholar] [CrossRef]
  30. Ilesanmi, A.E.; Ilesanmi, T.; Gbotoso, A.G. A systematic review of retinal fundus image segmentation and classification methods using convolutional neural networks. Healthc. Anal. 2023, 4, 100261. [Google Scholar] [CrossRef]
  31. Hu, K.; Zhang, Z.; Niu, X.; Zhang, Y.; Cao, C.; Xiao, F.; Gao, X. Retinal vessel segmentation of color fundus images using multiscale convolutional neural network with an improved cross-entropy loss function. Neurocomputing 2018, 309, 179–191. [Google Scholar] [CrossRef]
  32. Chai, Y.; Liu, H.; Xu, J. A new convolutional neural network model for peripapillary atrophy area segmentation from retinal fundus images. Appl. Soft Comput. 2020, 86, 105890. [Google Scholar] [CrossRef]
  33. Das, S.; Kharbanda, K.; Suchetha, M.; Raman, R.; Dhas, E. Deep learning architecture based on segmented fundus image features for classification of diabetic retinopathy. Biomed. Signal Process. Control 2021, 68, 102600. [Google Scholar] [CrossRef]
  34. Dasgupta, A.; Singh, S. A fully convolutional neural network based structured prediction approach towards the retinal vessel segmentation. In Proceedings of the 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), Melbourne, Australia, 18–21 April 2017; pp. 248–251. [Google Scholar]
  35. Staal, J.; Abramoff, M.D.; Niemeijer, M.; Viergever, M.A.; van Ginneken, B. Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Med. Imaging 2004, 23, 501–509. [Google Scholar] [CrossRef] [PubMed]
  36. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. pp. 234–241. [Google Scholar]
  37. Ye, R.Z.; Noll, C.; Richard, G.; Lepage, M.; Turcotte, E.E.; Carpentier, A.C. DeepImageTranslator: A free, user-friendly graphical interface for image translation using deep-learning and its applications in 3D CT image analysis. SLAS Technol. 2022, 27, 76–84. [Google Scholar] [CrossRef]
  38. Ye, E.Z.; Ye, E.H.; Bouthillier, M.; Ye, R.Z. DeepImageTranslator V2: Analysis of multimodal medical images using semantic segmentation maps generated through deep learning. bioRxiv 2021. [Google Scholar] [CrossRef]
  39. Henry, H.Y.; Feng, X.; Wang, Z.; Sun, H. MixModule: Mixed CNN kernel module for medical image segmentation. In Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 1508–1512. [Google Scholar]
  40. Hoover, A.; Kouznetsova, V.; Goldbaum, M. Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Trans. Med. Imaging 2000, 19, 203–210. [Google Scholar] [CrossRef] [PubMed]
  41. Hoover, A.; Goldbaum, M. Locating the optic nerve in a retinal image using the fuzzy convergence of the blood vessels. IEEE Trans. Med. Imaging 2003, 22, 951–958. [Google Scholar] [CrossRef] [PubMed]
  42. Fischer, P.; Dosovitskiy, A.; Brox, T. Descriptor matching with convolutional neural networks: A comparison to sift. arXiv 2014, arXiv:1405.5769. [Google Scholar]
  43. Yi, K.M.; Trulls, E.; Lepetit, V.; Fua, P. Lift: Learned invariant feature transform. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part VI 14. pp. 467–483. [Google Scholar]
  44. Ono, Y.; Trulls, E.; Fua, P.; Yi, K.M. LF-Net: Learning local features from images. Adv. Neural Inf. Process. Syst. 2018, 31, 6237–6247. [Google Scholar]
  45. Truong, P.; Apostolopoulos, S.; Mosinska, A.; Stucky, S.; Ciller, C.; Zanet, S.D. Glampoints: Greedily learned accurate match points. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 10732–10741. [Google Scholar]
  46. Liu, J.; Li, X.; Wei, Q.; Xu, J.; Ding, D. Semi-supervised Keypoint Detector and Descriptor for Retinal Image Matching. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 593–609. [Google Scholar]
  47. Aruna, K.; Anil, V.S.; Anand, A.; Jaysankar, A.; Venugopal, A.; Nisha, K.; Sreelekha, G. Image Mosaicing for Neonatal Fundus Images. In Proceedings of the 2021 8th International Conference on Smart Computing and Communications (ICSCC), Kochi, Kerala, India, 1–3 July 2021; pp. 100–105. [Google Scholar]
  48. Richa, R.; Linhares, R.; Comunello, E.; Von Wangenheim, A.; Schnitzler, J.-Y.; Wassmer, B.; Guillemot, C.; Thuret, G.; Gain, P.; Hager, G. Fundus image mosaicking for information augmentation in computer-assisted slit-lamp imaging. IEEE Trans. Med. Imaging 2014, 33, 1304–1312. [Google Scholar] [CrossRef]
  49. Köhler, T.; Heinrich, A.; Maier, A.; Hornegger, J.; Tornow, R.P. Super-resolved retinal image mosaicing. In Proceedings of the 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), Prague, Czech Republic, 13–16 April 2016; pp. 1063–1067. [Google Scholar]
  50. De Zanet, S.; Rudolph, T.; Richa, R.; Tappeiner, C.; Sznitman, R. Retinal slit lamp video mosaicking. Int. J. Comput. Assist. Radiol. Surg. 2016, 11, 1035–1041. [Google Scholar] [CrossRef] [PubMed]
  51. Feng, X.; Cai, G.; Gou, X.; Yun, Z.; Wang, W.; Yang, W. Retinal mosaicking with vascular bifurcations detected on vessel mask by a convolutional network. J. Healthc. Eng. 2020, 2020, 7156408. [Google Scholar] [CrossRef] [PubMed]
  52. Jin, Q.; Meng, Z.; Pham, T.D.; Chen, Q.; Wei, L.; Su, R. DUNet: A deformable network for retinal vessel segmentation. Knowl.-Based Syst. 2019, 178, 149–162. [Google Scholar] [CrossRef]
  53. Chen, C.; Chuah, J.H.; Ali, R.; Wang, Y. Retinal vessel segmentation using deep learning: A review. IEEE Access 2021, 9, 111985–112004. [Google Scholar] [CrossRef]
  54. Chala, M.; Nsiri, B.; El yousfi Alaoui, M.H.; Soulaymani, A.; Mokhtari, A.; Benaji, B. An automatic retinal vessel segmentation approach based on Convolutional Neural Networks. Expert Syst. Appl. 2021, 184, 115459. [Google Scholar] [CrossRef]
  55. Jiang, Y.; Liang, J.; Cheng, T.; Lin, X.; Zhang, Y.; Dong, J. MTPA_Unet: Multi-scale transformer-position attention retinal vessel segmentation network joint transformer and CNN. Sensors 2022, 22, 4592. [Google Scholar] [CrossRef]
  56. Deng, X.; Ye, J. A retinal blood vessel segmentation based on improved D-MNet and pulse-coupled neural network. Biomed. Signal Process. Control 2022, 73, 103467. [Google Scholar] [CrossRef]
Figure 1. General pipeline for semantic segmentation with TPU-accelerated CNN and real-time image registration. Initially, a float16 convolutional neural network (CNN) was trained for semantic segmentation of retinal vessels from color photographs (A). This CNN was then quantized to eight bits (int8) and adapted for the Edge TPU to perform real-time vessel segmentation in surgical videos (B). The iterative patch-wise cross-correlation (IPCC) algorithm, operating on the CPU, utilized these segmentations to create a transformation matrix (C), which was then applied to align pre-operative images with the surgical video stream in real time (D).
Figure 2. Algorithm design for iterative patch-wise cross-correlation. Image A is divided into n × n patches and overlaid onto Image B. Cross-correlation is performed between each patch of Image A and Image B. The patches with the highest correlation coefficients are used to compute rotation, scaling, and translation matrices for Image A, aligning it with Image B. This alignment process involves iterative adjustments to the transformation matrices, refining the overlay of Image A onto Image B through successive rounds of cross-correlation. The final transformation matrix, obtained after K iterations, precisely registers the pre-operative image (Image A) onto the intraoperative frame (Image B).
Figure 3. Pseudocode for the iterative patch-wise cross-correlation algorithm.
Figure 4. Retina image segmentation using the unquantized and quantized neural networks. Images from the CHASE_DB1 and STARE data sets with corresponding ground truth vessel segmentation and the model predicted vessel segmentation by the unquantized (A) and quantized (B) models.
Figure 5. Frames from surgical recordings processed by the CNN on the Edge TPU and the corresponding predicted vessel-segmentation maps.
Figure 6. This figure demonstrates the iterative registration of a pre-operative retina-thickness map to an intra-operative surgical frame (A). The stabilization of the transformation matrix is shown over multiple iterations of the Iterative Patch-wise Cross-Correlation (IPCC) algorithm. Panel (B) displays the initial alignment after the first iteration (k = 1), where the pre-operative map shows significant misalignment with the intra-operative map. Panels (C–E) show the progressive alignment after two, three, and four iterations, respectively, with Panels (E,F) showing minimal adjustments and optimal registration achieved by the third iteration.
Figure 7. Shown here is the integration of various pre-operative diagnostic imaging modalities into the intra-operative surgical video stream in real time using the proposed retinal vessel segmentation and registration pipeline. Panels (A–D) represent different types of pre-operative imaging data before registration: (A) microperimetry images, (B) Spectralis multi-spectral fundus images, (C) retina thickness maps, and (D) cross-sectional optical coherence tomography (OCT) images. Panels (E) and (F) represent the original surgical frame and the vessel segmentation result from the quantized U-Net model, respectively. Panels (G–J) display the corresponding intra-operative surgical frames with the registered overlays: surgical frame overlaid with microperimetry images (G), Spectralis multi-spectral fundus images (H), retina thickness maps (I), and cross-sectional optical coherence tomography (OCT) images (J). The overlays maintain accurate alignment even under conditions such as partial occlusion of retinal vessels by surgical instruments. This capability ensures that surgeons can access critical diagnostic information directly within the operative view.
Table 1. Accuracy metrics for vessel segmentation of unquantized and quantized models on testing data sets.

| Metric | Unquantized Model | Quantized Model |
|---|---|---|
| Dice Coefficient | 0.795836 | 0.794072 |
| Accuracy | 0.947073 | 0.94464 |
| Precision | 0.823066 | 0.843703 |
| Recall | 0.782214 | 0.764726 |
| F1 Score | 0.795836 | 0.794072 |
| Jaccard Index | 0.702643 | 0.695848 |
| Specificity | 0.718176 | 0.775183 |
| IoU | 0.702643 | 0.695848 |
| Cohen Kappa | 0.572046 | 0.593596 |

Summary of retina vessel segmentation performance metrics, comparing the unquantized and quantized convolutional neural network models across different data sets. IoU: Intersection over Union.
Table 2. Accuracy metrics for image registration as a function of the number of iterations (k) compared to manual image registration (M).

| k | Dice Coefficient | Accuracy | Precision | Recall | F1 Score | Jaccard Index | Specificity | IoU | Cohen Kappa |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.549 | 0.937 | 0.556 | 0.545 | 0.549 | 0.502 | 0.6 | 0.502 | 0.08 |
| 2 | 0.678 | 0.951 | 0.678 | 0.683 | 0.678 | 0.591 | 0.842 | 0.591 | 0.291 |
| 3 | 0.71 | 0.954 | 0.705 | 0.721 | 0.71 | 0.619 | 0.866 | 0.619 | 0.331 |
| 4 | 0.74 | 0.957 | 0.734 | 0.751 | 0.74 | 0.643 | 0.908 | 0.643 | 0.376 |
| 5 | 0.732 | 0.957 | 0.728 | 0.741 | 0.732 | 0.636 | 0.903 | 0.636 | 0.367 |
| 6 | 0.71 | 0.954 | 0.705 | 0.721 | 0.71 | 0.619 | 0.866 | 0.619 | 0.331 |
| 7 | 0.74 | 0.957 | 0.734 | 0.751 | 0.74 | 0.643 | 0.908 | 0.643 | 0.376 |
| M | 0.611 | 0.951 | 0.699 | 0.556 | 0.611 | 0.547 | 0.854 | 0.547 | 0.253 |

Spatial accuracy metrics for image registration using the Iterative Patch-wise Cross-Correlation algorithm, comparing different numbers of iterations to manual registration across 50 randomly chosen video frames. IoU: Intersection over Union.

