1. Introduction
The analysis of urinary stone composition is one of the most important factors in the treatment of urolithiasis [1]. The composition of urinary calculi plays an important role in both intraoperative and postoperative management. During an operation, the laser energy and frequency must be selected according to the stone’s composition and size in order to break the stone efficiently [2]. In postoperative management, various dietary strategies, behavioral therapies, and pharmacotherapies are chosen according to the stone’s components; these optimize metabolic factors and reduce the urinary supersaturation of the stone-forming constituents, thereby lowering the recurrence rate of urinary calculi [3].
The methods for analyzing the composition of urinary calculi include optical polarizing microscopy, scanning electron microscopy, infrared spectroscopy, X-ray powder diffraction, and elemental distribution analysis. Among these methods, Fourier transform infrared spectroscopy (FTIRS) is efficient, reliable, accurate, and rapid, and it is currently one of the most widely used [4,5]. However, it takes several weeks to receive FTIRS results, and no test can predict urinary stone composition intraoperatively or immediately after surgery.
The field of artificial intelligence is developing rapidly, and neural networks now match or exceed human performance in some recognition tasks. Neural networks perform particularly well in large-scale data processing and complex pattern recognition, where they achieve high accuracy and consistency. Recent studies have predicted urinary stone components using neural networks and digital images. Kristian et al. reported favorable results in identifying kidney stone composition from digital photographs taken in vitro using a deep convolutional neural network (CNN) [6]. Furthermore, Estrade et al. showed decent results using intraoperative ureterorenoscopic digital images (Olympus URF-V, CCD sensor) and the endoscopic morphological criteria that the authors had proposed in a previous study [7,8]. Thus far, studies of autonomic recognition have used high-quality images, and no existing study has used single-use flexible ureteroscopic (fURS) images. We investigated whether a deep CNN can also predict urinary stone composition with decent results from single-use fURS images, which have relatively low resolution.
2. Materials and Methods
2.1. Study Design
This study was approved by the Institutional Review Board of Severance Hospital, Yonsei University Health System (no. 4-2022-0797). We retrospectively used surgical videos of ureterorenoscopic lithotripsy performed by a single surgeon (JYL) between January 2018 and December 2021. The ureterorenoscope used in this study was the LithoVue single-use flexible ureteroscope (Boston Scientific, Boston, MA, USA). From the photographs captured during surgery, one picture was chosen for each stone that met the pre-defined conditions. These images underwent minimal pre-processing to remove unnecessary blank spaces and trademarks. The urinary calculi composition results obtained through FTIRS were used to divide the photographs into two groups: the Calcium group and the Non-calcium group. The pre-processed images and the classified FTIRS results were used to train the CNN model.
2.2. Image Standardization and Pre-Processing
fURS images are affected by various factors, such as who the surgeon was and what devices were used. Therefore, image standardization is one of the key factors for decent results in this research. Each picture should include the entire surface of the stone. A single image was selected for each stone. Cases with poor visibility because of clots or debris and cases in which proper stone images could not be obtained due to video recording errors were excluded. Cases with multiple FTIRS values resulting from multi-location stones were excluded because there was no one-to-one match between the results and the stone. As a result, only cases that exactly matched the image and the stone composition analysis results were included in this study. Of the 506 total cases, 207 were finally included in this study. Regarding the bias due to differences in equipment, LithoVue has an advantage in that it has its own workstation platform. Reusable digital fURS cameras require separate workstations and light source equipment, and the choice of workstation can impact the quality of images. On the other hand, with LithoVue, it is possible to minimize the bias caused by the difference in additional equipment.
Image pre-processing was minimized in this study. Black margins and trademarks were removed from the obtained images, and no other processing was applied. We did not outline the stone, mark the renal calyx, or add any annotation, even if part of the guidewire was visible in the image. The whole inclusion and exclusion process and the image pre-processing are shown in Figure 1.
2.3. Classification of the Urinary Calculi
For each patient, the FTIRS results after surgery were collected. Because this is a preliminary study of autonomic recognition and the image quality was somewhat inferior owing to the retrospective collection of images, we simplified the classification criteria. Calcium oxalate is the most common component of urolithiasis. We hypothesized that the hardness of the stone and its cracking pattern during laser fragmentation may differ depending on the presence or absence of calcium. The endoscopic morphology classification introduced by Estrade et al. in 2021 noted a difference in morphology depending on the presence of calcium oxalate [8]. For these reasons, the FTIRS results were divided into two groups according to whether they contained any calcium oxalate (the Calcium group) or none (the Non-calcium group). There were 175 cases in the Calcium group and 32 cases in the Non-calcium group.
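The grouping rule above can be sketched in a few lines of Python. This is only an illustration: the function name `assign_group`, the example reports, and the component strings are hypothetical, and the study does not describe how the FTIRS reports were encoded.

```python
def assign_group(components):
    """Return 'Calcium' if any FTIRS component is a calcium oxalate, else 'Non-calcium'.

    `components` is a list of component names from one FTIRS report.
    The string format is an assumption made for this sketch.
    """
    for name in components:
        if "calcium oxalate" in name.lower():
            return "Calcium"
    return "Non-calcium"


# Hypothetical reports, for illustration only.
reports = [
    ["Calcium oxalate monohydrate", "Uric acid"],
    ["Uric acid"],
    ["Struvite"],
]
groups = [assign_group(r) for r in reports]
```

A mixed stone is assigned to the Calcium group as soon as one calcium oxalate component is present, which mirrors the "any calcium oxalate" criterion stated in the text.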
2.4. Convolutional Neural Network Model Building
CNNs were first introduced by Yann LeCun in 1989 and are now mainstream in neural network research using images [9]. Image input data are large, and not all areas of an image matter for classification; only specific features matter, and such a feature may appear anywhere in the image. Using image data therefore requires a means of filtering features from a large amount of data. Because CNNs extract features from image data with convolution kernels, they are well suited to processing images. Transfer learning is a method that starts from a model that has been pre-trained and verified on high-quality data, and it can learn efficiently from small and relatively low-quality datasets [10]. Of the various pre-trained models available, ResNet is currently one of the most widely used CNN architectures. ResNet enables better network optimization through residual learning and shortcut connections [11]. In this study, we chose transfer learning with ResNet-18 as the pre-trained model. When the pre-trained ResNet-18 network is applied to the target domain, only the new classifier layers need to be trained instead of all layers, which is why learning remains efficient with such data. The entire CNN model training structure is shown in Figure 2.
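The idea that only the new classifier layers are trained can be illustrated with a toy, framework-free sketch. Here `frozen_features` is a hypothetical stand-in for the pre-trained backbone (it is fixed and never updated), and `train_head` fits a small logistic classifier on top of it. None of the names, data, or hyperparameters come from the study; an actual implementation would load pre-trained ResNet-18 weights in a deep learning framework and replace its final layer.

```python
import math


def frozen_features(pixels):
    """Stand-in for a pre-trained backbone: fixed, with no trainable parameters."""
    mean = sum(pixels) / len(pixels)
    contrast = max(pixels) - min(pixels)
    return [mean, contrast]


def predict_proba(weights, bias, pixels):
    """Probability of class 1 from the trainable head applied to frozen features."""
    z = sum(w * f for w, f in zip(weights, frozen_features(pixels))) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_head(images, labels, epochs=500, lr=0.5):
    """Train only the classifier head with plain stochastic gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for pixels, y in zip(images, labels):
            # Gradient of the log loss with respect to the pre-sigmoid score.
            error = predict_proba(weights, bias, pixels) - y
            feats = frozen_features(pixels)
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias


# Toy "images" as flat pixel lists: class 0 is dark, class 1 is bright.
images = [[0.1, 0.2, 0.1, 0.2], [0.9, 0.8, 0.9, 0.8],
          [0.15, 0.15, 0.2, 0.1], [0.85, 0.85, 0.8, 0.9]]
labels = [0, 1, 0, 1]
weights, bias = train_head(images, labels)
```

Because the feature extractor has no parameters to update, the optimizer only ever touches `weights` and `bias`, which is the reason a small dataset can suffice in this setting.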
The whole dataset was divided into a training set, a validation set, and a test set. To address the data imbalance between the Calcium group and the Non-calcium group, images from the Non-calcium group were augmented to achieve the same number of images as the Calcium group. Among the 207 images, 22 were first designated as the test set, and then augmentation for the Non-calcium group was performed. The remaining data were randomly divided into the training set and the validation set in an 8:2 ratio. As a result, the training set included 163 images, and the validation set included 22 images. The training and validation sets contained 141 and 17 images from the Calcium group and 22 and 5 images from the Non-calcium group, respectively. Since the data imbalance between these two groups could distort the training process of the model, we performed three-fold data augmentation for the Calcium group and eight-fold data augmentation for the Non-calcium group. Image rotation was used for the Calcium group, and both image rotation and image flipping were used for the Non-calcium group.
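The augmentation step can be sketched as follows. The study reports rotation for the Calcium group (three-fold) and rotation plus flipping for the Non-calcium group (eight-fold) but does not state the angles or flip axes, so this sketch assumes right-angle rotations and a horizontal flip, and it counts the original image as one of the copies. Images are plain nested lists here, and `augment_calcium` and `augment_non_calcium` are hypothetical names.

```python
def rotate90(image):
    """Rotate a 2D list of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2D list of pixels left to right."""
    return [row[::-1] for row in image]


def rotations(image, count):
    """Return `count` images: the original followed by successive 90-degree rotations."""
    result = [image]
    for _ in range(count - 1):
        result.append(rotate90(result[-1]))
    return result


def augment_calcium(image):
    """Three-fold augmentation by rotation only (assumed: 0, 90, and 180 degrees)."""
    return rotations(image, 3)


def augment_non_calcium(image):
    """Eight-fold augmentation: four rotations, each with and without a horizontal flip."""
    rotated = rotations(image, 4)
    return rotated + [flip_horizontal(r) for r in rotated]


sample = [[1, 2],
          [3, 4]]
calcium_views = augment_calcium(sample)
non_calcium_views = augment_non_calcium(sample)
```

Four right-angle rotations combined with one mirror give exactly eight distinct views of an asymmetric image, which is one natural reading of "eight-fold" augmentation with rotation and flipping.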
In each epoch of the training process, the model is trained with the training dataset and then undergoes an intermediate test with the validation dataset to calculate the error, which is propagated back before the next round of training. There were seven epochs in total. After training, the model’s performance was finally tested with the test data. The Adam optimizer was used to optimize the model.
2.5. Localization Heat Maps
After building the model and completing the training, we plotted localization heat maps to analyze which part of the image had a significant influence on the decision process of the model. The gradient-weighted class activation mapping (Grad-CAM) method was used [12]. Localization heat maps were made for all 22 test set images. We outlined the stone in each image and quantitatively compared its area with the distribution of the hot spots. The images were classified into two groups depending on whether the hot spots were more or less evenly distributed within the stone.
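One way to make the comparison between the stone outline and the hot spots quantitative is to measure what fraction of the hot-spot pixels falls inside the outline. The sketch below assumes a heat map normalized to [0, 1], a binary stone mask of the same size, and a hot-spot threshold of 0.5. The function name and the threshold are assumptions, and the study does not specify its exact criterion.

```python
def hot_spot_fraction_in_stone(heat_map, stone_mask, threshold=0.5):
    """Fraction of hot-spot pixels (heat >= threshold) that lie inside the stone mask.

    Returns None when the heat map has no pixel at or above the threshold.
    """
    hot = 0
    hot_in_stone = 0
    for heat_row, mask_row in zip(heat_map, stone_mask):
        for heat, inside in zip(heat_row, mask_row):
            if heat >= threshold:
                hot += 1
                if inside:
                    hot_in_stone += 1
    return hot_in_stone / hot if hot else None


# Toy 3x3 example: the stone occupies the left two columns.
heat_map = [[0.9, 0.8, 0.1],
            [0.7, 0.6, 0.6],
            [0.2, 0.1, 0.0]]
stone_mask = [[1, 1, 0],
              [1, 1, 0],
              [1, 1, 0]]
fraction = hot_spot_fraction_in_stone(heat_map, stone_mask)
```

In this toy case five pixels exceed the threshold and four of them lie within the stone. A cut-off on this fraction could then separate images whose hot spots sit mainly on the stone from those whose hot spots do not.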
4. Discussion
The field of artificial intelligence is progressing rapidly. In the medical field, research on artificial intelligence is being actively conducted, and autonomic recognition of urolithiasis is an emerging topic. If autonomic recognition of urolithiasis is developed enough to be commercialized, various aspects of the treatment guidelines for urolithiasis could change. For example, during ureteroscopic lithotripsy, the laser intensity could be adjusted before laser firing by predicting the composition of the calculi as soon as the stone is found with the ureteroscope. As a result, more efficient and faster lithotripsy may become possible. In addition, because dietary changes and behavioral therapy could be applied immediately after surgery, various stone-forming factors could be minimized. Furthermore, although experienced surgeons may be able to roughly predict the composition of urinary stones, there is currently no objective and quantitative method for such predictions. Autonomic recognition would allow any user, regardless of expertise, to predict the composition of urinary calculi objectively and quantitatively. With more elaborately planned prospective studies and more high-quality data, autonomic recognition of urolithiasis can become a reality.
The treatment methods for urolithiasis are becoming increasingly diverse and advanced. For example, endoscopic combined intrarenal surgery, which combines percutaneous nephrolithotomy with retrograde ureteroscopy, is widely used because it shows higher stone-free rates for complex stones than traditional percutaneous nephrolithotomy alone [16]. Additionally, robotic stone surgery has gained attention because it reduces radiation exposure for the surgeon and assistant while achieving good treatment outcomes [1,17,18]. If autonomic recognition technologies are combined with these methods, they could lead to even faster operations and better results.
This is the first study to present a CNN model for autonomic recognition using single-use fURS images. The LithoVue single-use flexible ureteroscope has a CMOS image sensor, which is inferior in image quality and sensitivity to the CCD image sensor in the Olympus URF-V [19]. However, single-use flexible ureteroscopes have several advantages. First, they are cost-effective. Reusable digital flexible ureteroscopes cost more to purchase, repair, service, clean, and sterilize. By contrast, single-use flexible ureteroscopes have no maintenance-related costs other than purchase and storage costs [20]. Second, they carry less risk of contamination. Reusable digital flexible ureteroscopes require high-level disinfection because, if not properly sterilized, they can transmit infections [20]. Because LithoVue is a single-use device, it does not have this problem. Third, single-use flexible ureteroscopes have an advantage in research using medical images. Researchers must consider variables that change with the protocols or machines used in each hospital. LithoVue, however, has its own workstation platform, and the monitor, light source, and image processing software are all mounted on a single mobile cart. Thus, there is no need to consider differences in equipment in research using LithoVue. Single-use flexible ureteroscopes are currently used by many hospitals because state-of-the-art devices cannot be supplied to all institutions for economic reasons. It is significant that the accuracy of the CNN model can reach 86.0% even with single-use fURS images.
In this study, transfer learning was chosen as the method of CNN model building. Transfer learning is a machine learning method that uses a pre-trained model as the starting point for a new target model. A pre-trained model is one that has already been trained on a large number of high-quality images and whose performance has already been verified. Transfer learning has the advantage of being able to create a model with good classification performance even from a small number of low-quality images. It is therefore worth considering for studies that involve low-quality images or diseases with few cases because of low incidence rates.
This study has another important implication in that it required minimal manual supervision. For model training, we used only minimally pre-processed images and the classified results of the urinary calculi composition analysis. The researchers did not need to classify the morphology of the stone or to outline any stone or other anatomical findings. Because each image pixel is itself data and the model interprets and learns patterns from the data through convolution, it appears that good results can be obtained even when the intervention of the researcher is minimized.
We created localization heat maps, and the hot spots were located in the stone in 17 cases (77.3%). This result serves as significant evidence that the model focused on the stone itself rather than other structures within the image, such as renal parenchyma or guidewire, to predict the composition of the stone. However, in this study, there was no case in which the hot spot was outside of the stone in the Non-calcium group, and this seems to be because the number of cases was too small.
Data imbalances are one of the most important issues in neural network research. It is ideal to have data in equal proportions for each group in machine learning research, but this balance is difficult to achieve in the real world. In this study, the Calcium group included 175 cases, and the Non-calcium group included 32 cases. To overcome the data imbalance problem, we conducted image augmentation. Image augmentation was performed by image flipping and rotation maneuvers. In this study, the data imbalance problem was solved with a relatively simple method because the data were simply classified into two groups. However, complex classification is required to enable autonomic recognition in the future, and the issue of data imbalance should be dealt with in greater detail.
This study has several limitations. First, this study was retrospectively designed. The images used were inevitably of lower quality than images taken under a prospectively planned protocol. In addition, section images of urinary calculi were not included. The composition of the surface and the core can differ in urolithiasis [21]. If section images are included in later research and images of better quality can be taken, more detailed predictions of composition may become possible, and the performance of the neural network model may improve substantially. Second, the FTIRS results were subject only to binary classification into the Calcium and Non-calcium groups. Urinary stones have many different components, including uric acid, struvite, brushite, and cystine. In addition, the pathogenesis and etiology of urinary calculi formation differ by composition. To apply appropriate behavioral or dietary management strategies to patients in actual clinical practice, it is necessary to predict the detailed components. The finer the classification, the more complex the artificial intelligence model that is needed. With the development of artificial intelligence technology and further studies using high-quality data, it should be possible to solve this problem. Despite these limitations, this study is meaningful in the field of autonomic recognition of urolithiasis because it is the first to use single-use fURS images, and the CNN showed decent results even with a relatively small number of cases and low-quality images.