1. Introduction
Periodontal disease is a common chronic inflammatory condition that often goes unnoticed by patients, causing them to miss the optimal treatment window [1,2,3]. When the alveolar bone is damaged, gingival recession and bone loss can expose the tooth roots or cause tooth sensitivity, compromising the stability of the teeth and potentially leading to tooth loss. If the condition is not treated promptly, plaque continues to accumulate and the alveolar bone deteriorates further [4,5]. Eventually, this atrophy enlarges the periodontal pockets, resulting in loose teeth. Early detection and treatment are therefore crucial to prevent the condition from worsening and to maintain oral health.
Historically, dentists treating alveolar bone damage typically needed to perform alveolar bone surgery, whether to remove the damaged bone or to reshape it to its physiological form. This process required an assessment of alveolar bone loss, relying primarily on judgment of the extent of erosion. The condition is characterized by the gradual destruction of the alveolar crest (ALC) [6], leading to periodontal pocket formation and gingival recession. The cementoenamel junction (CEJ) [7] refers to the anatomical structure where the enamel, which covers the dental crown, meets the cementum that coats the root [8]. It is a critical reference point in clinical dentistry, as it is generally the site where gingival fibers attach to a healthy tooth [9]. One of the primary parameters for evaluating periodontal destruction is the loss of connective tissue attachment to the tooth root surface. Consequently, the CEJ serves as a stable landmark for measuring clinical attachment loss and assessing periodontal damage [10]. Both periapical radiographs (PAs) and bitewing radiographs have served as the standard for CEJ location assessment [11]. To assist dentists in more accurately diagnosing these conditions, this study utilized tooth recognition and segmentation techniques [12,13] to automate the identification of key locations such as the CEJ and the ALC. In current clinical practice, manual periapical assessment faces several major challenges. First, owing to the low visibility of tooth gaps, the detection process is easily influenced by adjacent teeth, leading to erroneous judgments. Second, the varying angle of each tooth's apex further increases the difficulty of detection, affecting the accuracy and reliability of the assessments. Because PA images are acquired at less consistent angles than bitewing or panoramic radiographs, annotating them conveniently is challenging [14]. Moreover, the variability in X-ray image quality further complicates the identification of the CEJ and ALC, necessitating the preprocessing of PA images [15] to enhance their quality.
As technology advances, the application of artificial intelligence (AI) has become increasingly widespread across various fields, including endoscopic examination [16], pancreatic cancer treatment [17], and lung nodule detection [18]. Significant progress has been made in the application of AI in dentistry, surpassing traditional methods that relied solely on visual inspection and professional expertise. Dentists can utilize image masking techniques to enhance the contrast of X-rays [19] for better determination of furcation involvement [20]. The use of convolutional neural network (CNN) models for dental detection is increasing, such as for detecting maxillary sinusitis [21], identifying caries and restorations [22,23], accurately detecting implant positions and assessing the damage caused by peri-implantitis [24], or dual-supervised network (DSN) models for tooth recognition and boundary regression [25]. In addition to these techniques, many studies have proposed improvements based on object detection models for medical image analysis [26], such as FLM-RCNN [27], the Levenberg–Marquardt backpropagation training algorithm [28], YOLO-DENTAL [29], and feedforward neural networks for classifying lesions [30]. Various techniques have also been applied to dental panoramic radiograph (DPR) images for detecting apical lesions [31] and inferior alveolar nerve injuries [32].
As the above shows, AI technology can effectively reduce dentists' workload during consultations. This study therefore utilizes both YOLO (You Only Look Once) and Mask R-CNN models, combined with preprocessing of PA images, to enhance the accuracy of YOLO in detecting individual teeth. The Mask R-CNN model is then applied to generate masks from these images, specifically to identify and extract the CEJ and ALC positions.
2. Materials and Methods
This section describes the methods used in this study and is divided into the following parts.
2.1. Study Design
To assist dentists in more accurately diagnosing these conditions, this study utilized tooth detection and instance segmentation techniques to automatically identify the locations of the CEJ and the ALC. Initially, PA images are preprocessed to enhance image quality, making the contours of the teeth and alveolar bone clearer. The YOLOv8 model is then used to predict the position of each tooth and, based on the localization results, the individual teeth are segmented from the PA images. Data augmentation techniques, such as rotation, are employed to increase the sample size of the database and prevent model overfitting. The segmented tooth images are classified, and the YOLOv8 classification model [33] excludes teeth that cannot be assessed, such as those with partially obscured CEJ levels or with implants and crowns. Subsequently, using the Detectron2 framework with Mask R-CNN, masks for the tooth, bone, and crown are extracted. These masks are used to determine the ALC level and CEJ level. The overall flowchart of this study is illustrated in Figure 1. The contributions of this study are as follows:
YOLOv8 achieved a sensitivity of 94.3% in extracting single teeth from the original images. With the contrast-limited adaptive histogram equalization (CLAHE) enhancement developed in this study, the sensitivity improved to 97.5%.
Mask R-CNN was utilized to extract three types of masks, with the DSC and Jaccard index both exceeding 90%. Additionally, the image augmentation method developed in this study yielded an improvement of 1–3%.
A localization algorithm is proposed for the CEJ and ALC; the RMSE is below 0.09, and visualization techniques are provided to aid dentists in diagnosis.
2.2. Data Collection
The database used in this study was sourced from Chang Gung Memorial Hospital in Taoyuan, Taiwan. It was compiled by five dental practitioners, each with over five years of experience. The study received approval from the Institutional Review Board (IRB: 202301730B0), ensuring that ethical standards were met throughout the research process. PA imaging is specifically suited to observing lesions near the alveolar bone. Because panoramic images cover a much larger area, localized lesions are difficult to observe clearly in them, making PA imaging the preferred choice in clinical practice: the smaller field of the PA image allows for more detailed local observation, thereby enhancing diagnostic accuracy. Image sizes such as 825 × 1200 and 1200 × 825 were selected because they provide more suitable proportions when segmenting teeth.
2.3. Statistical Analysis
The CEJ is an anatomical landmark on the tooth that separates the crown from the root structure, where the formation of enamel (the outermost layer of the crown) stops and the formation of cementum (the outermost layer of the root) begins [34]. This study references a lesion-marking technique [35] as a basis for the researchers. The dentists' manual annotations are exported to an Excel file containing the key point coordinates marked on the images; these data serve as the baseline for comparison with the automated annotation results. The automated annotation system detects the coordinates of the CEJ and ALC and compares them with the coordinates marked by the dentists. To evaluate the accuracy of the automated annotation system, this study calculates the directional offset and then computes the root mean square error (RMSE). The RMSE quantifies the difference between the predicted and actual values, providing an assessment of the accuracy of the automated annotation relative to the manual annotation. This comparison helps to gauge the performance of the automated annotation method and its feasibility in practical applications.
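For reference, the RMSE used for this comparison follows its standard definition, where $y_i$ is a dentist-marked coordinate, $\hat{y}_i$ is the corresponding automatically detected coordinate, and $n$ is the number of annotated points:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$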
2.4. Tooth Segmentation
The steps involved in tooth segmentation in this study are illustrated in Figure 2. First, PA images are preprocessed to enhance model detection accuracy and facilitate annotation. These preprocessing steps include image enhancement, contrast adjustment, and noise reduction, ensuring that the contours of the teeth and alveolar bone are clearly visible. Next, the preprocessed PA images are annotated, specifically marking the locations of individual teeth. These annotated image data are used to train the YOLOv8 model, enabling it to accurately detect the positions of individual teeth. Once training is completed, the model is used to predict and automatically segment individual teeth from new images using the segmentation algorithm.
2.4.1. PA Image Preprocessing
The quality of the original image can affect the physician's diagnosis. To clarify the contours of the teeth, this study preprocesses PA images to improve the detection accuracy of YOLOv8 and to enable more precise annotations. First, the original image (Figure 3a) is denoised with a median filter to remove extraneous noise, yielding Figure 3b. This blends the noise into the surrounding pixels, which is crucial because any remaining noise would be amplified during the subsequent CLAHE processing, thereby degrading image quality. Next, CLAHE is applied. This step enhances the contours of the various structures in the image, such as the teeth, alveolar bone, and roots, making them more distinct, as shown in Figure 3c. A notable advantage of CLAHE is that it enhances contrast based on local regions of the image rather than global contrast, preventing some parts of the image from becoming too bright while others become too dark. These preprocessing steps are highly beneficial for model training in this study and significantly improve accuracy.
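A minimal sketch of this preprocessing chain is given below, assuming OpenCV; the 5 × 5 median kernel, clip limit, and tile grid size are illustrative choices, as the exact parameters are not specified here.

```python
import cv2

def preprocess_pa(path: str):
    """Denoise a PA radiograph, then enhance local contrast with CLAHE."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Median filtering first: leftover noise would otherwise be amplified by CLAHE.
    denoised = cv2.medianBlur(img, 5)  # 5x5 kernel (illustrative)
    # CLAHE equalizes contrast per local tile, avoiding over-bright/over-dark regions.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(denoised)
```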
2.4.2. PA Image Annotation and Dataset Augmentation
The Roboflow annotation tool was used to perform polygonal annotation on the processed teeth, as shown in Figure 4, and the annotation results were exported in TXT format. This step constructs the labeled dataset required for training the model. Data augmentation was employed to enhance training effectiveness, including flipping the images and rotating them 90° clockwise and counterclockwise; the resulting YOLOv8 dataset is detailed in Table 1. YOLOv8, like its predecessors, is designed for real-time object detection. It includes innovations in the backbone, neck, and head of the network to enhance feature extraction and detection accuracy. The model is trained using a loss function that combines classification loss, localization loss (bounding box regression), and confidence loss. Hyperparameters such as the learning rate, batch size, and number of epochs must be set. During the training phase, the performance of the model is evaluated at each epoch on the validation set. Based on the validation results, adjustments can be made to the model architecture, hyperparameters, or training process to improve performance. These data augmentation methods increase the diversity of the dataset, helping the model capture the features of the teeth precisely. By integrating various preprocessing and data augmentation methods, the accuracy and generalization ability of the model in the tooth segmentation task can be significantly improved.
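The following sketch illustrates these augmentations with OpenCV; it is an assumption for clarity, since in practice a tool such as Roboflow applies the same transforms to the polygon labels automatically.

```python
import cv2

def augment(img):
    """Produce the flipped and ±90°-rotated variants used to enlarge the dataset."""
    return {
        "flip": cv2.flip(img, 1),  # horizontal flip
        "rot90_cw": cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE),
        "rot90_ccw": cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE),
    }
```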
2.4.3. YOLOv8 Detection
YOLO is a deep learning technique designed specifically for object detection; it uses a single forward pass to directly predict the positions and categories of all objects in an image. This allows YOLO to perform object detection efficiently in real time while maintaining a high level of accuracy. This study uses the YOLOv8 model to predict the positions of the teeth and then segments individual teeth based on the model's localization. Additionally, it compares the performance of YOLOv5, YOLOv7, and YOLOv8 on the tooth segmentation task. Initially, data preprocessing and augmentation are conducted to ensure that the training data are consistent and diverse, thereby enhancing the model's generalization ability. The processed data are divided into training, validation, and test sets, as shown in Table 1, to facilitate model training and evaluation. Subsequently, the prepared datasets are input into the YOLOv5, YOLOv7, and YOLOv8 models for training, allowing the observation of their learning effectiveness and convergence rates; the evaluation indicators are calculated as shown in Equations (1)–(3).
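The equations themselves are not reproduced in this excerpt; the indicators named above are conventionally defined as follows, which Equations (1)–(3) presumably follow (TP, FP, and FN denote true positives, false positives, and false negatives, and $AP_i$ is the average precision of class $i$ over $N$ classes):

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (1)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (2)$$
$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} AP_i \quad (3)$$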
Table 2 provides a detailed overview of the hardware and software configurations used in the system. The hardware components include an AMD Ryzen 7 3700X CPU, an NVIDIA GeForce RTX 3070 GPU with 8 GB of memory, and 32 GB of DRAM. The software stack includes Python version 3.11.9, PyTorch version 2.4.0, and CUDA version 12.1.
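As an illustration of how such training and detection can be run under the software stack listed above, the sketch below uses the Ultralytics YOLOv8 API; the dataset file teeth.yaml, the weights choice, the input filenames, and the hyperparameters are hypothetical, not the exact settings of this study.

```python
from ultralytics import YOLO

# Fine-tune a pretrained YOLOv8 model on the annotated tooth dataset.
model = YOLO("yolov8n.pt")
model.train(data="teeth.yaml", epochs=100, imgsz=640, batch=16, device=0)

# Detect teeth on a CLAHE-enhanced PA image and read out the bounding boxes.
results = model.predict("pa_image_clahe.png", conf=0.5)
for xyxy in results[0].boxes.xyxy:  # one (x1, y1, x2, y2) box per detected tooth
    print(xyxy.tolist())
```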
2.5. Mask R-CNN
The individual single-tooth images extracted in the previous steps are input into the Mask R-CNN model, as shown in the workflow diagram in Figure 5. In this process, all teeth in the image are annotated, regardless of whether they are intact. This step generates three types of masks, for the tooth, bone, and crown, respectively, serving as the intermediate masked images for the subsequent steps. These masked images are used for further analysis and diagnosis, aiding the precise localization and identification of dental structures and enhancing the accuracy and reliability of the diagnosis.
2.5.1. Single Tooth Image Augmentation
To improve accuracy and training effectiveness, this study applies vertical flipping to the single-tooth images obtained after CLAHE enhancement. This augmentation increases the number of single-tooth images, preventing overfitting of the Mask R-CNN model and providing more training data to enhance accuracy. Consequently, the training and validation datasets for this study have been doubled in size, as shown in Table 3, ensuring a more robust and comprehensive dataset for model training.
2.5.2. Single-Tooth Annotation Mask
To locate the ALC level and assist dentists in preliminarily identifying critical treatment areas, this study annotates the tooth bone and trains the Mask R-CNN model to extract the Bone Mask. For locating the CEJ level, the study annotates the tooth crown to extract the Crown Mask. The methods and processes for these annotations are illustrated in Figure 6. These steps are designed to accurately pinpoint the key anatomical structures of the teeth, enhancing the precision of dental diagnostics.
2.5.3. Mask R-CNN
This study utilized the Detectron2 framework for mask extraction from dental images, opting for the Mask R-CNN model to achieve precise pixel-level segmentation. Mask R-CNN extends Faster R-CNN's object detection results to perform pixel-level classification, with its main components including the backbone, head, and ROIAlign. ResNet-50, combined with a Feature Pyramid Network (FPN), is used as the backbone to extract multi-level features from the images and generate candidate regions. The head component classifies the candidate regions and performs bounding box regression, producing the final detection results and pixel-level masks. Additionally, the ROIAlign component precisely aligns the features of each candidate region on the feature map using bilinear interpolation, overcoming the quantization errors that can arise from traditional ROI pooling. This study employed Mask R-CNN to extract the three distinct categories of masks required for the subsequent stages of the process. Individual teeth were first segmented, followed by mask extraction with Mask R-CNN. This strategy narrows the detection scope of the model, which not only reduces training time but also improves the overall training process, leading to superior mask extraction results. During this phase, the data were divided in an approximately 7:3 ratio, with 140 images used for training and 54 for validation. Additionally, the dataset was augmented by vertically flipping the tooth images, effectively increasing it to 280 training and 108 validation images.
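A minimal Detectron2 configuration consistent with this description might look as follows; the registered dataset names, the iteration budget, and the choice to train all three mask categories in a single model are assumptions for illustration, not the study's exact setup.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
# ResNet-50 + FPN Mask R-CNN baseline, pretrained on COCO.
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("teeth_train",)  # hypothetical registered dataset names
cfg.DATASETS.TEST = ("teeth_val",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3    # tooth, bone, crown
cfg.SOLVER.MAX_ITER = 5000             # illustrative iteration budget

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```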
2.6. ALC Level Localization
The largest area of the Tooth Mask is retained and overlaid with the Bone Mask to form a composite mask. Subsequently, the custom localization algorithm developed in this study is applied to identify the ALC level for each symptomatic tooth in the PA image. This approach allows for the accurate identification of critical symptomatic regions, providing dentists with precise diagnostic information.
2.6.1. Retaining the Largest Mask
Three different mask categories are obtained from the Mask R-CNN prediction results: the Tooth Mask, Bone Mask, and Crown Mask. After acquiring these masks, only fully segmented teeth are analyzed, so the incomplete neighboring teeth within the Tooth Mask need to be addressed. Since YOLOv8 was used earlier to extract individual teeth, each Tooth Mask contains one complete tooth covering most of the area, allowing the masks of any incomplete teeth to be removed, as shown in Figure 7a,b.

Instance segmentation not only requires detecting the object's class and location but also involves pixel-level segmentation of each object; each object is assigned a unique region, even if several objects belong to the same class. The study identifies all pixel values within the Tooth Mask and saves the masks predicted by Mask R-CNN, representing the pixels of the same object with the same value. The number of pixels for each distinct value is then counted, the value with the largest area is recorded, and objects with other pixel values are removed, as shown in Figure 7c. This ensures that only the complete Tooth Mask is used for analysis, thereby improving diagnostic accuracy and efficiency.
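A compact NumPy sketch of this retention step is shown below, assuming the Mask R-CNN output has been flattened into a label image in which each predicted instance carries its own integer value:

```python
import numpy as np

def keep_largest_instance(label_img: np.ndarray) -> np.ndarray:
    """Keep only the instance whose pixel count (area) is largest; zero out the rest."""
    values, counts = np.unique(label_img[label_img > 0], return_counts=True)
    if values.size == 0:
        return label_img  # no instances predicted
    largest = values[np.argmax(counts)]
    return np.where(label_img == largest, largest, 0)
```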
2.6.2. Overlay and ALC Localization
To locate the ALC level, the Bone Mask and Tooth Mask predicted by Mask R-CNN are overlaid. By analyzing the overlapping regions of these two masks, the position of the ALC level can be determined; the overlay process is illustrated in Figure 8. Since the teeth and bone might share the same pixel values, pixel value differences alone cannot be used for localization. Therefore, this study assigns new pixel values to the Tooth Mask, as shown in Figure 8b, and uses the resulting pixel value differences to locate the ALC level. Next, a 5 × 5 kernel is applied iteratively across the entire image. If the kernel at a given position contains both the pixel values from Figure 8b and the overlapping pixel values from Figure 8c, that position is identified as the ALC level, as shown in Figure 8e.
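A simplified sketch of this kernel scan is given below; the specific label values and the handling of the overlap region are assumptions, since they are defined here only through Figure 8:

```python
import numpy as np

TOOTH = 1    # re-assigned Tooth Mask value (cf. Figure 8b); illustrative
OVERLAP = 3  # value marking the tooth/bone overlap (cf. Figure 8c); illustrative

def locate_alc(composite: np.ndarray, k: int = 5):
    """Slide a k x k window over the composite mask; windows containing both
    the re-assigned tooth value and the overlap value mark the ALC level."""
    h, w = composite.shape
    hits = []
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            win = composite[y:y + k, x:x + k]
            if (win == TOOTH).any() and (win == OVERLAP).any():
                hits.append((y + k // 2, x + k // 2))  # window center
    return hits
```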
2.7. CEJ Level Localization
The largest area of the Tooth Mask is retained, and the Crown Mask is subjected to dilation. The processed Tooth Mask and Crown Mask are then overlaid to form a composite mask. The developed localization algorithm is used to identify the CEJ position for each symptomatic tooth in the PA image. This method ensures precise localization by accurately identifying and processing key areas, improving diagnostic accuracy.
2.7.1. Crown Mask Dilation
The CEJ is the junction between the crown and the root. It can be analyzed by overlaying the Tooth Mask and the Crown Mask. Since it is not feasible to predict a Crown Mask that aligns perfectly with the crown portion of the Tooth Mask, discrepancies in pixel locations arise and reduce the accuracy of localization. Hence, the Crown Mask is dilated using Equation (4), ensuring that it completely envelops the crown portion of the Tooth Mask and leaving only the pixel discrepancies needed for localization.
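Equation (4) is not reproduced in this excerpt; morphological dilation of a mask $A$ by a structuring element $B$ is conventionally defined as $A \oplus B = \{\, z \mid (B)_z \cap A \neq \emptyset \,\}$, which the equation presumably follows. A hedged OpenCV sketch, with an illustrative kernel size, is:

```python
import cv2
import numpy as np

def dilate_crown(crown_mask: np.ndarray, k: int = 15) -> np.ndarray:
    """Dilate the Crown Mask so it fully envelops the crown portion of the
    Tooth Mask before the overlay step (kernel size is illustrative)."""
    kernel = np.ones((k, k), np.uint8)
    return cv2.dilate(crown_mask, kernel, iterations=1)
```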
2.7.2. Overlay and CEJ Localization
Because the gingiva attaches just above the CEJ, which lies at the junction between the crown and the root, this junction often becomes an error-prone area in symptom diagnosis. To help dentists make more precise and faster judgments and treatment decisions, this study proposes a new method to identify the CEJ level. The technique overlays the Tooth Mask and the dilated Crown Mask, allowing the latter to encompass the crown portion of the Tooth Mask. This method significantly enhances the accuracy of CEJ localization, as demonstrated in Figure 9d.
Figure 10 compares the effectiveness of the CEJ level localization algorithm with and without the dilation process. Analysis of the un-dilated Crown Mask revealed that the top of the crown often had many overlapping masks, which hindered accurate localization of the CEJ level. After applying the dilation process to the Crown Mask, the localization results improved significantly, ensuring that the mask intersection occurred only at the junction between the crown and the root. This approach markedly enhances the accuracy of CEJ level localization, thereby better aiding the diagnosis and treatment of symptoms.
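A simplified sketch of the overlay-based localization is shown below; reading the CEJ level as the lower boundary of the overlap region, per image column, is an interpretation of Figures 9 and 10 rather than the study's exact algorithm:

```python
import numpy as np

def locate_cej(tooth_mask: np.ndarray, dilated_crown: np.ndarray):
    """Approximate CEJ candidates as the lowest pixel, per column, where the
    dilated Crown Mask still overlaps the Tooth Mask."""
    overlap = (tooth_mask > 0) & (dilated_crown > 0)
    points = []
    for x in range(overlap.shape[1]):
        rows = np.nonzero(overlap[:, x])[0]
        if rows.size:
            points.append((rows.max(), x))  # bottom of the overlap in this column
    return points
```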
4. Discussion
This study evaluated the efficacy of advanced image processing and deep learning techniques for periodontal diagnosis. The primary objective was to accurately detect and segment individual teeth and key anatomical landmarks, such as the CEJ and ALC, in PA imaging. Initially, the PA images underwent preprocessing with CLAHE to enhance image quality, as shown in Figure 3. This preprocessing significantly clarified the contours of the teeth and alveolar bone, facilitating more precise annotations and model training. The study utilized the YOLOv8 and Mask R-CNN models to predict and segment dental images and applied image enhancement techniques to improve the accuracy of these models. For tooth detection, the YOLOv8 model achieved a sensitivity of 94.3% on the original images, which improved to 97.5% after applying the CLAHE enhancement. YOLOv8 demonstrated superior performance compared with its predecessors, YOLOv5 and YOLOv7, particularly in terms of precision, recall, mAP, and detection accuracy. The enhanced YOLOv8 model reached an accuracy of 97.01% on the test dataset, as shown in Table 13. The study also conducted a detailed comparison with the methods in [36,37]. The approach in [36] uses a matrix to calculate the inter-proximal space for tooth segmentation, relying heavily on matrix operations to locate these spaces; this works well when teeth are closely aligned, but its accuracy may diminish with irregular or overlapping teeth. Meanwhile, the algorithm proposed by Nomir and Abdel-Mottaleb in [37] achieves tooth segmentation by separating teeth from the background using horizontal and vertical projections. This method performs well on images where the teeth and background contrast sharply, but its accuracy can decrease with complex or low-contrast backgrounds.
According to the training results shown in Table 9, image enhancement significantly improved the performance of the Mask R-CNN model. For the Tooth Mask, the AP increased from 66.73% to 69.65%, AP50 from 88.32% to 89.54%, and AP75 from 74.65% to 81.66% after image augmentation. For the Bone Mask, the AP rose from 73.32% to 76.66%, AP50 from 98.15% to 99.86%, and AP75 from 90.17% to 92.26%. For the Crown Mask, the AP increased from 79.14% to 81.55%, AP50 remained at 100%, and AP75 slightly decreased from 98.02% to 96.29%. These results indicate that the training outcomes for all three mask categories improved after image enhancement, demonstrating the effectiveness of image enhancement techniques in increasing the detection accuracy of the Mask R-CNN model. Additionally, Table 11 shows the accuracy comparison of the Mask R-CNN model for the three mask categories. Image enhancement improved the accuracy of the Tooth Mask from 92.63% to 93.48% and of the Bone Mask from 95.50% to 96.95%, whereas the accuracy of the Crown Mask slightly decreased from 96.79% to 96.21%. This suggests that while image enhancement generally has a positive impact on the model's accuracy, its effect can vary across mask categories. In contrast, the U-Net model scored lower in both the DSC and Jaccard index across all classifications [6]; the comparison is shown in Table 14. This highlights the significant advantage of the Mask R-CNN model in handling CEJ line classification. Another significant contribution of this study is the new method proposed for locating the CEJ and ALC levels, the results of which are shown in Table 12. Using Mask R-CNN together with the image processing pipeline to evaluate the ALC and CEJ positions, the lowest RMSE is 0.0209, obtained for the CEJ (right). These localization techniques help dentists evaluate symptoms faster and more accurately, significantly enhancing diagnostic efficiency and precision.
Specifically, the image enhancement and mask segmentation techniques developed in this study significantly improved the classification accuracy of the CEJ line, providing robust support for the early diagnosis and treatment of periodontitis. This study has several limitations. First, variations in image quality could impact the model's accuracy, especially in clinical applications where low-quality or noisy images are common. Additionally, the reliance on manual annotations introduces potential biases; even experienced dentists may annotate inconsistently, which can influence the training outcomes of the model. While YOLOv8 and Mask R-CNN performed well in this study, these models may face challenges in more complex cases, such as overlapping or missing teeth and severe periodontal disease. The practical implementation of these AI technologies in clinical settings also requires seamless integration into existing diagnostic workflows and assurance that practitioners can use these tools effectively. Future research will explore various periodontal conditions, including algorithms that calculate bone loss to help clinicians quickly assess patients' conditions, and will emphasize optimizing the localization algorithms for the CEJ and the ALC to enhance the system's efficiency and accuracy. In addition, different object detection and semantic segmentation models will be compared to identify the most suitable model for PA images, which will be integrated into future systems.