Estimating Age and Sex from Dental Panoramic Radiographs Using Neural Networks and Vision–Language Models
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Description
2.2. Relevant Literature
2.3. Data Preprocessing
2.4. Deep Learning Model Used
2.4.1. Convolutional Neural Networks (CNNs)
2.4.2. VGGNet (Visual Geometry Group Network)
2.4.3. ResNet (Residual Networks)
2.4.4. MobileNet
2.4.5. DenseNet (Densely Connected Convolutional Networks)
2.4.6. Vision Transformer (ViT)
2.4.7. Moondream2
2.5. The Overall Workflow
2.6. Evaluation Metrics
2.6.1. Classification Evaluation Metrics
2.6.2. Regression Evaluation Metrics
3. Results
3.1. Sex Classification (Male)
3.2. Sex Classification (Female)
3.3. Overall Sex Classification Model Performance
3.4. Overall Age Regression Model Performance
3.5. Receiver Operating Characteristics of the Models
3.6. Confusion Matrices of Sex Models
3.7. Inference Time
4. Discussion
Limitations and Future Recommendations
- Addressing sex classification bias: Future research should focus on reducing the sex-related bias observed in this study. Approaches could include balancing classes through data augmentation, applying class weights during model training, or investigating domain-specific architectures that perform better on sex classification tasks. Additional strategies for equalising sex representation in the dataset could improve model fairness and accuracy.
- Expanding and diversifying the dataset: To improve regression performance, future studies should use a larger, more diverse dataset with a wider range of age labels. A larger dataset would give the model greater generalisation capacity, lowering the risk of overfitting and improving predictions across age groups. Furthermore, collecting data from several demographic groups may improve representation and reduce bias, especially for age-related predictions.
- Exploring the potential of larger VLMs: Given their potential capabilities, future studies could investigate the use of larger VLMs in medical imaging applications, particularly sex and age prediction. Although these models require significant computational resources, advances in model optimisation, transfer learning, and distributed computing may make them more viable. Researchers could also examine hybrid models that combine the strengths of smaller and larger VLMs, potentially improving prediction accuracy while maintaining computational efficiency.
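One of the mitigation strategies listed above, class weighting, can be sketched as follows. This is an illustrative example with hypothetical label counts, not the configuration used in this study:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class inversely to its frequency so that the loss
    penalises mistakes on the under-represented class more heavily."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Hypothetical imbalanced sex labels (1 = male, 0 = female)
labels = [1] * 410 + [0] * 380
class_weight = inverse_frequency_weights(labels)
# Frameworks such as Keras accept a dict like this via model.fit(..., class_weight=...)
```

With these counts, the minority class receives a weight slightly above 1 and the majority class slightly below, so both contribute roughly equally to the training loss.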
5. Conclusions
- Sex classification: Convolutional neural networks (CNNs) and DenseNet models demonstrated strong performance, achieving an accuracy of approximately 85%.
- Age estimation: DenseNet outperformed other models, achieving the highest performance with the lowest mean squared error in estimating age from panoramic radiographs.
- Inference time analysis: MobileNet and CNNs emerged as the fastest models, making them suitable for real-time applications. DenseNet models, while slightly slower, offered an optimal balance between computational efficiency and accuracy.
- Vision–language models (VLMs) and multimodal systems: These models require further research and development to enhance their competitiveness and reliability for clinical applications.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Sex classification performance for males:
Model | Precision | Recall | F1 Score |
---|---|---|---|
CNN | 0.85 | 0.85 | 0.85 |
VGG16 | 0.83 | 0.83 | 0.83 |
VGG19 | 0.74 | 0.9 | 0.81 |
ResNet50 | 0.75 | 0.37 | 0.49 |
ResNet101 | 0.77 | 0.49 | 0.6 |
ResNet152 | 0.71 | 0.88 | 0.78 |
MobileNet | 0.7 | 0.93 | 0.8 |
DenseNet121 | 0.77 | 0.9 | 0.83 |
DenseNet169 | 0.75 | 0.88 | 0.81 |
Vision Transformer | 0.69 | 0.83 | 0.76 |
Moondream2 | 0.51 | 0.88 | 0.64 |
Sex classification performance for females:
Model | Precision | Recall | F1 Score |
---|---|---|---|
CNN | 0.84 | 0.84 | 0.84 |
VGG16 | 0.82 | 0.82 | 0.82 |
VGG19 | 0.86 | 0.66 | 0.75 |
ResNet50 | 0.56 | 0.87 | 0.68 |
ResNet101 | 0.6 | 0.84 | 0.7 |
ResNet152 | 0.82 | 0.61 | 0.70 |
MobileNet | 0.88 | 0.58 | 0.7 |
DenseNet121 | 0.87 | 0.71 | 0.78 |
DenseNet169 | 0.84 | 0.68 | 0.75 |
Vision Transformer | 0.77 | 0.61 | 0.68 |
Moondream2 | 0.38 | 0.08 | 0.13 |
Overall sex classification model performance:
Model | Accuracy | AUC |
---|---|---|
CNN | 0.85 | 0.85 |
VGG16 | 0.82 | 0.84 |
VGG19 | 0.78 | 0.82 |
ResNet50 | 0.61 | 0.75 |
ResNet101 | 0.66 | 0.74 |
ResNet152 | 0.75 | 0.82 |
MobileNet | 0.76 | 0.83 |
DenseNet121 | 0.81 | 0.84 |
DenseNet169 | 0.78 | 0.85 |
Vision Transformer | 0.72 | 0.77 |
Moondream2 | 0.49 | 0.48 |
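The accuracy and AUC columns above follow their standard definitions; as an illustrative sketch (with made-up predictions, not the study's data), both can be computed directly:

```python
def binary_classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy at a fixed threshold, and AUC computed as the probability
    that a random positive is ranked above a random negative (ties = 0.5)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    accuracy = sum(int(p == t) for p, t in zip(y_pred, y_true)) / len(y_true)
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return accuracy, auc

# Hypothetical predictions for a binary sex classifier, not the study's data
acc, auc = binary_classification_metrics(
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1],
)
```

Note that accuracy depends on the 0.5 threshold while AUC does not, which is why a model such as ResNet50 can pair a low accuracy with a comparatively high AUC.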
Overall age regression model performance:
Model | MSE (years²) | MAE (years) | MAPE | RMSE (years) | R² |
---|---|---|---|---|---|
CNN | 142.88 | 9.3 | 32.37% | 11.95 | 0.29 |
VGG16 | 148.19 | 9.96 | 36.52% | 12.17 | 0.26 |
VGG19 | 142.13 | 9.57 | 33.83% | 11.92 | 0.29 |
ResNet50 | 188.79 | 11.59 | 44.76% | 13.74 | 0.06 |
ResNet101 | 170.68 | 10.62 | 38.16% | 13.06 | 0.15 |
ResNet152 | 166.36 | 10.34 | 39% | 12.9 | 0.17 |
MobileNet | 95.46 | 7.78 | 26.07% | 9.77 | 0.52 |
DenseNet121 | 97.2 | 7.95 | 27.31% | 9.86 | 0.52 |
DenseNet169 | 85.83 | 7.07 | 22.98% | 9.26 | 0.57 |
Vision Transformer | 159.98 | 10.24 | 37.38% | 12.65 | 0.2 |
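The regression metrics reported above follow their standard definitions; a minimal sketch (with hypothetical ages and predictions, not the study's data):

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """MSE, MAE, MAPE, RMSE and R² as reported in the table above."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n  # assumes no zero ages
    rmse = sqrt(mse)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - sum(e ** 2 for e in errors) / ss_tot
    return {"MSE": mse, "MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}

# Hypothetical ages in years and model predictions, not the study's data
metrics = regression_metrics([20, 35, 50, 42], [24, 30, 55, 40])
```

RMSE is simply the square root of MSE, and R² compares the squared error against a baseline that always predicts the mean age, which is why values near zero (e.g. ResNet50's 0.06) indicate little improvement over that baseline.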
Confusion matrix summary for the sex models (correctly classified vs. actual counts per class):
Model | Male: Correct | Male: Actual | Female: Correct | Female: Actual |
---|---|---|---|---|
CNN | 35 | 41 | 32 | 38 |
VGG16 | 34 | 41 | 31 | 38 |
VGG19 | 37 | 41 | 25 | 38 |
ResNet50 | 15 | 41 | 33 | 38 |
ResNet101 | 20 | 41 | 32 | 38 |
ResNet152 | 36 | 41 | 23 | 38 |
MobileNet | 38 | 41 | 22 | 38 |
DenseNet121 | 37 | 41 | 27 | 38 |
DenseNet169 | 36 | 41 | 26 | 38 |
Vision Transformer | 34 | 41 | 23 | 38 |
Moondream2 | 36 | 41 | 3 | 38 |
Inference times of the models:
Model | Inference Time (ms) |
---|---|
CNN | 12.658 |
VGG16 | 88.608 |
VGG19 | 113.924 |
ResNet50 | 37.975 |
ResNet101 | 50.633 |
ResNet152 | 101.266 |
MobileNet | 12.658 |
DenseNet121 | 25.316 |
DenseNet169 | 37.975 |
Vision Transformer | 164.557 |
Moondream2 | 4481.013 |
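Per-prediction times like those above can be measured with a simple wall-clock loop. The sketch below uses a placeholder predict function standing in for any of the models; the warm-up-and-average approach is an assumption for illustration, not the study's exact protocol:

```python
import time

def mean_inference_ms(predict, inputs, warmup=2):
    """Mean wall-clock time per prediction in milliseconds. A few warm-up
    calls are discarded first so that one-off initialisation cost
    (e.g. lazy model loading) does not skew the average."""
    for x in inputs[:warmup]:
        predict(x)
    start = time.perf_counter()
    for x in inputs:
        predict(x)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(inputs)

# Placeholder standing in for a model's preprocess-and-predict call
dummy_predict = lambda x: x * 2
avg_ms = mean_inference_ms(dummy_predict, list(range(100)))
```

Averaging over many inputs matters here: single-call timings are noisy at the millisecond scale where MobileNet and the plain CNN operate.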
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Alam, S.S.; Rashid, N.; Faiza, T.A.; Ahmed, S.; Hassan, R.A.; Dudley, J.; Farook, T.H. Estimating Age and Sex from Dental Panoramic Radiographs Using Neural Networks and Vision–Language Models. Oral 2025, 5, 3. https://doi.org/10.3390/oral5010003