Article

Age Prediction from Korean Speech Data Using Neural Networks with Diverse Voice Features

1 Department of Industrial Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05029, Republic of Korea
2 Neopons Inc., 465 Dongdaegu-ro, Dong-gu, Daegu 41260, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(3), 1337; https://doi.org/10.3390/app15031337
Submission received: 23 December 2024 / Revised: 24 January 2025 / Accepted: 26 January 2025 / Published: 27 January 2025
(This article belongs to the Special Issue Deep Learning for Speech, Image and Language Processing)

Abstract

A person’s voice serves as an indicator of age, as it changes with anatomical and physiological influences throughout their life. Although age prediction is a subject of interest across various disciplines, age-prediction studies using Korean voices are limited. The few studies that have been conducted have limitations, such as the absence of specific age groups or detailed age categories. Therefore, this study proposes an optimal combination of speech features and deep-learning models to recognize detailed age groups using a large Korean-speech dataset. From the speech dataset, recorded by individuals ranging from their teens to their 50s, four speech features were extracted: the Mel spectrogram, log-Mel spectrogram, Mel-frequency cepstral coefficients (MFCCs), and ΔMFCCs. Using these speech features, four deep-learning models were trained: ResNet-50, 1D-CNN, 2D-CNN, and a vision transformer. A performance comparison across the speech features and models indicated that the combination of MFCCs and ΔMFCCs, trained with the 1D-CNN model, performed best for both sexes, achieving an accuracy of 88.16% for males and 81.95% for females. The results of this study are expected to contribute to the future development of Korean speaker-recognition systems.
Keywords: age prediction; speaker recognition; voice feature extraction; convolutional neural network; vision transformer
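
As a rough illustration of the best-performing feature combination reported in the abstract, the sketch below extracts MFCCs and their first-order deltas (ΔMFCCs) with librosa and stacks them into a single matrix that a 1D-CNN could consume along the time axis. This is a minimal sketch, not the authors' code; the sampling rate, number of coefficients, and file name are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch of MFCC + ΔMFCC extraction (assumed parameters, not from the paper).
import numpy as np
import librosa

def extract_mfcc_delta(wav_path: str, sr: int = 16000, n_mfcc: int = 40) -> np.ndarray:
    """Return a (2 * n_mfcc, frames) matrix of MFCCs stacked with their ΔMFCCs."""
    y, sr = librosa.load(wav_path, sr=sr)                    # load speech as a mono waveform
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # frame-wise MFCCs
    delta = librosa.feature.delta(mfcc)                      # first-order temporal derivatives (ΔMFCCs)
    return np.concatenate([mfcc, delta], axis=0)             # stack along the feature axis

# Hypothetical usage: the resulting matrix, transposed to (frames, features),
# could be fed to a 1D-CNN that convolves over the time axis.
# features = extract_mfcc_delta("speaker_001.wav")
```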

Share and Cite

MDPI and ACS Style

Ku, H.; Lee, J.; Lee, M.; Kim, S.; Yoon, J. Age Prediction from Korean Speech Data Using Neural Networks with Diverse Voice Features. Appl. Sci. 2025, 15, 1337. https://doi.org/10.3390/app15031337

AMA Style

Ku H, Lee J, Lee M, Kim S, Yoon J. Age Prediction from Korean Speech Data Using Neural Networks with Diverse Voice Features. Applied Sciences. 2025; 15(3):1337. https://doi.org/10.3390/app15031337

Chicago/Turabian Style

Ku, Hayeon, Jiho Lee, Minseo Lee, Seulgi Kim, and Janghyeok Yoon. 2025. "Age Prediction from Korean Speech Data Using Neural Networks with Diverse Voice Features" Applied Sciences 15, no. 3: 1337. https://doi.org/10.3390/app15031337

APA Style

Ku, H., Lee, J., Lee, M., Kim, S., & Yoon, J. (2025). Age Prediction from Korean Speech Data Using Neural Networks with Diverse Voice Features. Applied Sciences, 15(3), 1337. https://doi.org/10.3390/app15031337

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
