Gaussian-Filtered High-Frequency-Feature Trained Optimized BiLSTM Network for Spoofed-Speech Classification
Abstract
1. Introduction
- A deep study on various feature sets used for classification is presented, and a multiple-feature integrated BiLSTM network is proposed.
- Conventional MFCCs are obtained using a linear filter bank. We propose a new Gaussian-filtered inverted MFCC (GIMFCC) feature that, compared with the conventional MFCC, provides a smooth transition between subbands and maintains correlation within each subband.
- RNNs are highly effective for spoof classification because they can handle short-term spectral features while capturing long-term temporal dependencies. LSTM networks overcome the vanishing-gradient and long-term-dependency problems of RNNs. BiLSTM is used in the proposed algorithm because the bidirectional strategy further improves recognition quality compared with the unidirectional approach.
- A Bayesian optimization algorithm is presented to tune the hyper-parameters of the BiLSTM while reducing its computational complexity and number of hidden layers.
- We used cutting-edge deep learning algorithms and compared their performance using standard assessment metrics.
- We present several issues based on the experimental evaluation and recommend possible solutions.
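To make the GIMFCC idea above concrete, here is a minimal NumPy sketch of a Gaussian-shaped filter bank on an inverted mel scale. The paper's exact filter design is not reproduced in this extract; the reflection-based scale inversion and the width rule below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def gaussian_inverted_mel_fbank(n_filters=20, n_fft=512, sr=16000):
    """Gaussian-shaped filters on an *inverted* mel scale (sketch).

    Inverting the mel scale makes the filters densest at high
    frequencies, and Gaussian shapes (instead of triangles) give a
    smooth transition between neighbouring subbands.
    """
    # Filter centres equally spaced on the mel scale ...
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    # ... then reflected about the band midpoint to invert the scale.
    inv_hz = (sr / 2) - hz_pts[::-1]
    bins = np.floor((n_fft + 1) * inv_hz / sr).astype(int)

    freqs = np.arange(n_fft // 2 + 1)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        centre = bins[i]
        # Width chosen so adjacent Gaussians overlap smoothly (assumption).
        sigma = max((bins[i + 1] - bins[i - 1]) / 4.0, 1.0)
        fbank[i - 1] = np.exp(-0.5 * ((freqs - centre) / sigma) ** 2)
    return fbank
```

Applying this bank to a power spectrum, taking logs, and applying the DCT would then yield cepstral coefficients in the usual MFCC pipeline.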
2. Related Work
3. Materials and Methods
3.1. Materials
3.2. Proposed Method
Feature Extraction Techniques
3.3. Proposed Optimized BiLSTM
Algorithm 1: Bayesian optimization algorithm to tune BiLSTM parameters
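Algorithm 1 itself is not reproduced in this extract. The generic shape of such a procedure, a Gaussian-process surrogate plus an expected-improvement acquisition over one hyper-parameter, can be sketched with NumPy alone; the toy `objective` below is a hypothetical stand-in for "train the BiLSTM and return the validation error", not the paper's actual evaluation.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ls=0.5):
    # Squared-exponential (RBF) kernel on 1-D inputs.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, jitter=1e-4):
    # Gaussian-process posterior mean/std at candidate points Xs.
    Kinv = np.linalg.inv(rbf(X, X) + jitter * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ Kinv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ Kinv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI for minimisation; Phi/phi are the standard normal CDF/PDF.
    z = (best - mu) / sigma
    Phi = 0.5 * (1.0 + np.array([erf(v / sqrt(2.0)) for v in z]))
    phi = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    return (best - mu) * Phi + sigma * phi

def objective(x):
    # Hypothetical stand-in for a BiLSTM validation-error evaluation:
    # x plays the role of log10(learning rate), optimum near -2.
    return (x + 2.0) ** 2 + 0.1 * np.sin(5.0 * x)

grid = np.linspace(-6.0, 0.0, 121)        # candidate log10(lr) values
X = np.array([-6.0, -3.0, 0.0])           # initial design points
y = np.array([objective(v) for v in X])
for _ in range(10):                       # BO iterations
    yc = y - y.mean()                     # centre observations for the GP
    mu, sigma = gp_posterior(X, yc, grid)
    ei = expected_improvement(mu, sigma, yc.min())
    ei[np.isclose(grid[:, None], X).any(axis=1)] = -np.inf  # no repeats
    x_next = grid[np.argmax(ei)]
    X, y = np.append(X, x_next), np.append(y, objective(x_next))
best_x = X[np.argmin(y)]                  # best hyper-parameter found
```

In the paper's setting the same loop would run over the ranges listed in the hyper-parameter table (depth, learning rate, momentum, L2 regularization), with each evaluation being a full training run.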
4. Experimentation and Analysis
4.1. Experiment Setup
4.2. Analysis
Ref | Features | Classifier | Dev EER (%) | Eval EER (%)
---|---|---|---|---
[70] | Inverted constant-Q features and CQCC | DNN | 2.629 | 7.777
[67] | CFCCIF and quadrature-phase signals | ResNet | 2.33 | 12.88
[71] | LC-GRNN features from spectrogram | PLDA | 3.26 | 6.08
[72] | MFCC + Fbank | LSTM, GRU RNN | 6.32 | 9.81
[68] | Iterative adaptive inverse filtering-based glottal information + CQCC | Gaussian mixture model | 3.68 | 8.32
[69] | CQCC | LCNN | 21.73 | 8.20
[73] | Normalized log-power magnitude spectrum using Q-transform and FFT | Conventional CNN + RNN | 3.95 | 6.73
[65] | Local interpretable model-agnostic explanations | LCNN-FFT | 7.6 | 10.6
[66] | Spectrogram | LSTM | - | 21.0602
[66] | Group delay gram | ResNet-18 | - | 35.35
[11] | Linear filter bank-based high-frequency features | DenseNet + BiLSTM | 2.79 | 6.43
[74] | MFCC + CQCC high-frequency features | DenseNet + LSTM | 3.62 | 8.84
[47] | Improved TEO | LCNN | 6.98 | 13.54
[43] | CQCC + deep learning features | LSTM | 7.73 | -
Proposed | Static + dynamic GIMFCC + GTCC + spectral features | Optimized BiLSTM | 1.02 | 6.58
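All of the systems above are compared on equal error rate (EER), the point where the false-acceptance and false-rejection rates coincide. A minimal NumPy sketch of a threshold-sweep EER computation (an illustrative implementation, not the challenge's official scoring tool):

```python
import numpy as np

def equal_error_rate(genuine_scores, spoof_scores):
    """EER: operating point where the false-acceptance rate (spoofed
    utterances accepted) equals the false-rejection rate (genuine
    utterances rejected). Higher score = more likely genuine."""
    thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))       # closest crossing point
    return 0.5 * (far[idx] + frr[idx])
```

With perfectly separated score distributions this returns 0; the table's development/evaluation EERs are this quantity expressed as a percentage.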
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
MFCC | Mel-Frequency Cepstral Coefficients
GIMFCC | Gaussian-filtered Inverted MFCC
GTCC | Gammatone Cepstral Coefficients
LPC | Linear Prediction Coefficients
LFCC | Linear Frequency Cepstral Coefficients
ECG | Electrocardiogram
CQCC | Constant-Q Cepstral Coefficients
CQT | Constant-Q Transform
GMM | Gaussian Mixture Model
CNN | Convolutional Neural Network
RNN | Recurrent Neural Network
LSTM | Long Short-Term Memory
BiLSTM | Bidirectional LSTM
SVM | Support Vector Machine
EER | Equal Error Rate
XGBoost | Extreme Gradient Boosting
ReLU | Rectified Linear Unit
ResNet | Residual Neural Network
PCM | Pulse-Code Modulation
t-SNE | t-Distributed Stochastic Neighbor Embedding
References
- Wu, Z.; Evans, N.; Kinnunen, T.; Yamagishi, J.; Alegre, F.; Li, H. Spoofing and countermeasures for speaker verification: A survey. Speech Commun. 2015, 66, 130–153.
- Kinnunen, T.; Sahidullah, M.; Delgado, H.; Todisco, M.; Evans, N.; Yamagishi, J.; Lee, K.A. The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection; The International Speech Communication Association: Berlin, Germany, 2017.
- Ghaderpour, E.; Pagiatakis, S.D.; Hassan, Q.K. A survey on change detection and time series analysis with applications. Appl. Sci. 2021, 11, 6141.
- Mewada, H.K.; Patel, A.V.; Chaudhari, J.; Mahant, K.; Vala, A. Wavelet features embedded convolutional neural network for multiscale ear recognition. J. Electron. Imaging 2020, 29, 043029.
- Alim, S.A.; Rashid, N.K.A. Some Commonly Used Speech Feature Extraction Algorithms; IntechOpen: London, UK, 2018.
- Mewada, H. 2D-wavelet encoded deep CNN for image-based ECG classification. In Multimedia Tools and Applications; Springer: Berlin/Heidelberg, Germany, 2023; pp. 1–17.
- Witkowski, M.; Kacprzak, S.; Zelasko, P.; Kowalczyk, K.; Galka, J. Audio Replay Attack Detection Using High-Frequency Features. In Proceedings of the Interspeech, Stockholm, Sweden, 20–24 August 2017; pp. 27–31.
- Singh, M.; Pati, D. Usefulness of linear prediction residual for replay attack detection. AEU-Int. J. Electron. Commun. 2019, 110, 152837.
- Yang, J.; Das, R.K. Low frequency frame-wise normalization over constant-Q transform for playback speech detection. Digit. Signal Process. 2019, 89, 30–39.
- Sriskandaraja, K.; Sethu, V.; Ambikairajah, E. Deep siamese architecture based replay detection for secure voice biometric. In Proceedings of the Interspeech, Hyderabad, India, 2–6 September 2018; pp. 671–675.
- Huang, L.; Pun, C.M. Audio Replay Spoof Attack Detection by Joint Segment-Based Linear Filter Bank Feature Extraction and Attention-Enhanced DenseNet-BiLSTM Network. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 1813–1825.
- Zaw, T.H.; War, N. The combination of spectral entropy, zero crossing rate, short time energy and linear prediction error for voice activity detection. In Proceedings of the 2017 20th International Conference of Computer and Information Technology (ICCIT), Dhaka, Bangladesh, 22–24 December 2017; pp. 1–5.
- Singh, S.; Rajan, E. Vector quantization approach for speaker recognition using MFCC and inverted MFCC. Int. J. Comput. Appl. 2011, 17, 1–7.
- Singh, S.; Rajan, D.E. A Vector Quantization approach Using MFCC for Speaker Recognition. In Proceedings of the International Conference Systemic, Cybernatics and Informatics ICSCI under the Aegis of Pentagram Research Centre Hyderabad, Hyderabad, India, 4–7 January 2007; pp. 786–790.
- Chakroborty, S.; Saha, G. Improved text-independent speaker identification using fused MFCC & IMFCC feature sets based on Gaussian filter. Int. J. Signal Process. 2009, 5, 11–19.
- Jelil, S.; Das, R.K.; Prasanna, S.M.; Sinha, R. Spoof detection using source, instantaneous frequency and cepstral features. In Proceedings of the Interspeech, Stockholm, Sweden, 20–24 August 2017; pp. 22–26.
- Sahidullah, M.; Kinnunen, T.; Hanilçi, C. A comparison of features for synthetic speech detection. In Proceedings of the 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, 6–10 September 2015.
- Loweimi, E.; Barker, J.; Hain, T. Statistical normalisation of phase-based feature representation for robust speech recognition. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 5310–5314.
- Pal, M.; Paul, D.; Saha, G. Synthetic speech detection using fundamental frequency variation and spectral features. Comput. Speech Lang. 2018, 48, 31–50.
- Patil, A.T.; Patil, H.A.; Khoria, K. Effectiveness of energy separation-based instantaneous frequency estimation for cochlear cepstral features for synthetic and voice-converted spoofed speech detection. Comput. Speech Lang. 2022, 72, 101301.
- Kadiri, S.R.; Yegnanarayana, B. Analysis and Detection of Phonation Modes in Singing Voice using Excitation Source Features and Single Frequency Filtering Cepstral Coefficients (SFFCC). In Proceedings of the Interspeech, Hyderabad, India, 2–6 September 2018; pp. 441–445.
- Kethireddy, R.; Kadiri, S.R.; Gangashetty, S.V. Deep neural architectures for dialect classification with single frequency filtering and zero-time windowing feature representations. J. Acoust. Soc. Am. 2022, 151, 1077–1092.
- Kethireddy, R.; Kadiri, S.R.; Kesiraju, S.; Gangashetty, S.V. Zero-Time Windowing Cepstral Coefficients for Dialect Classification. In Proceedings of the Speaker and Language Recognition Workshop (Odyssey), Tokyo, Japan, 2–5 November 2020; pp. 32–38.
- Kadiri, S.R.; Alku, P. Mel-Frequency Cepstral Coefficients of Voice Source Waveforms for Classification of Phonation Types in Speech. In Proceedings of the Interspeech, Graz, Austria, 15–19 September 2019; pp. 2508–2512.
- Mewada, H.K.; Chaudhari, J. Low computation digital down converter using polyphase IIR filter. Circuit World 2019, 45, 169–178.
- Loweimi, E.; Ahadi, S.M.; Drugman, T. A new phase-based feature representation for robust speech recognition. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 7155–7159.
- Dua, M.; Aggarwal, R.K.; Biswas, M. Discriminative training using noise robust integrated features and refined HMM modeling. J. Intell. Syst. 2020, 29, 327–344.
- Rahmeni, R.; Aicha, A.B.; Ayed, Y.B. Speech spoofing detection using SVM and ELM technique with acoustic features. In Proceedings of the 2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Sousse, Tunisia, 2–5 September 2020; pp. 1–4.
- Muckenhirn, H.; Korshunov, P.; Magimai-Doss, M.; Marcel, S. Long-term spectral statistics for voice presentation attack detection. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 2098–2111.
- Zhang, C.; Yu, C.; Hansen, J.H. An investigation of deep-learning frameworks for speaker verification antispoofing. IEEE J. Sel. Top. Signal Process. 2017, 11, 684–694.
- Ghosh, R.; Phadikar, S.; Deb, N.; Sinha, N.; Das, P.; Ghaderpour, E. Automatic Eyeblink and Muscular Artifact Detection and Removal From EEG Signals Using k-Nearest Neighbor Classifier and Long Short-Term Memory Networks. IEEE Sens. J. 2023, 23, 5422–5436.
- Jo, J.; Kung, J.; Lee, Y. Approximate LSTM computing for energy-efficient speech recognition. Electronics 2020, 9, 2004.
- Gong, P.; Wang, P.; Zhou, Y.; Zhang, D. A Spiking Neural Network With Adaptive Graph Convolution and LSTM for EEG-Based Brain-Computer Interfaces. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 1440–1450.
- Wu, Z.; Kinnunen, T.; Evans, N.; Yamagishi, J.; Hanilçi, C.; Sahidullah, M.; Sizov, A. ASVspoof 2015: The first automatic speaker verification spoofing and countermeasures challenge. In Proceedings of the Sixteenth Annual Conference of the International Speech Communication Association, Dresden, Germany, 6–10 September 2015.
- Todisco, M.; Delgado, H.; Evans, N.W. A new feature for automatic speaker verification anti-spoofing: Constant q cepstral coefficients. In Proceedings of the Odyssey, Bilbao, Spain, 21–24 June 2016; Volume 2016, pp. 283–290.
- Xue, J.; Zhou, H.; Song, H.; Wu, B.; Shi, L. Cross-modal information fusion for voice spoofing detection. Speech Commun. 2023, 147, 41–50.
- Alluri, K.R.; Achanta, S.; Kadiri, S.R.; Gangashetty, S.V.; Vuppala, A.K. Detection of Replay Attacks Using Single Frequency Filtering Cepstral Coefficients. In Proceedings of the Interspeech, Stockholm, Sweden, 20–24 August 2017; pp. 2596–2600.
- Bharath, K.; Kumar, M.R. Replay spoof detection for speaker verification system using magnitude-phase-instantaneous frequency and energy features. Multimed. Tools Appl. 2022, 81, 39343–39366.
- Woubie, A.; Bäckström, T. Voice Quality Features for Replay Attack Detection. In Proceedings of the 2022 30th European Signal Processing Conference (EUSIPCO), Belgrade, Serbia, 29 August–2 September 2022; pp. 384–388.
- Chaudhari, A.; Shedge, D. Integration of CQCC and MFCC based Features for Replay Attack Detection. In Proceedings of the 2022 International Conference on Emerging Smart Computing and Informatics (ESCI), Pune, India, 9–11 March 2022; pp. 1–5.
- Rahmeni, R.; Aicha, A.B.; Ayed, Y.B. Voice spoofing detection based on acoustic and glottal flow features using conventional machine learning techniques. Multimed. Tools Appl. 2022, 81, 31443–31467.
- Naith, Q. Crowdsourced Testing Approach for Mobile Compatibility Testing. Ph.D. Thesis, University of Sheffield, Sheffield, UK, 2021.
- Sizov, A.; Khoury, E.; Kinnunen, T.; Wu, Z.; Marcel, S. Joint speaker verification and antispoofing in the i-vector space. IEEE Trans. Inf. Forensics Secur. 2015, 10, 821–832.
- Luo, A.; Li, E.; Liu, Y.; Kang, X.; Wang, Z.J. A Capsule Network Based Approach for Detection of Audio Spoofing Attacks. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 6359–6363.
- Monteiro, J.; Alam, J.; Falk, T.H. An ensemble based approach for generalized detection of spoofing attacks to automatic speaker recognizers. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 6599–6603.
- Alluri, K.R.; Achanta, S.; Kadiri, S.R.; Gangashetty, S.V.; Vuppala, A.K. SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017. In Proceedings of the Interspeech, Stockholm, Sweden, 20–24 August 2017; pp. 107–111.
- Patil, A.T.; Acharya, R.; Patil, H.A.; Guido, R.C. Improving the potential of Enhanced Teager Energy Cepstral Coefficients (ETECC) for replay attack detection. Comput. Speech Lang. 2022, 72, 101281.
- Tom, F.; Jain, M.; Dey, P. End-To-End Audio Replay Attack Detection Using Deep Convolutional Networks with Attention. In Proceedings of the Interspeech, Hyderabad, India, 2–6 September 2018; pp. 681–685.
- Lai, C.I.; Chen, N.; Villalba, J.; Dehak, N. ASSERT: Anti-spoofing with squeeze-excitation and residual networks. arXiv 2019, arXiv:1904.01120.
- Scardapane, S.; Stoffl, L.; Röhrbein, F.; Uncini, A. On the use of deep recurrent neural networks for detecting audio spoofing attacks. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 3483–3490.
- Mittal, A.; Dua, M. Static–dynamic features and hybrid deep learning models based spoof detection system for ASV. Complex Intell. Syst. 2022, 8, 1153–1166.
- Dinkel, H.; Qian, Y.; Yu, K. Investigating raw wave deep neural networks for end-to-end speaker spoofing detection. IEEE/ACM Trans. Audio Speech Lang. Process. 2018, 26, 2002–2014.
- Mittal, A.; Dua, M. Automatic speaker verification system using three dimensional static and contextual variation-based features with two dimensional convolutional neural network. Int. J. Swarm Intell. 2021, 6, 143–153.
- Chintha, A.; Thai, B.; Sohrawardi, S.J.; Bhatt, K.; Hickerson, A.; Wright, M.; Ptucha, R. Recurrent convolutional structures for audio spoof and video deepfake detection. IEEE J. Sel. Top. Signal Process. 2020, 14, 1024–1037.
- Alzantot, M.; Wang, Z.; Srivastava, M.B. Deep residual neural networks for audio spoofing detection. arXiv 2019, arXiv:1907.00501.
- Wu, Z.; Das, R.K.; Yang, J.; Li, H. Light convolutional neural network with feature genuinization for detection of synthetic speech attacks. arXiv 2020, arXiv:2009.09637.
- Li, J.; Wang, H.; He, P.; Abdullahi, S.M.; Li, B. Long-term variable Q transform: A novel time-frequency transform algorithm for synthetic speech detection. Digit. Signal Process. 2022, 120, 103256.
- Sahidullah, M.; Delgado, H.; Todisco, M.; Kinnunen, T.; Evans, N.; Yamagishi, J.; Lee, K.A. Introduction to voice presentation attack detection and recent advances. In Handbook of Biometric Anti-Spoofing; Springer: Berlin/Heidelberg, Germany, 2019; pp. 321–361.
- Branco, H.M.C.; Reis, J.B.; Pereira, L.M.; Sá, L.d.C.; de A. L. Rabelo, R. Transmission line fault location using MFCC and LS-SVR. Learn. Nonlinear Model. J. Braz. Soc. Comput. Intell. 2023, 21, 110–122.
- Paul, D.; Pal, M.; Saha, G. Novel speech features for improved detection of spoofing attacks. In Proceedings of the 2015 Annual IEEE India Conference (INDICON), New Delhi, India, 17–20 December 2015; pp. 1–6.
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
- Mu, J.; Fan, H.; Zhang, W. High-dimensional Bayesian Optimization for CNN Auto Pruning with Clustering and Rollback. arXiv 2021, arXiv:2109.10591.
- Doke, P.; Shrivastava, D.; Pan, C.; Zhou, Q.; Zhang, Y.D. Using CNN with Bayesian optimization to identify cerebral micro-bleeds. Mach. Vis. Appl. 2020, 31, 36.
- Ohsaki, M.; Wang, P.; Matsuda, K.; Katagiri, S.; Watanabe, H.; Ralescu, A. Confusion-matrix-based kernel logistic regression for imbalanced data classification. IEEE Trans. Knowl. Data Eng. 2017, 29, 1806–1819.
- Chettri, B.; Mishra, S.; Sturm, B.L.; Benetos, E. Analysing the predictions of a cnn-based replay spoofing detection system. In Proceedings of the 2018 IEEE Spoken Language Technology Workshop (SLT), Athens, Greece, 18–21 December 2018; pp. 92–97.
- Yoon, S.H.; Koh, M.S.; Park, J.H.; Yu, H.J. A new replay attack against automatic speaker verification systems. IEEE Access 2020, 8, 36080–36088.
- Gupta, P.; Chodingala, P.K.; Patil, H.A. Replay spoof detection using energy separation based instantaneous frequency estimation from quadrature and in-phase components. Comput. Speech Lang. 2023, 77, 101423.
- Bharath, K.; Kumar, M.R. New replay attack detection using iterative adaptive inverse filtering and high frequency band. Expert Syst. Appl. 2022, 195, 116597.
- Süslü, Ç.; Eren, E.; Demiroğlu, C. Uncertainty assessment for detection of spoofing attacks to speaker verification systems using a Bayesian approach. Speech Commun. 2022, 137, 44–51.
- Yang, J.; Das, R.K. Long-term high frequency features for synthetic speech detection. Digit. Signal Process. 2020, 97, 102622.
- Gomez-Alanis, A.; Peinado, A.M.; Gonzalez, J.A.; Gomez, A.M. A light convolutional GRU-RNN deep feature extractor for ASVSpoofing detection. In Proceedings of the Interspeech, Graz, Austria, 15–19 September 2019; Volume 2019, pp. 1068–1072.
- Chen, Z.; Zhang, W.; Xie, Z.; Xu, X.; Chen, D. Recurrent neural networks for automatic replay spoofing attack detection. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 2052–2056.
- Lavrentyeva, G.; Novoselov, S.; Malykh, E.; Kozlov, A.; Kudashev, O.; Shchemelinin, V. Audio Replay Attack Detection with Deep Learning Frameworks. In Proceedings of the Interspeech, Stockholm, Sweden, 20–24 August 2017; pp. 82–86.
- Huang, L.; Pun, C.M. Audio replay spoof attack detection using segment-based hybrid feature and densenet-LSTM network. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 2567–2571.
- Huang, L.; Zhao, J. Audio replay spoofing attack detection using deep learning feature and long-short-term memory recurrent neural network. In Proceedings of the AIIPCC 2021, the Second International Conference on Artificial Intelligence, Information Processing and Cloud Computing, VDE, Hangzhou, China, 26–28 June 2021; pp. 1–5.
| Dataset Summary | Use of Dataset in the Experiment
---|---|---
Speech format | Precision: 16-bit PCM, sampling rate = 16 kHz | Precision: 16-bit PCM, sampling rate = 16 kHz
Spoofing types in train/dev/eval | 3/10/57 | 3/10/57
Total speakers in train/dev/eval | 10/8/24 | 18 from train and dev, 24 from eval
No. of genuine speech in train/dev/eval | 1507/760/1298 | 2267 from train and dev to train the network, 1298 from eval dataset
No. of spoofed speech in train/dev/eval | 1507/950/12,008 | 2457 from train and dev to train the network, 12,008 from eval dataset
Layer | Layer’s Name | Main Parameters | Other Parameters
---|---|---|---
1 | Sequence Layer | Size of training features | -
2 | BiLSTM | 50 | Returned sequences = true
3 | BiLSTM | 50 | Returned sequences = true
4 | Fully Connected Layer | - | -
5 | Dense | - | Activation = softmax
6 | Classification Layer | 2 | -
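The core of the table above is the pair of stacked 50-unit BiLSTM layers. A minimal NumPy forward pass of a single BiLSTM layer illustrates the mechanics (random weights standing in for a trained network; the feature dimension of 13 coefficients per frame is an illustrative assumption):

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell update; gate order in z is [input, forget, cell, output]."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = 1.0 / (1.0 + np.exp(-z[:n]))          # input gate
    f = 1.0 / (1.0 + np.exp(-z[n:2 * n]))     # forget gate
    g = np.tanh(z[2 * n:3 * n])               # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3 * n:]))      # output gate
    c = f * c + i * g
    return o * np.tanh(c), c

def bilstm_forward(seq, params_fwd, params_bwd, n_hidden):
    """Run the sequence in both directions and concatenate hidden states."""
    T = len(seq)
    out = np.zeros((T, 2 * n_hidden))
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for t in range(T):                         # forward direction
        h, c = lstm_step(seq[t], h, c, *params_fwd)
        out[t, :n_hidden] = h
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for t in reversed(range(T)):               # backward direction
        h, c = lstm_step(seq[t], h, c, *params_bwd)
        out[t, n_hidden:] = h
    return out

rng = np.random.default_rng(0)
n_in, n_hid, T = 13, 50, 8                     # e.g. 13 cepstral coeffs/frame

def init_params():
    # Small random weights; in practice a trained network supplies these.
    return (rng.normal(0.0, 0.1, (4 * n_hid, n_in)),
            rng.normal(0.0, 0.1, (4 * n_hid, n_hid)),
            np.zeros(4 * n_hid))

features = rng.normal(size=(T, n_in))
hidden = bilstm_forward(features, init_params(), init_params(), n_hid)
```

Each frame's output concatenates a forward and a backward hidden state (here 100 values per frame), which is why the bidirectional layer sees both past and future context.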
Hyper-Parameter | Range | Optimum Value
---|---|---
Section depth | 2 to 6 | 2
Learning rate | 10, 10, 10, 10, 10, 10, 1 | 10
Momentum | 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.90, 0.98 | 0.5
L2-regularization | 1 × 10 to 1 × 10 | 1 × 10
Dataset/Attack Types | Accuracy (%) | Precision (%) | Recall (%)
---|---|---|---
Validation dataset | 97.35 | 95.78 | 99.08
Evaluation dataset | 98.58 | 94.92 | 90.98
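For reference, the three reported metrics follow directly from the binary confusion counts, with "genuine" treated here as the positive class (an assumption about the paper's convention):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall (in %) from binary confusion
    counts: tp/fn count genuine speech classified right/wrong,
    fp/tn count spoofed speech classified wrong/right."""
    accuracy = 100.0 * (tp + tn) / (tp + fp + fn + tn)
    precision = 100.0 * tp / (tp + fp)   # how many accepted were genuine
    recall = 100.0 * tp / (tp + fn)      # how many genuine were accepted
    return accuracy, precision, recall
```

The gap between evaluation precision and recall in the table reflects the heavy class imbalance of the evaluation set (1298 genuine vs. 12,008 spoofed utterances).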
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Mewada, H.; Al-Asad, J.F.; Almalki, F.A.; Khan, A.H.; Almujally, N.A.; El-Nakla, S.; Naith, Q. Gaussian-Filtered High-Frequency-Feature Trained Optimized BiLSTM Network for Spoofed-Speech Classification. Sensors 2023, 23, 6637. https://doi.org/10.3390/s23146637