Improving Audio Classification Method by Combining Self-Supervision with Knowledge Distillation
Abstract
1. Introduction
- (1) Multifaceted Self-Supervised Learning Mechanisms: We introduce several spectrogram-based self-supervised learning mechanisms for audio classification and construct two self-supervised strategies: time-frequency random masking and spectrogram block random masking. Contrastive learning yields discriminative features, while reconstructing the masked information lets the model capture intricate details of the audio spectrum (a hedged masking sketch appears after this list).
- (2) Feature Reconstruction Strategies: For feature reconstruction, we devise two teacher-student learning strategies. Leveraging knowledge distillation, they strengthen the model's feature representation and lead to rapid convergence and efficient learning (see the distillation sketch after this list).
- (3) Experimental Validation: Experiments show that combining spectrogram-based self-supervised strategies is effective for learning intricate audio features and that feature reconstruction further improves learning, producing strong results on multiple publicly available audio classification test sets. Notably, in audio-only recognition on the AudioSet-2M, ESC-50, and VGGSound datasets, the proposed method achieves accuracy rates of 49.9%, 98.7%, and 61.3%, respectively, surpassing the current state-of-the-art single-modal methods by 1.3%, 0.6%, and 0.5%.
2. Related Work
3. Multi-Dimensional Self-Supervised Learning
3.1. Self-Supervised Information Construction
3.2. Multi-Dimensional Self-Supervised Modeling
3.3. Information Reconstruction and Feature Fitting
3.4. Knowledge Distillation
4. Experiments
4.1. Impact of Multi-Dimensional Self-Supervised Tasks on Classification
4.2. Impact of Audio Knowledge Distillation on Classification
4.3. Comparative Analysis of Classification Results Using Different Methods
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Algorithm A1: Pseudocode of the proposed ACM-SSKD method in a PyTorch-like style
```python
# spectrograms: convert the raw audio signal into a two-dimensional spectrogram image.
# intra_patch, patch, inter_patch: blocking and masking operations.
# AFT: audio feature Transformer, as shown in Figure 2.
# MAE: audio masked autoencoder, as shown in Figure 2.
for iter in iters:  # iters is the number of iterations
    for x in batch:  # x is one-dimensional raw audio data
        x_spec = spectrograms(x)       # x_spec is a two-dimensional spectrogram image
        x_raw = patch(x_spec)          # patch raw, as shown in Figure 2
        x_intra = intra_patch(x_raw)   # intra-patch mask, as shown in Figure 2
        x_inter = inter_patch(x_raw)   # inter-patch mask, as shown in Figure 2

        # Multi-SSL process: res_* and fea_* denote the reconstructed images and the
        # features extracted by the networks, respectively.
        res_raw, fea_raw = AFT(x_raw)
        res_intra, fea_intra = AFT(x_intra)
        res_inter, fea_inter = AFT(MAE(x_inter))

        # Restore the spectrogram image; WMse as shown in Formula (3).
        l_spec_res = WMse(res_intra, res_raw) + WMse(res_inter, res_raw)
        # Restore the features generated from the raw patches; Mse as shown in Formula (6).
        l_fea_mse = Mse(fea_intra, fea_raw) + Mse(fea_inter, fea_raw)
        # Restore the features generated from the raw patches; KL as shown in Formula (7).
        l_fea_kl = KL(fea_intra, fea_raw) + KL(fea_inter, fea_raw)
        # Contrastive learning on pooled features; Cont as shown in Formula (9).
        l_fea_cont = Cont(AVG(fea_intra), AVG(fea_raw)) + Cont(AVG(fea_inter), AVG(fea_raw))
        l_multi_ssl = α * (l_spec_res + l_fea_mse) + β * l_fea_kl + l_fea_cont
        # end of the Multi-SSL stage

        # Knowledge distillation process.
        # CE as shown in Formula (11), Reg as shown in Formula (12).
        # fea_cls is produced by the AFT network after the Multi-SSL stage.
        # fea_teacher is produced by the teacher model.
        l_cls = CE(fea_cls, fea_teacher)
        l_res = Mse(fea_cls, fea_teacher)
        l_kd = l_cls + γ * l_res
        # end of the knowledge distillation stage
```
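As a companion to Algorithm A1, the hedged sketch below gives one possible PyTorch realization of the loss terms it references: a weighted spectrogram reconstruction loss (WMse, cf. Formula (3)), the feature MSE and KL terms (Formulas (6) and (7)), and an InfoNCE-style contrastive term (Cont, Formula (9)). The weighting tensor, temperature, and the loss weights α and β are placeholders; the exact definitions are those given in the main text.

```python
# Hedged sketch of the loss terms referenced in Algorithm A1; the weighting and
# temperature are illustrative, not the exact formulations of Formulas (3)-(9).
import torch
import torch.nn.functional as F


def wmse(pred: torch.Tensor, target: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Weighted MSE spectrogram reconstruction (cf. Formula (3)); the weight map could be HOG-based."""
    return (weight * (pred - target) ** 2).mean()


def feature_kl(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    """KL divergence between feature distributions (cf. Formula (7))."""
    return F.kl_div(F.log_softmax(student, dim=-1),
                    F.softmax(teacher, dim=-1), reduction='batchmean')


def contrastive(anchor: torch.Tensor, positive: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """InfoNCE-style contrastive loss over pooled features (cf. Formula (9))."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / tau                  # pairwise similarities within the batch
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)


# Usage with dummy tensors: B=8 clips, N=512 patches, D=768-dim features.
res_intra, res_raw = torch.randn(8, 1, 128, 1024), torch.randn(8, 1, 128, 1024)
fea_intra, fea_raw = torch.randn(8, 512, 768), torch.randn(8, 512, 768)
w = torch.ones_like(res_raw)                  # uniform weights as a placeholder
l_spec_res = wmse(res_intra, res_raw, w)
l_fea_mse = F.mse_loss(fea_intra, fea_raw)    # cf. Formula (6)
l_fea_kl = feature_kl(fea_intra, fea_raw)
l_fea_cont = contrastive(fea_intra.mean(dim=1), fea_raw.mean(dim=1))
alpha, beta = 1.0, 1.0                        # illustrative loss weights
l_multi_ssl = alpha * (l_spec_res + l_fea_mse) + beta * l_fea_kl + l_fea_cont
```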
References
- Kong, Q.; Cao, Y.; Iqbal, T.; Wang, Y.; Wang, W.; Plumbley, M.D. Panns: Large-scale pretrained audio neural networks for audio pattern recognition. IEEE ACM Trans. Audio Speech Lang. Process. 2020, 28, 2880–2894. [Google Scholar] [CrossRef]
- Hsu, W.N.; Bolte, B.; Tsai, Y.H.H.; Lakhotia, K.; Salakhutdinov, R.; Mohamed, A. Hubert: Self-supervised speech representation learning by masked prediction of hidden units. IEEE ACM Trans. Audio Speech Lang. Process. 2021, 29, 3451–3460. [Google Scholar] [CrossRef]
- Verma, P.; Berger, J. Audio Transformers: Transformer Architectures for Large Scale Audio Understanding. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 17–20 October 2021; pp. 1–5. [Google Scholar]
- Arnault, A.; Hanssens, B.; Riche, N. Urban Sound Classification: Striving towards a fair comparison. arXiv 2020, arXiv:2010.11805. [Google Scholar]
- Gong, Y.; Chung, Y.A.; Glass, J. AST: Audio Spectrogram Transformer. In Proceedings of Interspeech 2021, Brno, Czechia, 30 August–3 September 2021; pp. 571–575. [Google Scholar]
- Liu, A.T.; Li, S.W.; Lee, H.Y. Tera: Self-supervised learning of transformer encoder representation for speech. IEEE ACM Trans. Audio Speech Lang. Process. 2021, 29, 2351–2366. [Google Scholar] [CrossRef]
- Chi, P.H.; Chung, P.H.; Wu, T.H.; Hsieh, C.C.; Chen, Y.H.; Li, S.W.; Lee, H.Y. Audio albert: A lite bert for self-supervised learning of audio representation. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), Shenzhen, China, 19–22 January 2021; pp. 344–350. [Google Scholar]
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
- Giraldo, J.S.P.; Jain, V.; Verhelst, M. Efficient Execution of Temporal Convolutional Networks for Embedded Keyword Spotting. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2021, 29, 2220–2228. [Google Scholar] [CrossRef]
- Gong, Y.; Chung, Y.A.; Glass, J. PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 3292–3306. [Google Scholar]
- Schmid, F.; Koutini, K.; Widmer, G. Efficient Large-Scale Audio Tagging Via Transformer-to-CNN Knowledge Distillation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
- Nagrani, A.; Yang, S.; Arnab, A.; Jansen, A.; Schmid, C.; Sun, C. Attention bottlenecks for multimodal fusion. Adv. Neural Inf. Process. Syst. 2021, 34, 14200–14213. [Google Scholar]
- Chen, K.; Du, X.; Zhu, B.; Ma, Z.; Berg-Kirkpatrick, T.; Dubnov, S. Hts-at: A hierarchical token-semantic audio transformer for sound classification and detection. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; pp. 646–650. [Google Scholar]
- Verbitskiy, S.; Berikov, V.; Vyshegorodtsev, V. ERANNs: Efficient residual audio neural networks for audio pattern recognition. Pattern Recognit. Lett. 2022, 161, 38–44. [Google Scholar]
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10347–10357. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Xie, X.; Zhang, H.; Wang, J.; Chang, Q.; Wang, J.; Pal, N.R. Learning optimized structure of neural networks by hidden node pruning with L1 regularization. IEEE Trans. Cybern. 2020, 50, 1333–1346. [Google Scholar] [CrossRef]
- Chong, D.; Wang, H.; Zhou, P.; Zeng, Q. Masked spectrogram prediction for self-supervised audio pre-training. arXiv 2022, arXiv:2204.12768. [Google Scholar]
- Huang, P.Y.; Xu, H.; Li, J.; Baevski, A.; Auli, M.; Galuba, W.; Metze, F.; Feichtenhofer, C. Masked autoencoders that listen. arXiv 2022, arXiv:2207.06405. [Google Scholar]
- Zhang, Y.; Park, D.S.; Han, W.; Qin, J.; Gulati, A.; Shor, J.; Jansen, A.; Xu, Y.; Huang, Y.; Wang, S.; et al. BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition. IEEE J. Sel. Top. Signal Process. 2022, 16, 1519–1532. [Google Scholar]
- Chen, S.; Wu, Y.; Wang, C.; Liu, S.; Tompkins, D.; Chen, Z.; Wei, F. BEATS: Audio Pre-Training with Acoustic Tokenizers. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 5178–5193. [Google Scholar]
- Baevski, A.; Hsu, W.N.; Xu, Q.; Babu, A.; Gu, J.; Auli, M. Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 1298–1312. [Google Scholar]
- Chen, S.; Wang, C.; Chen, Z.; Wu, Y.; Liu, S.; Chen, Z.; Li, J.; Kanda, N.; Yoshioka, T.; Xiao, X.; et al. Wavlm: Large-scale self-supervised pre-training for full stack speech processing. IEEE J. Sel. Top. Signal Process. 2022, 16, 1505–1518. [Google Scholar] [CrossRef]
- Gong, Y.; Lai, C.I.; Chung, Y.A.; Glass, J. Ssast: Self-supervised audio spectrogram transformer. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 22 February–1 March 2022; pp. 10699–10709. [Google Scholar]
- Huang, P.Y.; Sharma, V.; Xu, H.; Ryali, C.; Fan, H.; Li, Y.; Li, S.W.; Ghosh, G.; Malik, J.; Feichtenhofer, C. MAViL: Masked Audio-Video Learners. arXiv 2022, arXiv:2212.08071. [Google Scholar]
- Wei, Y.; Hu, H.; Xie, Z.; Zhang, Z.; Cao, Y.; Bao, J.; Chen, D.; Guo, B. Contrastive learning rivals masked image modeling in fine-tuning via feature distillation. arXiv 2022, arXiv:2205.14141. [Google Scholar]
- Chen, H.; Xie, W.; Vedaldi, A.; Zisserman, A. VGGSound: A large-scale audio-visual dataset. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, 4–8 May 2020; pp. 721–725. [Google Scholar]
- Wei, C.; Fan, H.; Xie, S.; Wu, C.Y.; Yuille, A.; Feichtenhofer, C. Masked feature prediction for self-supervised visual pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 14668–14678. [Google Scholar]
- Wu, Y.; Chen, K.; Zhang, T.; Hui, Y.; Taylor, B.K.; Dubnov, S. Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
- Saeed, A.; Grangier, D.; Zeghidour, N. Contrastive learning of general-purpose audio representations. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 3875–3879. [Google Scholar]
- Fonseca, E.; Ortego, D.; McGuinness, K.; O'Connor, N.E.; Serra, X. Unsupervised contrastive learning of sound event representations. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 371–375. [Google Scholar]
- Al-Tahan, H.; Mohsenzadeh, Y. CLAR: Contrastive learning of auditory representations. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Virtual, 13–15 April 2021; pp. 2530–2538. [Google Scholar]
- Wang, L.; van den Oord, A. Multi-format contrastive learning of audio representations. arXiv 2021, arXiv:2103.06508. [Google Scholar]
- Niizumi, D.; Takeuchi, D.; Ohishi, Y.; Harada, N.; Kashino, K. BYOL for audio: Self-supervised learning for general-purpose audio representation. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8. [Google Scholar]
- Baade, A.; Peng, P.; Harwath, D. MAE-AST: Masked autoencoding audio spectrogram transformer. In Proceedings of the 23rd Interspeech Conference, Incheon, Republic of Korea, 18–22 September 2022; pp. 2438–2442. [Google Scholar]
- Carr, A.N.; Berthet, Q.; Blondel, M.; Teboul, O.; Zeghidour, N. Self-supervised learning of audio representations from permutations with differentiable ranking. IEEE Signal Process. Lett. 2021, 28, 708–712. [Google Scholar]
- Gong, X.; Yu, L.; Wang, J.; Zhang, K.; Bai, X.; Pal, N.R. Unsupervised Feature Selection via Adaptive Autoencoder with Redundancy Control. Neural Netw. 2022, 150, 87–101. [Google Scholar] [CrossRef]
- Ramesh, A.; Pavlov, M.; Goh, G.; Gray, S.; Voss, C.; Radford, A.; Chen, M.; Sutskever, I. Zero-shot text-to-image generation. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 8821–8831. [Google Scholar]
- Bao, H.; Dong, L.; Piao, S.; Wei, F. Beit: Bert pre-training of image transformers. arXiv 2021, arXiv:2106.08254. [Google Scholar]
- Feichtenhofer, C.; Li, Y.; He, K. Masked Autoencoders as Spatiotemporal Learners. arXiv 2022, arXiv:2205.09113. [Google Scholar]
- Liu, A.T.; Yang, S.; Chi, P.H.; Hsu, P.C.; Lee, H. Mockingjay: Unsupervised speech representation learning with deep bidirectional transformer encoders. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 6419–6423. [Google Scholar]
- Gemmeke, J.F.; Ellis, D.P.W.; Freedman, D.; Jansen, A.; Lawrence, W.; Moore, R.C.; Plakal, M.; Ritter, M. Audio set: An ontology and human-labeled dataset for audio events. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 776–780. [Google Scholar]
- Piczak, K.J. Esc: Dataset for environmental sound classification. In Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; pp. 1015–1018. [Google Scholar]
- Gong, X.; Li, Z. An Improved Audio Classification Method Based on Parameter-Free Attention Combined with Self-Supervision. J. Comput.-Aided Des. Comput. Graph. 2023, 35, 434–440. [Google Scholar]
- Gong, Y.; Rouditchenko, A.; Liu, A.H.; Harwath, D.; Karlinsky, L.; Kuehne, H.; Glass, J. Contrastive Audio-Visual Masked Autoencoder. In Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023; pp. 1–29. [Google Scholar]
- Kazakos, E.; Nagrani, A.; Zisserman, A.; Damen, D. Slow-fast auditory streams for audio recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 855–859. [Google Scholar]
Comparison Metrics | Base | +Intra-SSR | +HOG Weight |
---|---|---|---|
Audioset 20K | 0.371 | 0.374 | 0.377 |
VGGSound 20K | 0.537 | 0.542 | 0.544 |
Loss Terms | Audioset 20K | VGGSound 20K |
---|---|---|
Base | 0.377 | 0.544 |
+intra-SSF & mse | 0.383 | 0.549 |
+intra-SSF & KL | 0.383 | 0.551 |
+intra-SSF & cont | 0.391 | 0.557 |
Mask Ratios | 45% | 55% | 65% | 75% | 85% |
---|---|---|---|---|---|
Audioset 20K | 0.353 | 0.361 | 0.363 | 0.365 | 0.364 |
VGGSound 20K | 0.526 | 0.539 | 0.541 | 0.542 | 0.542 |
Comparison Metrics | Base | +Inter-SSR | +HOG Weight |
---|---|---|---|
Audioset 20K | 0.371 | 0.379 | 0.382 |
VGGSound 20K | 0.537 | 0.544 | 0.546 |
Loss Terms | Audioset 20K | VGGSound 20K |
---|---|---|
Base | 0.382 | 0.546 |
+inter-SSF & mse | 0.387 | 0.551 |
+inter-SSF & KL | 0.389 | 0.552 |
+inter-SSF & cont | 0.394 | 0.561 |
Self-Supervision Approaches | Audioset 20K | VGGSound 20K |
---|---|---|
Base | 0.371 | 0.537 |
intra-SS | 0.391 | 0.557 |
inter-SS | 0.394 | 0.561 |
intra-SS & inter-SS | 0.402 | 0.566 |
Initialization Methods | Audioset 20K | VGGSound 20K |
---|---|---|
Random Init | 0.402 | 0.566 |
ImageNet Init | 0.404 | 0.570 |
Teacher Model Categories | Audioset 20K | VGGSound 20K |
---|---|---|
None | 0.402 | 0.566 |
AST | 0.405 | 0.568 |
SPFA | 0.406 | 0.571 |
BEATS | 0.409 | 0.575 |
Iteration Counts | Audioset 20K | VGGSound 20K |
---|---|---|
iter1 | 0.409 | 0.575 |
iter2 | 0.415 | 0.583 |
iter3 | 0.417 | 0.586 |
Method | Model Param | Pre-Trained Data | Audioset |
---|---|---|---|
PANN [1] | 81M | - | 0.431 |
PSLA [10] | 14M | ImageNet | 0.444 |
ERANN [14] | 55M | - | 0.450 |
AST [5] | 86M | ImageNet + AudioSet | 0.459 |
PaSST [11] | 86M | ImageNet + AudioSet | 0.471 |
Hts-at [13] | 31M | ImageNet + AudioSet | 0.471 |
MaskedSpec [18] | 86M | AudioSet | 0.471 |
CAV-MAE [45] | 94M | ImageNet + AudioSet | 0.449 |
SPFA (Single) [44] | 87M | - | 0.464 |
BEATS (iter1) [21] | 90M | AudioSet | 0.479 |
BEATS (iter2) [21] | 90M | AudioSet | 0.481 |
BEATS (iter3) [21] | 90M | AudioSet | 0.480 |
BEATS (iter3+) [21] | 90M | AudioSet | 0.486 |
ACM-SSKD (iter1) | 92M | ImageNet + AudioSet | 0.483 |
ACM-SSKD (iter2) | 92M | ImageNet + AudioSet | 0.486 |
ACM-SSKD (iter3) | 92M | ImageNet + AudioSet | 0.492 |
ACM-SSKD (Ensemble) | 92M | ImageNet + AudioSet | 0.499 |
Method | Model Param | Pre-Trained Data | VGGSound |
---|---|---|---|
VGGSound [27] | 81M | - | 0.488 |
CAV-MAE [45] | 87M | ImageNet + AudioSet | 0.595 |
MBT [12] | 87M | ImageNet 21K | 0.523 |
Aud-SlowFast [46] | - | - | 0.501 |
MAViL [25] | 87M | ImageNet + AudioSet | 0.608 |
ACM-SSKD (iter1) | 92M | ImageNet + AudioSet | 0.579 |
ACM-SSKD (iter2) | 92M | ImageNet + AudioSet | 0.588 |
ACM-SSKD (iter3) | 92M | ImageNet + AudioSet | 0.605 |
ACM-SSKD (Ensemble) | 92M | ImageNet + AudioSet | 0.613 |
Method | Model Param | Pre-Trained Data | ESC-50 |
---|---|---|---|
PANN [1] | 81M | - | 0.947 |
AST [5] | 86M | ImageNet | 0.956 |
ERANN [14] | 55M | AudioSet | 0.961 |
Audio-MAE [19] | 86M | AudioSet | 0.974 |
Ssast [24] | 89M | AudioSet + LibriSpeech | 0.888 |
MaskedSpec [18] | 86M | AudioSet | 0.896 |
Mae-ast [19] | 86M | AudioSet + LibriSpeech | 0.900 |
SPFA [44] | 87M | - | 0.968 |
BEATS (iter3) [21] | 90M | AudioSet | 0.956 |
BEATS (iter3+) [21] | 90M | AudioSet | 0.981 |
ACM-SSKD (iter1) | 92M | ImageNet + AudioSet | 0.962 |
ACM-SSKD (iter2) | 92M | ImageNet + AudioSet | 0.975 |
ACM-SSKD (iter3) | 92M | ImageNet + AudioSet | 0.984 |
ACM-SSKD (Ensemble) | 92M | ImageNet + AudioSet | 0.987 |