Determining the Age of the Author of the Text Based on Deep Neural Network Models
Abstract
1. Introduction
- Differentiating users of online platforms by age to combat pedophilia and prevent children from accessing adult content;
- Authorship attribution of an anonymous note with threats;
- Authorship attribution of a suicide note.
- Collecting data from a social network;
- Pre-processing of texts;
- Analysis of text classification methods;
- Analysis of methods for determining age from a photo;
- Data filtering using computer vision (CV) algorithms;
- Classification of the original and processed data;
- Evaluation and analysis of results.
2. Literature Review
3. Methodology
- The user's age is estimated from each photo, and an interval of ±2 years around the estimate is formed.
- If the age specified by the user falls within this interval, a counter is incremented; otherwise, it remains unchanged.
- The specified age is considered correct if the counter is equal to or greater than half the number of the author's photos.
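The verification rule above can be sketched as a simple vote over photos. This is a minimal sketch that assumes the per-photo age estimates have already been produced by some face-based age estimator (the rule itself does not depend on which one), that the ±2-year interval is inclusive at both ends, and that users without photos are rejected; the function and constant names are hypothetical.

```python
from typing import Sequence

TOLERANCE_YEARS = 2  # the ±2-year interval described above


def stated_age_is_consistent(stated_age: float,
                             photo_ages: Sequence[float],
                             tolerance: float = TOLERANCE_YEARS) -> bool:
    """Return True if the stated age lies within ±tolerance of the
    photo-based estimate for at least half of the user's photos."""
    if not photo_ages:
        return False  # assumption: users without photos cannot be verified
    counter = 0
    for estimated in photo_ages:
        # Interval [estimate - tolerance, estimate + tolerance] for this photo
        if estimated - tolerance <= stated_age <= estimated + tolerance:
            counter += 1
    return counter >= len(photo_ages) / 2
```

For example, a user who states an age of 25 and whose four photos are estimated at 24, 27, 30 and 31 years passes (two of four photos agree), whereas the same user with photos estimated at 30, 31 and 24 does not (one of three). Filtering a corpus then amounts to keeping only the texts of users for whom the function returns True.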
- Learning rate: 0.1;
- Rate of updates for the learning rate: 100;
- Size of the context window: 5;
- Number of negative samples: 5;
- Loss function: softmax.
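The first group of settings matches the option names of fastText's supervised trainer (`lr`, `lrUpdateRate`, `ws`, `neg`, `loss`), so it presumably configures the FastText model evaluated in the results. The following is a minimal sketch of that configuration with the `fasttext` Python package, assuming a training file in fastText's `__label__<class> <text>` format; the file path and function name are hypothetical, and the number of epochs is left at the library default because the text does not give one for this model.

```python
# Settings listed above, mapped to fastText option names (assumed mapping).
FASTTEXT_PARAMS = {
    "lr": 0.1,            # learning rate
    "lrUpdateRate": 100,  # rate of updates for the learning rate
    "ws": 5,              # size of the context window
    "neg": 5,             # number of negative samples
    "loss": "softmax",    # loss function
}


def train_age_classifier(train_path: str):
    """Train a supervised fastText model on a file with one example per line,
    e.g. '__label__18_24 text of the post' (label names are hypothetical)."""
    # Imported lazily so the settings above can be inspected without fastText installed.
    import fasttext
    return fasttext.train_supervised(input=train_path, **FASTTEXT_PARAMS)
```

After `model = train_age_classifier("train.txt")`, calling `model.predict(text)` returns the most probable label together with its probability. Note that in fastText the `neg` option is consulted by the negative-sampling loss (`loss="ns"`), so with `softmax` it most likely has no effect.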
- Input layer dimension: 128;
- Optimization function: adaptive moment estimation (Adam);
- Loss function: binary cross entropy;
- Batch size: 32;
- Number of training epochs: 50.
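The second group of settings (Adam, binary cross-entropy, batch size 32, 50 epochs) does not state which network it belongs to or which learning rate was used. To show how these components interact, the sketch below implements binary cross-entropy and the standard Adam update in plain Python and uses them to train a toy logistic-regression model in mini-batches. The logistic model, the learning rate, and the Adam betas (the usual defaults) are assumptions for illustration, not the authors' architecture, and the 128-dimensional input layer is not modelled.

```python
import math
from typing import Iterator, List, Sequence, Tuple

BATCH_SIZE = 32  # batch size listed above
EPOCHS = 50      # number of training epochs listed above


def binary_cross_entropy(y_true: Sequence[float], y_pred: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y_true)


class Adam:
    """Adaptive moment estimation for a flat list of parameters."""

    def __init__(self, n_params: int, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentred variance) estimates
        self.t = 0                 # step counter for bias correction

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        self.t += 1
        updated = []
        for i, (w, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(w - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


def batches(n_samples: int, batch_size: int = BATCH_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs of consecutive mini-batches."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: List[List[float]], y: List[float], epochs: int = EPOCHS,
                   batch_size: int = BATCH_SIZE, lr: float = 1e-3) -> List[float]:
    """Toy stand-in model: logistic regression trained with BCE and Adam.
    Returns the weights followed by the bias."""
    d = len(X[0])
    params = [0.0] * (d + 1)
    opt = Adam(len(params), lr=lr)
    for _ in range(epochs):
        for start, end in batches(len(X), batch_size):
            grads = [0.0] * len(params)
            for x, t in zip(X[start:end], y[start:end]):
                p = sigmoid(sum(w * xi for w, xi in zip(params[:d], x)) + params[d])
                for j in range(d):
                    grads[j] += (p - t) * x[j]  # d(BCE)/dw_j for a sigmoid output
                grads[d] += p - t               # d(BCE)/db
            n = end - start
            params = opt.step(params, [g / n for g in grads])
    return params
```

In a framework such as Keras or PyTorch the same choices reduce to selecting the built-in Adam optimizer and binary cross-entropy loss and passing the batch size and epoch count to the training loop; the hand-written version only makes the update rule visible.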
4. Results
5. Discussion and Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Kurtukova, A.; Romanov, A. Identification Author of Source Code by Machine Learning Methods. SPIIRAS Proc. 2019, 18, 742–766.
- Kurtukova, A.; Romanov, A.; Fedotova, A. De-Anonymization of the Author of the Source Code Using Machine Learning Algorithms. In Proceedings of the 2019 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON), Yekaterinburg, Russia, 25–27 October 2019; pp. 612–617.
- Romanov, A.; Kurtukova, A.; Fedotova, A.; Meshcheryakov, R. Natural Text Anonymization Using Universal Transformer with a Self-attention. In Proceedings of the III International Conference on Language Engineering and Applied Linguistics (PRLEAL-2019), Saint Petersburg, Russia, 27 November 2019; pp. 22–37.
- Romanov, A.S.; Vasilieva, M.I.; Kurtukova, A.V.; Meshcheryakov, R.V. Sentiment Analysis of Text Using Machine Learning Techniques. In Proceedings of the 2nd International Conference “R. Piotrowski’s Readings LE & AL’2017”, Saint Petersburg, Russia, 27 November 2017; pp. 86–95.
- Kurtukova, A.; Romanov, A.; Shelupanov, A. Source Code Authorship Identification Using Deep Neural Networks. Symmetry 2020, 12, 2044.
- Bianchi, G.; Bruni, R.; Scalfati, F. Identifying e-Commerce in Enterprises by means of Text Mining and Classification Algorithms. Math. Probl. Eng. 2018, 2018, 1–8.
- Bruni, R.; Bianchi, G. Website categorization: A formal approach and robustness analysis in the case of e-commerce detection. Expert Syst. Appl. 2020, 142, 113001.
- Rakhmanenko, I.; Shelupanov, A.; Kostyuchenko, E. Automatic text-independent speaker verification using convolutional deep belief network. Comput. Opt. 2020, 44, 596–605.
- Kostyuchenko, E.Y.; Viktorovich, I.; Renko, B.; Shelupanov, A.A. User Identification by the Free-Text Keystroke Dynamics. In Proceedings of the 3rd Russian-Pacific Conference on Computer Technology and Applications (RPC), Vladivostok, Russia, 18–25 August 2018; pp. 1–4.
- Nemati, A. Gender and Age Prediction Multilingual Author Profiles Based on Comment. FIRE 2018, 2266, 232–239.
- Nguyen, D.-P.; Trieschnigg, R.B.; Dogruoz, A.S.; Gravel, R.; Theune, M.; Meder, T.; De Jong, F. Why Gender and Age Prediction from Tweets is Hard: Lessons from a Crowdsourcing Experiment. In Proceedings of the 25th International Conference on Computational Linguistics, COLING 2014, Dublin, Ireland, 23–29 August 2014; pp. 1950–1961.
- Peersman, C.; Daelemans, W.; Van Vaerenbergh, L. Predicting age and gender in online social networks. In Proceedings of the International Conference on Information and Knowledge Management, Glasgow, UK, 24–28 October 2011; pp. 37–44.
- Daneshvar, S. User Modeling in Social Media: Gender and Age Detection. Ph.D. Thesis, University of Ottawa, Ottawa, ON, Canada, 2019.
- Tumanova, K.S. Algorithm for the Classification of Texts in Russian by Age and Gender of the Author. 2011. Available online: https://studylib.ru/doc/2366008/tumanova-kristina---text (accessed on 9 November 2020).
- Škrlj, B.; Martinc, M.; Kralj, J.; Lavrač, N.; Pollak, S. tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification. Comput. Speech Lang. 2020, 65, 101104.
- Chen, J.; Cheng, L.; Yang, X.; Liang, J.; Quan, B.; Li, S. Joint Learning with both Classification and Regression Models for Age Prediction. J. Phys. Conf. Ser. 2019, 1168, 032016.
- Abdallah, E.E.; Alzghoul, J.R.; Alzghool, M. Age and Gender prediction in Open Domain Text. Procedia Comput. Sci. 2020, 170, 563–570.
- Wang, L. Multi-Task Learning for Gender and Age Prediction on Chinese Microblog. In Proceedings of the International Conference on Computer Processing of Oriental Languages, Kunming, China, 2–6 December 2016; pp. 189–200.
- Ustalov, D.; Filchenkov, A.; Pivovarova, L.; Žižka, J. Artificial Intelligence and Natural Language. In Proceedings of the 6th Conference, AINL 2017, Saint Petersburg, Russia, 20–23 September 2017; pp. 170–177.
- Rothe, R.; Timofte, R.; Van Gool, L. DEX: Deep EXpectation of Apparent Age from a Single Image. In Proceedings of the IEEE International Conference on Computer Vision Workshops 2015, Santiago, Chile, 11–12 December 2015; pp. 252–257.
- Eidinger, E.; Enbar, R.; Hassner, T. Age and Gender Estimation of Unfiltered Faces. IEEE Trans. Inf. Forensics Secur. 2014, 10, 2170–2179.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017.
- Hinton, G.E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R.R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv 2012, arXiv:1207.0580.
- Yang, T.; Huang, Y.; Lin, Y.; Hsiu, P.; Chuang, Y. SSR-Net: A Compact Soft Stagewise Regression Network for Age Estimation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018; pp. 1078–1084.
- Chang, K.-Y.; Chen, C.-S. A Learning Framework for Age Rank Estimation Based on Face Images with Scattering Transform. IEEE Trans. Image Process. 2015, 24, 785–798.
- Parkhi, O.; Vedaldi, A.; Zisserman, A. Deep Face Recognition. In Proceedings of the British Machine Vision Conference 2015, Swansea, UK, 7–10 September 2015; Volume 1, pp. 41.1–41.12.
- Huang, G.; Mattar, M.; Berg, T.; Learned-Miller, E. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments; Technical Report; University of Massachusetts: Amherst, MA, USA, 2007.
- Wolf, L.; Hassner, T.; Maoz, I. Face Recognition in Unconstrained Videos with Matched Background Similarity. In Proceedings of the CVPR 2011, Providence, RI, USA, 20–25 June 2011; pp. 529–534.
- Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2014, Columbus, OH, USA, 23–28 June 2014; pp. 1701–1708.
- Sun, Y.; Chen, Y.; Wang, X.; Tang, X. Deep Learning Face Representation by Joint Identification-Verification. In Proceedings of the NIPS 2014, Montreal, QC, Canada, 8–13 December 2014; pp. 1988–1996.
- Sun, Y.; Liang, D.; Wang, X.; Tang, X. DeepID3: Face recognition with very deep neural networks. arXiv 2015, arXiv:1502.00873.
- Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A Unified Embedding for Face Recognition and Clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, Boston, MA, USA, 7–12 June 2015; pp. 815–823.
- Liu, J.; Deng, Y.; Bai, T.; Wei, Z.; Huang, C. Targeting ultimate accuracy: Face recognition via deep embedding. arXiv 2015, arXiv:1506.07310.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556.
- Wu, X.; He, R.; Sun, Z.; Tan, T. A light CNN for deep face representation with noisy labels. IEEE Trans. Inf. Forensics Secur. 2018, 13, 2884–2896.
- Wen, Y.; Zhang, K.; Li, Z.; Qiao, Y. A Discriminative Feature Learning Approach for Deep Face Recognition. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 499–515.
- Liu, W.; Wen, Y.; Yu, Z.; Yang, M. Large-Margin Softmax Loss for Convolutional Neural Networks. In Proceedings of the ICML 2016, New York, NY, USA, 19–24 June 2016; Volume 2, p. 7.
- Zhang, X.; Fang, Z.; Wen, Y.; Li, Z.; Qiao, Y. Range Loss for Deep Face Recognition with Long-Tailed Training Data. In Proceedings of the IEEE International Conference on Computer Vision 2017, Venice, Italy, 22–29 October 2017; pp. 5409–5418.
- Ranjan, R.; Castillo, C.D.; Chellappa, R. L2-Constrained Softmax Loss for Discriminative Face Verification. arXiv 2017, arXiv:1703.09507.
- Wang, F.; Xiang, X.; Cheng, J.; Yuille, A.L. NormFace: L2 Hypersphere Embedding for Face Verification. In Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 1041–1049.
- Liu, Y.; Li, H.; Wang, X. Rethinking feature discrimination and polymerization for large-scale recognition. arXiv 2017, arXiv:1710.00870.
- Hasnat, M.; Bohne, J.; Milgram, J.; Gentric, S.; Chen, L. Von Mises-Fisher mixture model-based deep learning: Application to face verification. arXiv 2017, arXiv:1706.04264.
- Deng, J.; Zhou, Y.; Zafeiriou, S. Marginal Loss for Deep Face Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 60–68.
- Liu, W.; Wen, Y.; Yu, Z.; Li, M.; Raj, B.; Song, L. SphereFace: Deep Hypersphere Embedding for Face Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 212–220.
- Qi, X.; Zhang, L. Face recognition via centralized coordinate learning. arXiv 2018, arXiv:1801.05678.
- Wang, F.; Cheng, J.; Liu, W.; Liu, H. Additive Margin Softmax for Face Verification. IEEE Signal Process. Lett. 2018, 25, 926–930.
- Wang, H.; Wang, Y.; Zhou, Z.; Ji, X.; Gong, D.; Zhou, J. CosFace: Large Margin Cosine Loss for Deep Face Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5265–5274.
- Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA, 16–20 June 2019; pp. 4690–4699.
- Zheng, Y.; Pal, D.K.; Savvides, M. Ring Loss: Convex Feature Normalization for Face Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5089–5097.
- Sobolev, A.A.; Kurtukova, A.V.; Romanov, A.S.; Vasilieva, M.I. Electronic Instrumentation and Control Systems. Determination of the Age of the Author of an Anonymous Text. In Proceedings of the XV International Scientific and Practical Conference 2019, Kyiv, Ukraine, 24–25 October 2019; Volume 2, pp. 128–131.
- Lai, S.; Xu, L.; Liu, K. Recurrent Convolutional Neural Networks for Text Classification. In Proceedings of the 29th AAAI Conference on Artificial Intelligence 2015, Austin, TX, USA, 25–29 January 2015; pp. 2267–2273.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805.
- Lample, G.; Conneau, A. Cross-lingual Language Model Pretraining. arXiv 2019, arXiv:1901.07291.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Unsupervised Cross-lingual Representation Learning at Scale. arXiv 2019, arXiv:1911.02116.
- Demo Versions of a Computer Program for Diagnosing the Gender and Age of a Participant in Internet Communication Based on the Quantitative Parameters of His Texts. Available online: https://github.com/sag111/author_gender_and_age_profiling_with_style_imitation_detection (accessed on 9 November 2020).
Method | Year | Training Set (Images, Identities) | Accuracy ± Std (%) |
---|---|---|---|
DeepFace [29] | 2014 | Facebook (4.4M, 4K) | 97.35 ± 0.25 |
DeepID2 [30] | 2014 | CelebFaces+ (0.2M, 10K) | 99.15 ± 0.13 |
DeepID3 [31] | 2015 | CelebFaces+ (0.2M, 10K) | 99.53 ± 0.10 |
FaceNet [32] | 2015 | CelebFaces+ (0.2M, 10K) | 99.63 ± 0.09 |
Baidu [33] | 2015 | Baidu (1.2M, 18K) | 99.77 |
VGGface [34] | 2015 | VGGface (2.6M, 2.6K) | 98.95 |
light-CNN [35] | 2015 | MS-Celeb-1M (8.4M, 100K) | 98.8 |
Center Loss [36] | 2016 | CASIA-WebFace, CACD2000, Celebrity+ (0.7M, 17K) | 99.28 |
L-softmax [37] | 2016 | CASIA-WebFace (0.49M, 10K) | 98.71 |
Range Loss [38] | 2016 | MS-Celeb-1M, CASIA-WebFace (5M, 100K) | 99.52 |
L2-softmax [39] | 2017 | MS-Celeb-1M (3.7M, 58K) | 99.78 |
NormFace [40] | 2017 | CASIA-WebFace (0.49M, 10K) | 99.19 |
CoCo loss [41] | 2017 | MS-Celeb-1M (3M, 80K) | 99.86 |
vMF loss [42] | 2017 | MS-Celeb-1M (4.6M, 60K) | 99.58 |
Marginal Loss [43] | 2017 | MS-Celeb-1M (4M, 80K) | 99.48 |
SphereFace [44] | 2017 | CASIA-WebFace (0.49M, 10K) | 99.42 |
CCL [45] | 2018 | CASIA-WebFace (0.49M, 10K) | 99.12 |
AMS loss [46] | 2018 | CASIA-WebFace (0.49M, 10K) | 99.12 |
CosFace [47] | 2018 | CASIA-WebFace (0.49M, 10K) | 99.33 |
ArcFace [48] | 2018 | MS-Celeb-1M (3.8M, 85K) | 99.83 |
Ring loss [49] | 2018 | MS-Celeb-1M (3.5M, 31K) | 99.50 |

Method | Year | Corpus | Accuracy (%) |
---|---|---|---|
MLP [15] | 2018 | 350 texts | 67.18 |
SVM [17] | 2011 | Flemish Dutch Netlog posts (1.5M) | 71.3, 80.8, 88.2 |
CNN [18] | 2019 | Twitter (11K posts) | 82.21 |
SVM, Bayesian NN [19] | 2011 | Russian-language texts (100) | 50.1, 48.2 |
SVM [21] | 2014 | Labeled Faces in the Wild (LFW) (13K) | 70 |
VGG-Very-Deep-16 [26] | 2015 | 2.6M images | 98 |
Model | Two Categories, Raw Data (%) | Two Categories, Filtered Data (%) | Three Categories, Raw Data (%) | Three Categories, Filtered Data (%) | Average Epoch Time (s) | Total Training Time (s) |
---|---|---|---|---|---|---|
FastText | 66.5 | 82.1 | 44.5 | 62.4 | 60 | 3600 |
CRNN | 63.8 | 81.8 | 43 | 61.1 | 114 | 5700 |
BERT | 65.8 | 80.3 | 41.1 | 60.4 | 540 | 27,000 |
XLM | 49.8 | 50.1 | 38 | 46.2 | 660 | 33,000 |
RoBERTa | 50.3 | 75.3 | 40.7 | 60.1 | 530 | 26,500 |
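The accuracy and timing columns can be cross-checked with simple arithmetic. For CRNN, BERT, XLM, and RoBERTa the total training time is exactly 50 times the average epoch time, matching the 50 training epochs listed in the methodology, while for FastText the ratio is 60. The short script below re-uses only the numbers in the table (the dictionary layout and function names are illustrative) to compute these ratios and the accuracy gain from filtering in percentage points.

```python
# Accuracy (%) as (raw data, filtered data) and timings (s), copied from the table.
RESULTS = {
    "FastText": {"two": (66.5, 82.1), "three": (44.5, 62.4), "epoch_s": 60, "total_s": 3600},
    "CRNN": {"two": (63.8, 81.8), "three": (43.0, 61.1), "epoch_s": 114, "total_s": 5700},
    "BERT": {"two": (65.8, 80.3), "three": (41.1, 60.4), "epoch_s": 540, "total_s": 27000},
    "XLM": {"two": (49.8, 50.1), "three": (38.0, 46.2), "epoch_s": 660, "total_s": 33000},
    "RoBERTa": {"two": (50.3, 75.3), "three": (40.7, 60.1), "epoch_s": 530, "total_s": 26500},
}


def filtering_gain(model: str, task: str) -> float:
    """Accuracy gain (percentage points) from filtered over raw data."""
    raw, filtered = RESULTS[model][task]
    return round(filtered - raw, 1)


def implied_epochs(model: str) -> float:
    """Total training time divided by average epoch time."""
    r = RESULTS[model]
    return r["total_s"] / r["epoch_s"]


for name in RESULTS:
    print(f"{name}: +{filtering_gain(name, 'two')} pp (two categories), "
          f"+{filtering_gain(name, 'three')} pp (three categories), "
          f"{implied_epochs(name):.0f} epochs")
```

Filtering improves every model; the gain ranges from 0.3 percentage points (XLM, two categories) to 25.0 percentage points (RoBERTa, two categories).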
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Romanov, A.S.; Kurtukova, A.V.; Sobolev, A.A.; Shelupanov, A.A.; Fedotova, A.M. Determining the Age of the Author of the Text Based on Deep Neural Network Models. Information 2020, 11, 589. https://doi.org/10.3390/info11120589