Classifying the Severity of Cyberbullying Incidents by Using a Hierarchical Squashing-Attention Network
Abstract
1. Introduction
2. Literature Review
2.1. Cyberbullying
2.2. Machine Learning
2.3. Deep Neural Networks
2.4. Hierarchical Neural Networks
3. Methodology
3.1. Data Collection
3.2. Cyberbullying Annotation
1. Preliminary filtration: In line with the repetition characteristic of cyberbullying, dialogues had to contain repeated content from numerous users. A set of dialogues was established, each comprising a post and its corresponding response comments; in general, each dialogue was required to have at least twenty comments.
2. Cyberbullying dialogue sample filtration: Because the goal of this paper is cyberbullying severity, dialogues containing cyberbullying were filtered manually.
3. Feature scoring: Five experts independently annotated all the cyberbullying dialogues according to the labeling criteria: intent, repetition, and aggression toward the victim. Table 1 summarizes the labeling standards in detail. Each criterion was scored on a scale of 1 (strongly disagree) to 3 (strongly agree), so each dialogue received three criterion scores that together determined its severity level.
4. Cyberbullying severity labeling: Dialogues were labeled as follows. If more than three experts provided the same total score, the mode was used; otherwise, the scores provided by the five experts were averaged to remove outliers. Finally, dialogues with scores of 3, 4, or 5 were labeled "slight," dialogues with scores of 6 or 7 were labeled "medium," and dialogues with scores of 8 or 9 were labeled "serious."
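The labeling rule above can be sketched in code. This is a minimal illustration, not the authors' implementation; the function name and the thresholds applied to fractional averages (a mean of, say, 5.4 falling into "medium") are assumptions:

```python
from statistics import mean, mode

def label_severity(expert_scores):
    """Map five experts' total scores to a severity label.

    Each expert's total is the sum of three criteria scored 1-3,
    so totals range from 3 to 9. If more than three experts gave
    the same score, the mode is used; otherwise the five scores
    are averaged to dampen outliers (illustrative assumption).
    """
    assert len(expert_scores) == 5
    most_common = mode(expert_scores)
    score = most_common if expert_scores.count(most_common) > 3 else mean(expert_scores)
    if score <= 5:
        return "slight"
    if score <= 7:
        return "medium"
    return "serious"

print(label_severity([6, 6, 6, 6, 7]))  # consensus -> mode 6 -> medium
print(label_severity([3, 4, 5, 8, 4]))  # no consensus -> mean 4.8 -> slight
```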
Features | Labeling Rule |
---|---|
Intentional | Whether the content targets a specific individual or a group; if so, the number of people targeted by the victimizer is considered. Attacks concentrated on a small number of people are rated serious; attacks spread over a larger group are rated slight. |
Repeated | The proportion of people in the thread attacking the same person: the larger the proportion, the higher the repetition and the more serious the cyberbullying. |
Aggressive | Whether the content contains personal attacks that belittle another's personality, body, life, or character, or offensive remarks about specific characteristics or sexual orientation. |
3.3. Text Preprocessing
3.3.1. Text Cleaning
3.3.2. Sentence and Word Segmentation
3.3.3. Text Sequence Generation to Input Data
3.4. HSAN Model Building
3.4.1. Word Encoder
3.4.2. Word SAM
3.4.3. Sentence Encoder
3.4.4. Sentence SAM
3.4.5. Classification of Cyberbullying Severity
4. Experimental Results
4.1. Experimental Data
4.2. Classification Models
1. HSAN: The proposed model. It has a hierarchical structure and uses a bidirectional GRU with a squashing-attention network to encode text into hidden features. It uses the softmax function for classification.
2. GRU: This model uses a bidirectional GRU to encode text into hidden features. It has a nonhierarchical structure and uses the softmax function for classification.
3. GRU(Soft): This model is similar to the GRU model but includes a soft-attention network.
4. GRU(Squashing): This model is similar to the GRU model but includes a squashing-attention network.
5. HAN [13]: This baseline model has a hierarchical structure with soft attention.
6. SVM [46]: This baseline model uses the SVM algorithm. It uses TF-IDF to transform documents into vectors.
7. RandomForest [47]: This baseline model uses the random forest algorithm. It uses TF-IDF to transform documents into vectors.
8. TextCNN [38]: This model uses word2vec-pretrained word embeddings as input and multiple CNN kernels with max pooling to extract multilevel features from the text.
9. TextRCNN [39]: This model uses a bidirectional LSTM with max pooling to extract critical features.
10. DPCNN [48]: The deep pyramid CNN. It contains a word-level CNN with max pooling; the length of the internal representation is halved at each downsampling step, forming a pyramid structure.
11. Transformer [49]: In this model, an encoder and a decoder are constructed by stacking multiple layers that use the self-attention mechanism.
12. CapsuleNet [45]: This model replaces scalar neurons with capsules, i.e., groups of neurons represented as vectors. Dynamic routing is conducted in the last layer of category capsules, and a squashing function compresses each category capsule's vector length, which represents the final category probability.
13. BERT: This baseline fine-tunes pretrained Transformer language models (bert-base-chinese, distilbert-base-multilingual-cased, and allenai/longformer-base-4096) for classification.
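The squashing function referenced in the HSAN and CapsuleNet descriptions is, per Sabour et al. [45], a nonlinearity that rescales a vector's length into [0, 1) while preserving its direction. A minimal NumPy sketch (the epsilon guard is an implementation detail added here, not from the paper):

```python
import numpy as np

def squash(v, eps=1e-8):
    """Squashing nonlinearity from Sabour et al.:
    squash(v) = (||v||^2 / (1 + ||v||^2)) * (v / ||v||).
    Short vectors shrink toward zero; long vectors approach unit length.
    """
    norm_sq = np.sum(v ** 2)
    norm = np.sqrt(norm_sq) + eps  # avoid division by zero
    return (norm_sq / (1.0 + norm_sq)) * (v / norm)

v = np.array([3.0, 4.0])            # ||v|| = 5
print(np.linalg.norm(squash(v)))    # 25/26, roughly 0.9615
```

Because the output length lies in [0, 1), it can be read directly as a probability-like attention weight or category activation.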
4.3. Experimental Design
4.4. Evaluation Metrics
4.5. Classification Performance of the HSAN
4.6. Comparison of Learning Rates and Hidden Sizes
4.7. Evaluation of Hierarchical Structure Versus Non-Hierarchical Structure
4.8. Comparison of the HSAN with Other Models
4.9. Comparison of the HSAN with Other Models on Other Public Datasets
4.10. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Bastiaensens, S.; Vandebosch, H.; Poels, K.; Van Cleemput, K.; Desmet, A.; De Bourdeaudhuij, I. ‘Can I afford to help?’ How affordances of communication modalities guide bystanders’ helping intentions towards harassment on social network sites. Behav. Inf. Technol. 2013, 34, 425–435.
2. Gulzar, M.A.; Ahmad, M.; Hassan, M.; Rasheed, M.I. How social media use is related to student engagement and creativity: Investigating through the lens of intrinsic motivation. Behav. Inf. Technol. 2021, 40, 1–11.
3. Ioannou, A.; Blackburn, J.; Stringhini, G.; De Cristofaro, E.; Kourtellis, N.; Sirivianos, M. From risk factors to detection and intervention: A practical proposal for future work on cyberbullying. Behav. Inf. Technol. 2018, 37, 258–266.
4. Lin, L.Y.; Sidani, J.E.; Shensa, A.; Radovic, A.; Miller, E.; Colditz, J.B.; Hoffman, B.L.; Giles, L.M.; Primack, B.A. Association between social media use and depression among U.S. young adults. Depress. Anxiety 2016, 33, 323–331.
5. Son, J.-E.; Lee, S.-H.; Cho, E.-Y.; Kim, H.-W. Examining online citizenship behaviours in social network sites: A social capital perspective. Behav. Inf. Technol. 2016, 35, 730–747.
6. Al-Ajlan, M.A.; Ykhlef, M. Optimized Twitter cyberbullying detection based on deep learning. In Proceedings of the 21st Saudi Computer Society National Computer Conference (NCC), Riyadh, Saudi Arabia, 25–26 April 2018; pp. 1–5.
7. Agrawal, S.; Awekar, A. Deep learning for detecting cyberbullying across multiple social media platforms. In Advances in Information Retrieval; Springer: Berlin/Heidelberg, Germany, 2018; pp. 141–153.
8. Rosa, H.; Matos, D.M.; Ribeiro, R.; Coheur, L.; Carvalho, J.P. A “Deeper” look at detecting cyberbullying in social networks. In Proceedings of the 2018 International Joint Conference on Neural Networks, Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
9. Whittaker, E.; Kowalski, R.M. Cyberbullying via social media. J. School Violence 2015, 14, 11–29.
10. Lowry, P.; Zhang, J.; Wang, C.; Siponen, M. Why do adults engage in cyberbullying on social media? An integration of online disinhibition and deindividuation effects with the social structure and social learning model. Inf. Syst. Res. 2016, 27, 962–986.
11. Chapin, J. Adolescents and cyber bullying: The precaution adoption process model. Educ. Inf. Technol. 2014, 21, 719–728.
12. Hood, M.; Duffy, A.L. Understanding the relationship between cyber-victimisation and cyber-bullying on social network sites: The role of moderating factors. Personal. Individ. Differ. 2018, 133, 103–108.
13. Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489.
14. Li, Q. New bottle but old wine: A research of cyberbullying in schools. Comput. Hum. Behav. 2007, 23, 1777–1791.
15. Selkie, E.M.; Kota, R.; Chan, Y.-F.; Moreno, M. Cyberbullying, depression, and problem alcohol use in female college students: A multisite study. Cyberpsychol. Behav. Soc. Netw. 2015, 18, 79–86.
16. Patchin, J.W.; Hinduja, S. Bullies move beyond the schoolyard. Youth Violence Juv. Justice 2006, 4, 148–169.
17. Mason, K.L. Cyberbullying: A preliminary assessment for school personnel. Psychol. Sch. 2008, 45, 323–348.
18. Snakenborg, J.; Van Acker, R.; Gable, R.A. Cyberbullying: Prevention and intervention to protect our children and youth. Prev. Sch. Fail. Altern. Educ. Child. Youth 2011, 55, 88–95.
19. Dadvar, M.; Trieschnigg, D.; Ordelman, R.; Jong, F.D. Improving cyberbullying detection with user context. In Lecture Notes in Computer Science Advances in Information Retrieval; Springer: Cham, Switzerland, 2013; pp. 693–696.
20. Kasim, H.; Riadi, I. Detection of cyberbullying on social media using data mining techniques. Int. J. Comput. Sci. Inf. Secur. 2017, 15, 244–250.
21. Chatzakou, D.; Kourtellis, N.; Blackburn, J.; Cristofaro, E.D.; Stringhini, G.; Vakali, A. Mean birds: Detecting aggression and bullying on Twitter. In Proceedings of the 2017 International ACM Web Science Conference, Troy, NY, USA, 25–28 June 2017; pp. 13–22.
22. Hinduja, S.; Patchin, J.W. Bullying, cyberbullying, and suicide. Arch. Suicide Res. 2010, 14, 206–221.
23. Qin, P.; Xu, W.; Guo, J. A novel negative sampling based on TFIDF for learning word representation. Neurocomputing 2015, 177, 257–265.
24. Reynolds, K.; Kontostathis, A.; Edwards, L. Using machine learning to detect cyberbullying. In Proceedings of the 2011 10th International Conference on Machine Learning and Applications and Workshops, Honolulu, HI, USA, 18–21 December 2011; pp. 241–244.
25. Dinakar, K.; Reichart, R.; Lieberman, H. Modeling the detection of textual cyberbullying. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, Barcelona, Spain, 17–21 July 2011; pp. 11–17.
26. Huang, Q.; Singh, V.K.; Atrey, P.K. Cyber bullying detection using social and textual analysis. In Proceedings of the 3rd International Workshop on Socially-Aware Multimedia, Orlando, FL, USA, 7 November 2014; pp. 3–6.
27. Zhao, R.; Zhou, A.; Mao, K. Automatic detection of cyberbullying on social networks based on bullying features. In Proceedings of the 17th International Conference on Distributed Computing and Networking, Singapore, 4–7 January 2016; pp. 1–6.
28. Gutiérrez-Esparza, G.O.; Vallejo-Allende, M.; Hernández-Torruco, J. Classification of cyber-aggression cases applying machine learning. Appl. Sci. 2019, 9, 1828.
29. Cheng, L.; Li, J.; Silva, Y.N.; Hall, D.L.; Liu, H. XBully: Cyberbullying detection within a multi-modal context. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 11–15 February 2019; pp. 339–347.
30. Zhang, Z.; Xu, S.; Zhang, S.; Qiao, T.; Cao, S. Attention based convolutional recurrent neural network for environmental sound classification. Neurocomputing 2020, 453, 896–903.
31. Liu, F.; Zheng, L.; Zheng, J. HieNN-DWE: A hierarchical neural network with dynamic word embeddings for document level sentiment classification. Neurocomputing 2020, 403, 21–32.
32. Jin, X.; Lan, C.; Zeng, W.; Zhang, Z.; Chen, Z. CASINet: Content-Adaptive Scale Interaction Networks for scene parsing. Neurocomputing 2020, 419, 9–22.
33. Tian, D.; Li, M.; Shi, J.; Shen, Y.; Han, S. On-site text classification and knowledge mining for large-scale projects construction by integrated intelligent approach. Adv. Eng. Informatics 2021, 49, 101355.
34. Fang, W.; Luo, H.; Xu, S.; Love, P.E.; Lu, Z.; Ye, C. Automated text classification of near-misses from safety reports: An improved deep learning approach. Adv. Eng. Informatics 2020, 44, 101060.
35. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
36. Chung, J.; Gulcehre, C.; Cho, K.H.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. In Proceedings of the NIPS 2014 Workshop on Deep Learning, Montreal, QC, Canada, 12 December 2014.
37. Menini, S.; Moretti, G.; Corazza, M.; Cabrio, E.; Tonelli, S.; Villata, S. A system to monitor cyberbullying based on message classification and social network analysis. In Proceedings of the Third Workshop on Abusive Language Online, Florence, Italy, 1 August 2019; pp. 105–110.
38. Kim, Y. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; pp. 1746–1751.
39. Lai, S.; Xu, L.; Liu, K.; Zhao, J. Recurrent convolutional neural networks for text classification. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 2267–2273.
40. Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 North American Chapter of the Association for Computational Linguistics-Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
41. Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The long-document transformer. arXiv 2020, arXiv:2004.05150.
42. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108.
43. Gao, S.; Ramanathan, A.; Tourassi, G. Hierarchical convolutional attention networks for text classification. In Proceedings of the Third Workshop on Representation Learning for NLP, Melbourne, Australia, 20 July 2018; pp. 11–23.
44. Cheng, L.; Guo, R.; Silva, Y.; Hall, D.; Liu, H. Hierarchical attention networks for cyberbullying detection on the Instagram social network. In Proceedings of the 2019 SIAM International Conference on Data Mining, Calgary, AB, Canada, 2–4 May 2019; pp. 235–243.
45. Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic routing between capsules. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 3856–3866.
46. Boser, B.E.; Guyon, I.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; pp. 144–152.
47. Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844.
48. Johnson, R.; Zhang, T. Deep pyramid convolutional neural networks for text categorization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 562–570.
49. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
50. Maas, A.L.; Daly, R.E.; Pham, P.T.; Huang, D.; Ng, A.Y.; Potts, C. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, OR, USA, 19–24 June 2011; pp. 142–150.
51. Zhang, X.; Zhao, J.; LeCun, Y. Character-level convolutional networks for text classification. In Proceedings of the 2015 Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 649–658.
Annotator | A | B | C | D | E | Average |
---|---|---|---|---|---|---|
MAE | 0.51 | 0.53 | 0.48 | 0.60 | 0.48 | 0.52 |
Type | Number | Mean | Mode | Maximum | Minimum |
---|---|---|---|---|---|
Title | Word | 10 | 10 | 21 | 1 |
Content | Sentence | 64 | 100 | 1193 | 3 |
Content | Word | 5 | 1 | 54 | 1 |
Comment | Sentence | 150 | 24 | 1500 | 20 |
Comment | Word | 7 | 1 | 28 | 1 |
Data Sets | No. of Samples | Slight | Medium | Serious |
---|---|---|---|---|
Training set | 3500 | 1190 (34%) | 870 (25%) | 1440 (41%) |
Validation set | 500 | 201 (40%) | 241 (48%) | 58 (12%) |
Test set | 1000 | 447 (45%) | 459 (46%) | 94 (9%) |
Total | 5000 | 1838 (100%) | 1570 (100%) | 1592 (100%) |
Model | Parameter | Value |
---|---|---|
HSAN, HAN, GRU, GRU(Soft), GRU(Squashing), TextCNN, TextRCNN, DPCNN | Learning rate | 1 × 10⁻³, 1 × 10⁻⁴, 1 × 10⁻⁵ |
| | Epoch | 50 |
| | Hidden and embedding size | 300, 500 |
| | Dropout rate | 0.1 |
SVM | Kernel | RBF, linear, poly, sigmoid |
| | Gamma | 0.3, 0.5, 0.7, 0.9, 1 |
| | Cost | 1 |
Random Forest | Number of estimators | 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 |
TextCNN | Filter sizes | (2, 3, 4) |
| | Number of channels | 300 |
DPCNN | Filter size | 3, 4, 5 |
| | Number of channels | 300 |
CapsuleNet | Routing | 2, 5, 10 |
BERT | Learning rate | 1 × 10⁻⁴, 1 × 10⁻⁵, 1 × 10⁻⁶ |
| | Pretrained model | bert-base-chinese, distilbert-base-multilingual-cased, allenai/longformer-base-4096 |
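The learning-rate and hidden-size grid for the neural models can be searched exhaustively. A minimal sketch of such a loop, where `train_and_evaluate` is a placeholder name (not from the paper) standing in for the actual training routine that would return a validation score:

```python
from itertools import product

# Grid taken from the hyperparameter table above (neural models)
learning_rates = [1e-3, 1e-4, 1e-5]
hidden_sizes = [300, 500]

def train_and_evaluate(lr, hidden_size):
    """Placeholder: train for 50 epochs with dropout 0.1 and
    return the validation F1. Replace with the real training loop."""
    return 0.0

# Pick the configuration with the highest validation score
best = max(product(learning_rates, hidden_sizes),
           key=lambda cfg: train_and_evaluate(*cfg))
print(f"best config: lr={best[0]}, hidden={best[1]}")
```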
Label | Validation Precision | Validation Recall | Validation F1 | Test Precision | Test Recall | Test F1 |
---|---|---|---|---|---|---|
Slight | 63.43% | 68.16% | 65.71% | 65.60% | 73.38% | 69.27% |
Medium | 62.55% | 63.07% | 62.81% | 59.42% | 57.73% | 58.56% |
Serious | 68.29% | 48.28% | 56.57% | 55.56% | 31.91% | 40.54% |
Average macro | 64.76% | 59.84% | 61.70% | 60.19% | 54.34% | 56.12% |
Average weighted | 63.56% | 63.40% | 63.22% | 61.82% | 62.30% | 61.76% |
Overall | 64.16% | 61.62% | 62.46% | 61.01% | 58.32% | 58.94% |
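The macro and weighted averages in the table differ only in whether each class's score is weighted by its support (number of true instances). A self-contained sketch of the two F1 averages; the toy labels are illustrative only, not the paper's data:

```python
from collections import Counter

def prf_per_class(y_true, y_pred, labels):
    """Per-class (precision, recall, F1) from parallel label lists."""
    out = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = (prec, rec, f1)
    return out

def macro_weighted_f1(y_true, y_pred, labels):
    """Macro F1: unweighted mean over classes.
    Weighted F1: mean weighted by each class's support."""
    per = prf_per_class(y_true, y_pred, labels)
    support = Counter(y_true)
    macro = sum(per[c][2] for c in labels) / len(labels)
    weighted = sum(per[c][2] * support[c] for c in labels) / len(y_true)
    return macro, weighted

y_true = ["slight", "slight", "slight", "medium", "medium", "serious"]
y_pred = ["slight", "slight", "medium", "medium", "medium", "slight"]
m, w = macro_weighted_f1(y_true, y_pred, ["slight", "medium", "serious"])
print(f"macro F1 = {m:.4f}, weighted F1 = {w:.4f}")  # 0.4889 and 0.6000
```

Because "serious" is the rarest class here, its zero F1 drags the macro average down more than the weighted one, mirroring how the minority "serious" class affects the macro scores in the table.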
Model | Hierarchical | Attention | Macro F1 | Weighted F1 | Average F1 |
---|---|---|---|---|---|
GRU | No | No | 30.94% | 40.89% | 35.92% |
GRU(Soft) | No | Soft | 53.50% | 58.92% | 56.21% |
GRU(Squashing) | No | Squashing | 61.36% | 62.70% | 62.03% |
HAN | Yes | Soft | 43.56% | 57.62% | 50.59% |
HSAN | Yes | Squashing | 61.70% | 63.22% | 62.46% |
Model | ACC | Macro F1 | Weighted F1 | Average F1 |
---|---|---|---|---|
SVM | 54.50% | 52.89% | 52.10% | 52.50% |
RandomForest | 46.40% | 47.61% | 38.68% | 43.15% |
TextCNN | 60.00% | 58.39% | 58.75% | 58.57% |
TextRCNN | 58.60% | 58.60% | 58.46% | 58.53% |
DPCNN | 60.30% | 57.16% | 58.98% | 58.07% |
Transformer | 55.40% | 47.34% | 53.80% | 50.57% |
CapsuleNet | 45.25% | 30.60% | 41.47% | 36.04% |
BERT (bert-base-chinese) | 57.40% | 53.21% | 51.48% | 52.35% |
BERT (distilbert-base-multilingual-cased) | 55.80% | 53.43% | 54.58% | 54.00% |
BERT (allenai/longformer-base-4096) | 49.00% | 43.94% | 45.03% | 44.49% |
HAN | 58.70% | 54.15% | 55.77% | 54.96% |
HSAN | 62.30% | 56.12% | 61.76% | 58.94% |
Dataset | HAN Precision | HAN Recall | HAN F1 | HAN ACC | HSAN Precision | HSAN Recall | HSAN F1 | HSAN ACC |
---|---|---|---|---|---|---|---|---|
IMDB Review | 65.23% | 65.47% | 65.35% | 65.09% | 66.54% | 66.60% | 66.54% | 66.57% |
Yahoo Answer | 65.98% | 65.87% | 65.92% | 65.72% | 66.03% | 65.84% | 66.03% | 65.93% |
Yelp Review Polarity | 83.60% | 83.77% | 83.68% | 83.57% | 84.92% | 85.03% | 84.92% | 84.97% |
Amazon Review Polarity | 70.83% | 74.53% | 72.63% | 73.57% | 77.65% | 78.62% | 77.65% | 78.13% |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wu, J.-L.; Tang, C.-Y. Classifying the Severity of Cyberbullying Incidents by Using a Hierarchical Squashing-Attention Network. Appl. Sci. 2022, 12, 3502. https://doi.org/10.3390/app12073502