NEURAL NETWORK SEARCH AND CLASSIFICATION OF CYBERBULLYING IN TEXT MESSAGES
DOI: https://doi.org/10.32782/IT/2024-4-23
Keywords: cyberbullying, neural networks, interpretation of results, BERT, LIME.
Abstract
The article addresses the problem of searching for and classifying cyberbullying in text messages, one of the key challenges facing the modern information society. The study is motivated by the need for effective systems that detect cyberbullying with neural networks accurately, ethically and transparently. Particular attention is paid to adapting such systems to sensitive topics such as discrimination based on age, ethnicity, gender and religion. The aim of the work is to develop a comprehensive method for neural network search and classification of cyberbullying in text messages that ensures the representativeness of the dataset used to train the model, adheres to the ethical principle of fairness during model development, and makes the model's results interpretable with respect to the types of cyberbullying detected. The novelty of the proposed approach is a new method that not only assesses whether a text message contains cyberbullying but also determines with high accuracy which types of cyberbullying it manifests, while ensuring that the datasets used to train the neural network models are representative and balanced. The method is performed in three stages. At the first stage, the representativeness of the dataset used to train neural network models for detecting and classifying cyberbullying is assessed; in particular, the method minimizes deviations in the class distribution of the data, with the maximum deviation being only 0.04%. At the second stage, neural network classification models are applied: a BiLSTM for binary classification of cyberbullying, which achieves an accuracy of 96%, and BERT for multi-label classification by type of cyberbullying, which achieves an accuracy of 94%.
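The abstract does not state how the class-distribution deviation of the first stage is computed. As a minimal sketch, assuming the deviation is the gap, in percentage points, between each class's observed share of the dataset and a target share (uniform across classes unless other target shares are supplied), the measure could be computed as below. The function name `class_share_deviation`, the label names and the counts are hypothetical illustrations, not the authors' code or data.

```python
def class_share_deviation(counts, target_shares=None):
    """Hypothetical helper: deviation of each class's share from its target share.

    Assumption: deviation = observed share - target share, in percentage points;
    the target is uniform across classes unless `target_shares` is given.
    Returns (per-class deviations, maximum absolute deviation).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the dataset contains no examples")
    if target_shares is None:
        # Assumed default: every class should make up an equal share.
        target_shares = {label: 1.0 / len(counts) for label in counts}
    deviations = {
        label: 100.0 * (count / total - target_shares[label])
        for label, count in counts.items()
    }
    return deviations, max(abs(d) for d in deviations.values())


if __name__ == "__main__":
    # Made-up class counts, for illustration only.
    counts = {"age": 1000, "ethnicity": 1010, "gender": 990,
              "religion": 1005, "not_cyberbullying": 995}
    per_class, worst = class_share_deviation(counts)
    for label, dev in per_class.items():
        print(f"{label:>18s} {dev:+.3f} pp")
    print(f"max |deviation| = {worst:.3f} pp")
```

A dataset would then be considered balanced when the maximum absolute deviation falls below a chosen tolerance; non-uniform targets (for example, shares derived from demographic statistics) can be passed through `target_shares`.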
The third stage applies the LIME method, which provides a visual interpretation of the neural network's decisions and gives users an explanation for each detected type of cyberbullying. The research methodology combines modern machine learning approaches, qualitative analysis of data representativeness and interpretation models. Integrating these approaches aims to produce transparent, trustworthy cyberbullying detection systems suitable for real-world use. The results demonstrate the effectiveness of the proposed method: it not only increases the accuracy and transparency of cyberbullying detection and classification but also supports the UN Sustainable Development Goals No. 5 (Gender Equality), No. 10 (Reduced Inequalities) and No. 16 (Peace, Justice and Strong Institutions), which makes the comprehensive method relevant for systems in which ethics and accuracy matter.
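LIME explains a single prediction by perturbing the input, querying the black-box model on the perturbed samples and fitting a simple, proximity-weighted linear model around the original instance; the coefficients of that surrogate are the per-word contributions. The sketch below reimplements this idea for text with NumPy only, to show how per-word weights arise. `toy_predict_proba` is a hypothetical keyword-based stand-in for a trained BiLSTM or BERT classifier, and the number of samples, the distance and the kernel width are illustrative assumptions, not the authors' settings. In practice, the `lime` package's `LimeTextExplainer` provides this technique with more options.

```python
import numpy as np


def toy_predict_proba(texts):
    """Hypothetical stand-in for a trained classifier: P(cyberbullying) grows
    with the number of words from a tiny, made-up abusive lexicon."""
    lexicon = {"idiot", "stupid", "loser"}
    return np.array([1.0 - 0.5 ** sum(w in lexicon for w in t.split()) for t in texts])


def lime_text_weights(text, predict_proba, num_samples=500, kernel_width=0.25, seed=0):
    """LIME-style local explanation: drop random words, score the perturbed
    texts, and fit a proximity-weighted linear surrogate on the keep/drop masks."""
    words = text.split()
    d = len(words)
    rng = np.random.default_rng(seed)
    # Each row is a binary mask: 1 = keep the word, 0 = remove it.
    masks = rng.integers(0, 2, size=(num_samples, d))
    masks[0] = 1  # always include the unperturbed message
    perturbed = [" ".join(w for w, keep in zip(words, row) if keep) for row in masks]
    y = predict_proba(perturbed)

    # Cosine distance between each mask and the all-ones (original) mask;
    # for k kept words this is 1 - sqrt(k / d), so an empty mask gets distance 1.
    kept = masks.sum(axis=1)
    distance = 1.0 - np.sqrt(kept / d)
    # Exponential kernel (in the spirit of LIME's default): closer samples weigh more.
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Weighted least squares with an intercept column: scale rows by sqrt(weight).
    X = np.hstack([np.ones((num_samples, 1)), masks])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

    # Drop the intercept and rank words by the size of their contribution.
    contributions = list(zip(words, coef[1:]))
    return sorted(contributions, key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    for word, weight in lime_text_weights("you are a stupid loser", toy_predict_proba):
        print(f"{word:>8s} {weight:+.3f}")
```

For a multi-label model, the same function can be run once per label, passing a `predict_proba` that returns only that label's probability; this yields a separate explanation for each detected type of cyberbullying.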