Dataset Augmentation for Counteracting Bias in Toxic Comment Classification

Senhao Cheng

doi:10.54097/ex94ex07

Keywords:

Toxic Comments, Bias, CTRL, Machine Learning.

Abstract

Toxic comments are prevalent on social media and online networking platforms. They contain offensive or malicious language, hate speech, or other harmful content that negatively affects audiences and communities. Detecting and categorizing toxic comments effectively is essential for maintaining order in the online environment, protecting user safety, and improving user experience. Researchers and companies have developed various models to recognize toxicity in online chats and comments, with some success. However, many models in current use incorrectly classify non-toxic comments that contain certain identity terms as potentially toxic, which undermines the accurate classification of comments. In this paper, toxic comments are detected and classified using Term Frequency-Inverse Document Frequency (TF-IDF) features and machine learning techniques. In addition, two dataset-specific optimizations are proposed that mitigate the impact of bias on text classification by expanding the dataset. A comparative analysis of bias evaluation metrics shows that this approach can effectively mitigate bias while preserving the accuracy of the original model as far as possible.
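
As a rough illustration of the TF-IDF weighting mentioned in the abstract, the sketch below computes the textbook form, term frequency multiplied by log(N / document frequency), in plain Python. The abstract does not specify the exact TF-IDF variant, tokenization, or classifiers used in the paper, so the function name `tfidf`, the whitespace tokenizer, and the three sample comments are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (textbook tf * log(N/df))."""
    # Assumption: lowercase whitespace tokenization, chosen only for brevity.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical sample comments, not taken from the paper's dataset.
comments = [
    "you are an idiot",
    "you are very kind",
    "thanks you are helpful",
]
vectors = tfidf(comments)
```

Terms that occur in every comment ("you", "are") receive a weight of zero, while a term confined to one comment ("idiot") receives the largest weight in that comment. Vectors of this kind would typically be passed to a classifier as features.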
