Daryll James Carrasco,1 Melvin D. Macapinlac,2 Nigel A. Silva,3 and Paul Michael B. Uy4
Abstract
Social media sites have become an avenue for some users to spread hate speech and toxic comments, while others are harassed and become victims of cyberbullying. The goal of this project is to use deep learning methods to identify toxicity in online comments, which can be used to filter potentially harmful messages, promote a healthier and safer social media space for everyone, especially children, and help users craft more civil arguments when engaging in online discussions. The project implements various deep learning models and evaluates their performance on a binary classification task: whether a given text is toxic or not. Using a 4-layer neural network model, the team developed an architecture that predicts whether a given text is toxic with up to 91.01% accuracy.
Keywords: cyberbullying, toxic comments, neural network, machine learning, deep learning, classification
Introduction
Discussions on various social media platforms often lead to harassment and abuse because of user anonymity. Users now find it much easier to spread hateful, harmful, and toxic comments in the comment sections of different social media sites. These sites have unfortunately become a medium for bullies, which can be very harmful, especially for users who become victims of online harassment and cyberbullying. With the help of modern computing tools, the language patterns used by bullies can be identified, and rules can be developed to automatically flag cyberbullying content.
The goal of this project is to use deep learning methods to identify toxicity in online comments, which can be used to filter potentially harmful messages, promote a healthier and safer social media space for everyone, especially children, and help users craft more civil arguments when engaging in online discussions. The project implements various deep learning models and evaluates their performance on a binary classification task: whether a given text is toxic or not.
Hosseini et al. proposed an attack on Google’s Perspective API, which detects toxic comments. Their study shows that toxicity scores can be reduced to the level of non-toxic comments, which calls into question the usability of the API (Hosseini et al., 2017).
A different approach was taken by the team of Srivastava. They used a capsule network to identify aggression and toxicity in comments, and demonstrated that the problem of data imbalance can also be addressed with their methodology (Srivastava et al., 2018). Georgakopoulos and his team tried a similar approach but with Convolutional Neural Networks (CNNs) (Georgakopoulos et al., 2018). They compared CNNs to the traditional bag-of-words approach, and their results show that CNNs can enhance toxic comment classification. On the other hand, Pavlopoulos et al. used RNNs for user comment moderation and concluded that RNNs outperform state-of-the-art methods (Pavlopoulos et al., 2017).
For the purposes of this project, we use the toxic field of the dataset as our indicator of whether a given comment is toxic or not.
Methodology
This section discusses the steps performed in this project: (1) data preprocessing, (2) modelling, and (3) model evaluation.
A. Data Preprocessing
Balancing of Data
The original dataset contains 144,277 entries of non-toxic comments and 15,294 entries of toxic comments. So that toxic and non-toxic comments have equal representation in our model, the dataset was first balanced. Balancing the dataset to a 50-50 toxic/non-toxic split by random under-sampling resulted in a total of 30,588 comments for training and testing. We use a 90-10 split for our training and testing data.
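The balancing and splitting steps described above can be sketched as follows. This is a minimal illustration, not the paper's code; the column name `toxic` and the random seed are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def balance_and_split(df, label_col="toxic", seed=42):
    """Random under-sample the majority class to a 50-50 split,
    then make a stratified 90-10 train/test split."""
    toxic = df[df[label_col] == 1]
    # Draw exactly as many non-toxic comments as there are toxic ones.
    non_toxic = df[df[label_col] == 0].sample(n=len(toxic), random_state=seed)
    balanced = pd.concat([toxic, non_toxic]).sample(frac=1, random_state=seed)
    return train_test_split(balanced, test_size=0.10, random_state=seed,
                            stratify=balanced[label_col])
```

With the counts quoted above, this would keep all 15,294 toxic comments, sample 15,294 of the 144,277 non-toxic ones, and split the resulting 30,588 rows 90-10.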
B. Modelling
We used Keras to implement our neural network. The architecture is summarized below:
Figure 1. Architecture of our Neural Network Model
First, the comments were fed into the pre-trained GloVe-50D word embeddings. This layer gives our model its vocabulary by providing context for words with similar meanings. GloVe (Global Vectors for Word Representation) is an unsupervised learning algorithm for obtaining vector representations of words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
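A common way to feed pre-trained GloVe vectors into an embedding layer is to build a weight matrix indexed by the tokenizer's vocabulary. The sketch below shows this step under assumed inputs (the GloVe file path and the `word_index` mapping are not from the paper):

```python
import numpy as np

def build_embedding_matrix(glove_path, word_index, dim=50):
    """Parse a GloVe text file (word followed by `dim` floats per line)
    and build an embedding matrix aligned with word_index."""
    vectors = {}
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    # Row 0 is reserved for padding; out-of-vocabulary words stay zero.
    matrix = np.zeros((len(word_index) + 1, dim))
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix
```

The resulting matrix would be passed as the initial weights of the Keras Embedding layer.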
The second layer is a bidirectional long short-term memory (LSTM) layer, which gives our model semantics by helping it understand the sequence of words before and after each token. A bidirectional LSTM trains two LSTMs instead of one on the input sequence: the first on the input sequence as-is and the second on a reversed copy. This provides additional context to the network and can result in faster and fuller learning on the problem. This layer uses 50 nodes, with the return_sequences parameter set to True and the dropout and recurrent_dropout parameters set to 0.1. A max-pooling layer was also added to reduce dimensionality, speed up run time, and mitigate overfitting.
The third layer is a dense layer with 50 nodes and a ReLU activation function. A dense layer is a fully connected layer that performs a linear operation on its input vector. An additional dropout layer is employed to regularize the model and further avoid overfitting.
Lastly, the final output layer uses a sigmoid activation function, which classifies whether a given comment is toxic or not. Overall, the model has 1,045,501 trainable parameters.
The table below summarizes the neural network model with the parameters used in this
project:
Table 1. Model Summary
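The four-layer architecture described above can be sketched in Keras as follows. The vocabulary size of 20,000 and sequence length of 100 are assumptions, chosen so that the parameter count matches the reported 1,045,501 (a 20,000 x 50 embedding contributes 1,000,000 parameters; the BiLSTM, dense, and output layers contribute the remaining 45,501). In practice, the embedding weights would be initialized from the GloVe-50D matrix.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Bidirectional, LSTM,
                                     GlobalMaxPool1D, Dense, Dropout)

def build_model(vocab_size=20_000, maxlen=100, embed_dim=50):
    model = Sequential([
        Input(shape=(maxlen,)),
        # GloVe-50D vocabulary layer (weights would be pre-trained).
        Embedding(vocab_size, embed_dim),
        # BiLSTM with the parameters stated in the text.
        Bidirectional(LSTM(50, return_sequences=True,
                           dropout=0.1, recurrent_dropout=0.1)),
        # Max pooling to reduce dimension and mitigate overfitting.
        GlobalMaxPool1D(),
        Dense(50, activation="relu"),
        Dropout(0.1),
        # Sigmoid output for the binary toxic/non-toxic decision.
        Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model
```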
C. Model Evaluation
For this classification task, the performance of the model is evaluated using test accuracy, precision, and recall. These metrics are obtained from the confusion matrix computed after prediction.
The main metric is accuracy, which is the ratio of correctly predicted observations to the total observations. Precision and recall are important measures for further validating the model’s performance, particularly for imbalanced datasets. Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class.
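The three metrics can be written out directly from the confusion-matrix counts; this is a generic sketch of the definitions above, not code from the project:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts:
    tp = toxic labeled toxic, fp = non-toxic labeled toxic,
    fn = toxic labeled non-toxic, tn = non-toxic labeled non-toxic."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall
```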
Results and Discussion
We planned to train the model for several epochs; however, after the second epoch, the algorithm determined that the accuracy was no longer improving. The team therefore decided that the best weights generated in the first two epochs were enough to consider the model materially trained. From training we obtained an accuracy of 91.01%; that is, the model correctly determines whether a textual input is toxic about 91.01% of the time.
Given this, the team examined the confusion matrix generated by the model on the dataset.
We obtained the following confusion matrix:

                  Actual Yes                      Actual No
  Predicted Yes   True Positives: 14,560          False Positives: 734
                  (toxic labeled as toxic)        (non-toxic labeled as toxic)
  Predicted No    False Negatives: 867            True Negatives: 14,427
                  (toxic labeled as non-toxic)    (non-toxic labeled as non-toxic)
We further explored our results and found a precision of 95.2%, which means that of all the comments the model labeled as toxic, 95.2% were actually toxic.
Precision = Toxic Comments Labeled as Toxic / (Toxic Comments Labeled as Toxic + Non-Toxic Comments Labeled as Toxic)
Following this line of analysis, the model achieved a recall of 94.33%, which means that of all the comments in the dataset that are actually toxic, the model correctly identified 94.33% as toxic.
Recall = Toxic Comments Labeled as Toxic / (Toxic Comments Labeled as Toxic + Toxic Comments Labeled as Non-Toxic)
We know that when a dataset is balanced, accuracy is a good metric of model validity; for good measure, however, the team also checked the F1-score, which is the harmonic mean of precision and recall. From this the team obtained an F1-score of 94.76%.
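As a quick arithmetic check, the reported F1-score follows from the precision and recall figures quoted above:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Using the reported precision (95.2%) and recall (94.33%):
# 2 * 0.952 * 0.9433 / (0.952 + 0.9433) ≈ 0.9476
print(round(f1_score(0.952, 0.9433), 4))
```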
Acknowledgements
We would like to thank Prof. Christopher Monterola, Prof. Erika Legara, Prof. Christian Alis
and the rest of Asian Institute of Management’s Master of Science in Data Science faculty, data
scientists and staff for imparting their knowledge on machine learning, deep learning and various big
data computing tools.
References
Zhang, D., Xu, H., Su, Z., & Xu, Y. (2015). Chinese comments sentiment classification based on
word2vec and SVMperf. Expert Systems with Applications, 42(4), 1857-1863.
Tripathy, A., Agrawal, A., & Rath, S. K. (2016). Classification of sentiment reviews using n-gram
machine learning approach. Expert Systems with Applications, 57, 117-126.
Hosseini, H., Kannan, S., Zhang, B., & Poovendran, R. (2017). Deceiving Google's Perspective API
built for detecting toxic comments. arXiv preprint arXiv:1702.08138.
Srivastava, S., Khurana, P., & Tewari, V. (2018, August). Identifying aggression and toxicity in
comments using capsule network. In Proceedings of the First Workshop on Trolling, Aggression and
Cyberbullying (TRAC-2018) (pp. 98-105).
Georgakopoulos, S. V., Tasoulis, S. K., Vrahatis, A. G., & Plagianakos, V. P. (2018, July).
Convolutional neural networks for toxic comment classification. In Proceedings of the 10th Hellenic
Conference on Artificial Intelligence (p. 35). ACM.
Pavlopoulos, J., Malakasiotis, P., & Androutsopoulos, I. (2017). Deep learning for user comment
moderation. arXiv preprint arXiv:1705.09993.