
Bully Me Not: An AI Approach to Stopping Cyberbullying

Daryll James Carrasco,1 Melvin D. Macapinlac,2 Nigel A. Silva,3 and Paul Michael B. Uy4

Master of Science in Data Science,


Aboitiz School of Innovation, Technology, and Entrepreneurship (ASITE),
Asian Institute of Management, Makati City, Philippines
1 dcarrasco@aim.edu, 2 mmacapinlac@aim.edu, 3 nsilva@aim.edu, 4 puy@aim.edu

Abstract
Social media sites have become an avenue for some users to spread hate speech and toxic comments, while others are harassed and become victims of cyberbullying. The goal of this project is to use deep learning methods to identify toxicity in online comments, which can be used to filter potentially harmful messages, promote a healthier and safer social media space for everyone, especially for children, and help users craft more civil arguments when engaging in online discussions. This project implements various deep learning models and evaluates their performance on a binary classification task: whether a given text is toxic or not. Using a 4-layer neural network, the team developed an architecture that can predict whether a given text is toxic with an accuracy of 91.01%.

Keywords: cyberbullying, toxic comments, neural network, machine learning, deep learning, classification

Introduction
Discussions on various social media platforms often lead to harassment and abuse due to the anonymity of users. Users now find it much easier to spread hateful, harmful, and toxic comments in the comment sections of different social media sites. These sites have unfortunately become a medium for bullies, which can be very harmful, especially for users who become victims of online harassment and cyberbullying. With the help of modern computing tools, the language patterns used by bullies can be identified, and rules can be developed to automatically flag cyberbullying content.

The goal of this project is to use deep learning methods to identify toxicity in online comments, which can be used to filter potentially harmful messages, promote a healthier and safer social media space for everyone, especially for children, and help users craft more civil arguments when engaging in online discussions. This project implements various deep learning models and evaluates their performance on a binary classification task: whether a given text is toxic or not.

Review of Related Literature


In 2015, Zhang et al. proposed a method for sentiment classification using word2vec and SVMperf (Zhang et al., 2015). They used word2vec to cluster similar features, demonstrating that word2vec can capture semantic features in language, and used SVMperf for the classification step. Their study showed that this methodology performs well for this kind of text analysis. Tripathy et al. evaluated four machine learning algorithms, namely Naive Bayes (NB), Maximum Entropy (ME), Stochastic Gradient Descent (SGD), and Support Vector Machine (SVM), to classify human sentiments. Their approach used unigram, bigram, and trigram features, as well as their combinations, to build the models (Tripathy et al., 2016).

Hosseini et al. proposed an attack on Google's Perspective API, which detects toxic comments. Their study showed that toxicity scores can be reduced to the level of non-toxic comments, which calls into question the robustness of the API (Hosseini et al., 2017).

A different approach was taken by Srivastava et al., who used a capsule network to identify aggression and toxicity in comments. Their work demonstrated that the problem of class imbalance can also be addressed with their methodology (Srivastava et al., 2018). Georgakopoulos et al. tried a similar approach using Convolutional Neural Networks (CNNs) (Georgakopoulos et al., 2018). They compared CNNs to the traditional bag-of-words approach, and their results show that CNNs can improve toxic comment classification. On the other hand, Pavlopoulos et al. used RNNs for user comment moderation and concluded that RNNs outperform the state-of-the-art methods (Pavlopoulos et al., 2017).

Data Collection and Description


The data used in this project is the Jigsaw Toxic Comment Classification dataset. The dataset contains 159,571 Wikipedia comments which have been labelled by human raters for toxicity. It contains the following fields:
• id – comment ID
• comment_text – the text of the comment posted
• toxic – a Boolean label
• severe_toxic – a Boolean label
• obscene – a Boolean label
• threat – a Boolean label
• insult – a Boolean label
• identity_hate – a Boolean label

For the purposes of this project, we use the toxic field as the label indicating whether a given comment is toxic or not.

Methodology
This section discusses the steps undertaken in this project: (1) data preprocessing, (2) modelling, and (3) model evaluation.
A. Data Preprocessing
Balancing of Data
The original dataset contains 144,277 non-toxic comments and 15,294 toxic comments. For toxic and non-toxic comments to have equal representation in our model, the dataset was first balanced. Balancing the dataset to a 50-50 toxic/non-toxic split by randomly under-sampling the non-toxic class resulted in a total of 30,588 comments for training and testing. A 90-10 split was used for the training and testing data.
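
The sketch below illustrates this preprocessing step, assuming the standard Jigsaw train.csv layout; the random seed and the use of stratification are illustrative choices rather than values stated above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")  # Jigsaw Toxic Comment training data

toxic = df[df["toxic"] == 1]              # 15,294 toxic comments
non_toxic = df[df["toxic"] == 0].sample(  # random under-sampling
    n=len(toxic), random_state=42)

# Shuffle the balanced 30,588 comments, then hold out 10% for testing
balanced = pd.concat([toxic, non_toxic]).sample(frac=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    balanced["comment_text"], balanced["toxic"],
    test_size=0.1, random_state=42, stratify=balanced["toxic"])
```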
B. Modelling
We used Keras to implement our neural network. The architecture is summarized below:

Figure 1. Architecture of our Neural Network Model

First, the comments were fed into an embedding layer initialized with the pre-trained 50-dimensional GloVe word embeddings (GloVe-50D). This layer gives the model its vocabulary and assigns similar vectors to words with similar meanings. GloVe (Global Vectors for Word Representation) is an unsupervised learning algorithm for obtaining vector representations of words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
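The sketch below shows one way to build such an embedding matrix, assuming the publicly released glove.6B.50d.txt file and Keras' Tokenizer. The 20,000-word vocabulary and 100-token padded length are assumptions, although the vocabulary size is consistent with the parameter count reported below.

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_FEATURES = 20000  # vocabulary size (assumed)
MAX_LEN = 100         # padded comment length (assumed)
EMBED_DIM = 50        # GloVe-50D

tokenizer = Tokenizer(num_words=MAX_FEATURES)
tokenizer.fit_on_texts(X_train)
X_train_seq = pad_sequences(tokenizer.texts_to_sequences(X_train), maxlen=MAX_LEN)
X_test_seq = pad_sequences(tokenizer.texts_to_sequences(X_test), maxlen=MAX_LEN)

# Map each vocabulary word to its pre-trained 50-dimensional GloVe vector
glove = {}
with open("glove.6B.50d.txt", encoding="utf-8") as f:
    for line in f:
        word, *vector = line.split()
        glove[word] = np.asarray(vector, dtype="float32")

embedding_matrix = np.zeros((MAX_FEATURES, EMBED_DIM))
for word, index in tokenizer.word_index.items():
    if index < MAX_FEATURES and word in glove:
        embedding_matrix[index] = glove[word]
```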
The second layer is a bidirectional long short-term memory (LSTM) layer, which gives the model its semantics by helping it understand the sequence of words before and after each token. Bidirectional LSTMs train two LSTMs instead of one on the input sequence: the first on the input sequence as-is, and the second on a reversed copy of it. This can provide additional context to the network and result in faster and fuller learning on the problem. This layer uses 50 units, with the return_sequences parameter set to True and the dropout and recurrent_dropout parameters set to 0.1. A max-pooling layer was also added to reduce dimensionality, speed up training, and mitigate overfitting.
The third layer is a dense layer with 50 nodes and a ReLU activation function. A dense layer is a fully connected layer that applies a linear operation to its input vector before the activation. An additional dropout layer is employed to regularize the model and further avoid overfitting.
Lastly, the final output layer uses a sigmoid activation function, which classifies whether a given comment is toxic or not. Overall, the model has 1,045,501 trainable parameters.
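A sketch of the architecture as described, assuming the tokenizer and embedding matrix from the previous sketch; the optimizer and loss are the usual defaults for a sigmoid binary classifier rather than choices stated above. With a trainable 20,000 x 50 embedding, the layer sizes below reproduce the reported 1,045,501 trainable parameters.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Bidirectional, LSTM,
                                     GlobalMaxPool1D, Dense, Dropout)

model = Sequential([
    # GloVe-50D initialized embeddings: 20,000 x 50 = 1,000,000 parameters
    Embedding(MAX_FEATURES, EMBED_DIM, weights=[embedding_matrix],
              input_length=MAX_LEN, trainable=True),
    # Bidirectional LSTM, 50 units per direction: 40,400 parameters
    Bidirectional(LSTM(50, return_sequences=True,
                       dropout=0.1, recurrent_dropout=0.1)),
    # Pool over the sequence, leaving one 100-dimensional vector per comment
    GlobalMaxPool1D(),
    Dense(50, activation="relu"),    # 100 * 50 + 50 = 5,050 parameters
    Dropout(0.1),                    # extra regularization (rate assumed)
    Dense(1, activation="sigmoid"),  # 50 + 1 = 51 parameters
])

model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])
model.summary()  # 1,045,501 trainable parameters in total
```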
The table below summarizes the neural network model with the parameters used in this
project:

Table 1. Model Summary
C. Model Evaluation
For this classification task, the performance of the model is evaluated using test accuracy, precision, and recall. These metrics are derived from the confusion matrix obtained after prediction.
The main metric used is accuracy, the ratio of correctly predicted observations to the total number of observations. Precision and recall are important measures for further validating the model's performance, particularly for imbalanced datasets. Precision is the ratio of correctly predicted positive observations to the total number of predicted positive observations. Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class.
Results and Discussion
We planned to train the model over several epochs; however, after the second epoch, training showed no further improvement in accuracy. The team therefore kept the best weights obtained from the first two epochs, judging the model to be sufficiently trained. From this training we obtained a test accuracy of 91.01%, meaning the model correctly determines whether a textual input is toxic 91.01% of the time.
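
This behavior is consistent with training under a Keras early-stopping callback. The sketch below illustrates such a run, where the monitored metric, patience, batch size, and validation split are assumptions rather than values stated above.

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop once validation accuracy stops improving and keep the best weights
early_stop = EarlyStopping(monitor="val_accuracy", patience=1,
                           restore_best_weights=True)

model.fit(X_train_seq, y_train.values,
          validation_split=0.1, epochs=10, batch_size=32,
          callbacks=[early_stop])
```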
Given this, the team examined the confusion matrix obtained by applying the model to the full balanced dataset of 30,588 comments:

                      Actual Toxic                  Actual Non-Toxic
Predicted Toxic       True Positives: 14,427        False Positives: 734
                      (toxic labeled as toxic)      (non-toxic labeled as toxic)
Predicted Non-Toxic   False Negatives: 867          True Negatives: 14,560
                      (toxic labeled as non-toxic)  (non-toxic labeled as non-toxic)

We further explored our results and found a precision of 95.2%, meaning that of all the comments the model labeled as toxic, 95.2% were actually toxic.

Precision = Toxic Comments Labeled as Toxic / (Toxic Comments Labeled as Toxic + Non-Toxic Comments Labeled as Toxic) = TP / (TP + FP)

Following this line of analysis, the model achieved a recall of 94.33%, meaning that of all the comments in the dataset that are actually toxic, the model correctly identified 94.33%.

Recall = Toxic Comments Labeled as Toxic / (Toxic Comments Labeled as Toxic + Toxic Comments Labeled as Non-Toxic) = TP / (TP + FN)

While accuracy is a good measure of model validity when a dataset is balanced, for good measure the team also computed the F1-score, which is the harmonic mean of precision and recall. From this the team obtained an F1-score of 94.76%.

F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
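
All of the metrics above can be computed directly with scikit-learn. The sketch below assumes the trained model and held-out test data from the earlier sketches; the 0.5 decision threshold is the usual default for a sigmoid output rather than a value stated above.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# Threshold the sigmoid outputs at the usual 0.5 cut-off
y_pred = (model.predict(X_test_seq) >= 0.5).astype(int).ravel()

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```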

Conclusion and Recommendations


The aim of this analysis was to classify whether a post or comment is toxic in nature. An artificial intelligence approach powered by machine learning and deep learning was used to achieve this goal. Based on the results of the analysis, we can conclude that artificial intelligence is already capable of identifying whether a textual input is toxic or not: the model was able to identify whether a post or comment is toxic with 91.01% accuracy.
However, the model is only as good as its training data. To achieve better accuracy, a more extensive dataset is recommended. It might also be better to train different models for different contexts using different datasets.
There is an enormous amount of literature on, and many attempts at, this problem; however, to our knowledge none have reached the implementation and deployment phase.
For the team's recommendations, the study can be implemented in two ways. The first is through integration with social media platforms. If social media platforms integrate this functionality, they can automatically moderate individual posts or comments without invading a person's privacy. For minors on social media, this should be a default functionality; for those of legal age, however, the functionality should require the user's consent, and they should be free to toggle it on or off.
The other implementation the team suggests is a browser plugin that does the same. Parents or legal guardians should have the power to turn the functionality on or off for each user of the computer.
As a final recommendation, the team proposes that the study be extended to different languages, allowing more people to reap the benefits of the analysis and the model created.

Acknowledgements

We would like to thank Prof. Christopher Monterola, Prof. Erika Legara, Prof. Christian Alis, and the rest of the Asian Institute of Management's Master of Science in Data Science faculty, data scientists, and staff for imparting their knowledge of machine learning, deep learning, and various big data computing tools.

References

Zhang, D., Xu, H., Su, Z., & Xu, Y. (2015). Chinese comments sentiment classification based on word2vec and SVMperf. Expert Systems with Applications, 42(4), 1857-1863.

Tripathy, A., Agrawal, A., & Rath, S. K. (2016). Classification of sentiment reviews using n-gram machine learning approach. Expert Systems with Applications, 57, 117-126.

Hosseini, H., Kannan, S., Zhang, B., & Poovendran, R. (2017). Deceiving Google's Perspective API built for detecting toxic comments. arXiv preprint arXiv:1702.08138.

Srivastava, S., Khurana, P., & Tewari, V. (2018, August). Identifying aggression and toxicity in comments using capsule network. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018) (pp. 98-105).

Georgakopoulos, S. V., Tasoulis, S. K., Vrahatis, A. G., & Plagianakos, V. P. (2018, July).
Convolutional neural networks for toxic comment classification. In Proceedings of the 10th Hellenic
Conference on Artificial Intelligence (p. 35). ACM.

Pavlopoulos, J., Malakasiotis, P., & Androutsopoulos, I. (2017). Deep learning for user comment
moderation. arXiv preprint arXiv:1705.09993.
