
DIAGNOSIS OF ANXIETY DEPRESSION FROM REAL-TIME SOCIAL MEDIA DATA


ALAPAN KAR, APOORV SANJAY SRIVASTAVA
COMPUTER SCIENCE AND ENGINEERING,
SRM UNIVERSITY, KATTANKULATHUR, INDIA
1alapankar_partha@srmuniv.edu.in

3apoorvsanjay_sanjay@srmuniv.edu.in

Abstract— Purpose: Social networks have become a popular venue for
users to communicate with friends and share opinions, photos, and
videos that reflect their moods, feelings and sentiments. This creates
an opportunity to analyze social network data for users' feelings and
sentiments, and so to investigate their moods and attitudes when they
communicate via these online tools. Methods: Although diagnosis of
depression from social network data has gained an established position
globally, several dimensions remain to be explored. In this study, we
perform depression analysis on Twitter data collected from an online
public source. For depression detection, we propose machine learning
as an efficient and scalable method. Results: We report an
implementation of the proposed method and evaluate it using a set of
psycholinguistic features. We show that the proposed method can
significantly improve accuracy and classification error rate. In
addition, the results show that across the experiments Decision Tree
(DT) gives higher accuracy than the other ML approaches for detecting
depression. Conclusions: Machine learning techniques identify high
quality solutions to mental health problems among Twitter users.

Index Terms—Anxiety, Depression, Social-Media, Deep learning.

I. INTRODUCTION
Anxious depression is a term that characterizes the mental state of an
individual who is diagnosed with a consistent feeling of anxiousness
which results in depression. It is essentially major depressive
disorder (MDD) co-morbid with an anxiety disorder. Behavioral
psychopathology relates the two terms, anxiety and depression,
closely. According to the WHO, an estimated 300 million people
globally are affected by depression. Further, according to the
Canadian Mental Health Association (2016), 20% of Canadians have
experienced mental illness during their lifetime, and around 8% of
adults have gone through a major depression. At its worst, depression
can lead to suicide. According to a survey conducted by the WHO,
around 800,000 people die every year due to suicide, which is the
second leading cause of death among 15-29 year olds. One of the most
crucial barriers to effective care is the inaccurate assessment of the
syndrome. In almost all countries, people who suffer from depression
are not diagnosed properly at an early stage. Studies have clearly
associated anxious depression with greater depressive severity,
elevated risk of suicide and higher risk of cardiovascular disease
(Sonawalla & Fava, 2001). These findings have encouraged us to explore
the issues pertaining to the diagnosis of anxious depression from
social media data.

Social media is widespread and has become a platform on which many
people interact with one another. On this platform a person can vent
his feelings in an environment that rarely poses any threat to him. It
is generally hard to articulate feelings while engaging in the real
world, but online expression of feelings helps convey the mental
condition of a person in a tangible form. Social media can thus help
considerably in diagnosing anxious depression in active social media
users who express their internal disturbance on the platform. This
motivated us to explore various prediction models for detecting
anxious depressive disorder in 6 GB of sample data and then to arrive
at the most accurate prediction model.

II. MODULES
A. DATA COLLECTION
The dataset, containing the past one month of tweets of 100 sampled
users, is scraped using the Google News API. The first 100 followers
of Google News are considered for this research. Each user's data
consists of name, date of account creation, account verification
status (verified or not), language, description and tweet count. For
each user, tweets are fetched together with the date and time of the
post, the number of re-tweets, hashtags and mentioned users.

B. PRE-PROCESSING
Pre-processing is the process of cleaning and filtering the data to
make it suitable for feature extraction. The process includes:
• Removing numeric and empty texts, URLs, mentions, hashtags,
non-ASCII characters, stop words and punctuation.
• Tokenization of tweets, done using a tokenizer from the Python
Natural Language Toolkit (NLTK)
to filter the words, symbols and other elements called tokens (Loper &
Bird, 2002). The tokens are converted to lower case. As the internet
is an informal medium of communication, the use of slang and emojis is
common practice. These may help in understanding the context and may
also intensify the associated emotion.
• Stemming to reduce the words to their root words using Porter's
stemmer. Stemming enhances the likelihood of matching the lexicon.
Tweets can contain at most 280 characters, so users tend to write in
short forms. Different users can use different terms for the same
word, and not every synonym can be included in the lexicon, since that
would increase the processing time. Stemming is therefore crucial for
the accuracy of the prediction model.

3) Tweet Frequency: Though timing of tweets is primarily associated
with odd-hour postings, generic tweet frequency within 24 hours also
demonstrates the user's restlessness and urge to share. The feature is
set to true ('1') if the number of tweets in any hour of the day is
equal to or greater than 3; otherwise it is set to false ('0').

D. Figures and Tables
The accuracy of the proposed AD Prediction Model is 85.09% with an F
score of 79.68%. The model achieves encouraging results in predicting
users with anxiety depression. Fig. 1 shows the accuracy of the
classifiers.
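
The pre-processing steps of Section B (cleaning, tokenization, lower-casing, stop-word removal and stemming) can be sketched as follows. This is a minimal illustration that uses only the Python standard library and is not the authors' implementation. Whitespace splitting stands in for the NLTK tokenizer, the stop-word list is a small placeholder, and the `stem` parameter is a hypothetical hook where a stemmer such as NLTK's `PorterStemmer().stem` could be passed.

```python
import re
import string
from typing import Callable, Iterable, List

# Tiny illustrative stop-word list (assumption); a real run would use a fuller list.
STOP_WORDS = frozenset(
    {"a", "an", "the", "is", "am", "are", "i", "to", "and", "of", "in", "at", "so"}
)

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_OR_HASHTAG_RE = re.compile(r"[@#]\w+")
NUMBER_RE = re.compile(r"\d+")
# Map every punctuation character to a space so adjacent words stay separated.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(tweet: str,
               stem: Callable[[str], str] = lambda word: word,
               stop_words: Iterable[str] = STOP_WORDS) -> List[str]:
    """Clean one tweet and return its lower-cased, optionally stemmed tokens."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_OR_HASHTAG_RE.sub(" ", text)
    text = NUMBER_RE.sub(" ", text)
    # Drop non-ASCII characters (this also drops emoji characters).
    text = text.encode("ascii", errors="ignore").decode("ascii")
    text = text.translate(PUNCT_TABLE)
    stop = set(stop_words)
    # Whitespace split is a simple stand-in for a library tokenizer.
    tokens = [tok.lower() for tok in text.split()]
    return [stem(tok) for tok in tokens if tok not in stop]
```

Stop words are removed after tokenization here because that is simpler to express in code; a tweet that is empty after cleaning yields an empty token list and can be discarded by the caller.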

C. FEATURE ENGINEERING
The feature vector used to build the learning model is a 5-tuple
comprising:
• Presence or absence of an anxiety-related word, using the anxiety
lexicon base.
• Two or more posts during the odd hours of the night, specifically
between 12 am and 6 am.
• Three or more posts within an hour at any time of the day.
• More than 25% of posts, on average, over 30 days having negative
polarity.
• Presence of more than 25% polarity contrast in posts within the past
24 hours.
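
A rough sketch of how the 5-tuple could be assembled is given below. It is illustrative only and rests on several assumptions: the thresholds follow the detailed feature descriptions (two or more tweets between 12 am and 6 am, three or more tweets in one hour), an "hour" is taken to be a clock hour on a given date, polarity labels are the strings "positive", "negative" and "neutral", and the lexicon match and the polarity-contrast share are supplied by the caller because the text does not define how the contrast is computed. All function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Sequence, Tuple


def timing_flag(times: Sequence[datetime], min_posts: int = 2) -> int:
    """1 if at least `min_posts` tweets fall between 12 am and 6 am, else 0."""
    night_posts = sum(1 for t in times if 0 <= t.hour < 6)
    return int(night_posts >= min_posts)


def frequency_flag(times: Sequence[datetime], min_posts: int = 3) -> int:
    """1 if any clock hour of any day holds at least `min_posts` tweets, else 0."""
    per_hour = Counter((t.date(), t.hour) for t in times)
    return int(any(count >= min_posts for count in per_hour.values()))


def negative_flag(polarities: Sequence[str], threshold: float = 0.25) -> int:
    """1 if more than `threshold` of the labelled posts are negative, else 0."""
    if not polarities:
        return 0
    share = sum(1 for p in polarities if p == "negative") / len(polarities)
    return int(share > threshold)


def feature_vector(has_anxiety_word: bool,
                   times: Sequence[datetime],
                   polarities: Sequence[str],
                   contrast_share: float) -> Tuple[int, int, int, int, int]:
    """Assemble (lexicon, timing, frequency, negativity, contrast) for one user.

    `times` and `polarities` are assumed to be restricted to the relevant
    observation window already; `contrast_share` is a caller-supplied fraction.
    """
    return (
        int(has_anxiety_word),
        timing_flag(times),
        frequency_flag(times),
        negative_flag(polarities),
        int(contrast_share > 0.25),
    )
```

Keeping each feature in its own small function makes the thresholds easy to vary when comparing classifiers.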

TABLE I
CLASSIFICATION OF ACCURACY OF CLASSIFIERS

Fig. 1 Accuracy of classifiers
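
Accuracy and F score, the two measures reported for the AD Prediction Model, are standard classification metrics. The sketch below shows how they are conventionally computed from confusion-matrix counts, assuming the F score is the usual F1 measure (harmonic mean of precision and recall). The function names are hypothetical and no counts from the study are implied.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f_score(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the F score ignores true negatives, it can sit below the accuracy when the positive class is the harder one to detect.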

1) Anxiety Lexicon Base: Depressive rumination is the compulsive focus
of attention on thoughts that cause feelings of sadness, anxiety and
distress. A person with anxious depression disorder is very likely to
verbalize thoughts using specific anxiety-related words. Therefore, an
anxiety lexicon base with a seed list of 60 words is built from
keywords that represent anxious depression in textual content. The
seed list is then grown using WordNet. The processed tokens from the
tweets are matched against the lexicon, and the feature value is set
to true ('1') if a token is present in the lexicon base; otherwise it
is set to '0'. Table 1 presents the lexicon base of the initial 60
words.
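
The lexicon lookup can be illustrated with the short sketch below. It makes several assumptions: the seed words shown are placeholders rather than the 60-word seed list, the synonym mapping passed to `expand_lexicon` is a stand-in for a WordNet lookup, and both function names are hypothetical.

```python
from typing import Dict, Iterable, Set

# Placeholder seed words (assumption); the actual 60-word seed list is not reproduced here.
SEED_WORDS = {"anxious", "worry", "panic", "nervous", "fear"}


def expand_lexicon(seed: Iterable[str],
                   synonyms: Dict[str, Iterable[str]]) -> Set[str]:
    """Grow the seed list with synonyms from a caller-supplied mapping."""
    lexicon = {word.lower() for word in seed}
    # Iterate over a snapshot so that newly added synonyms are not expanded again.
    for word in list(lexicon):
        lexicon.update(s.lower() for s in synonyms.get(word, ()))
    return lexicon


def lexicon_flag(tokens: Iterable[str], lexicon: Set[str]) -> int:
    """Return 1 if any processed token appears in the lexicon, else 0."""
    return int(any(token in lexicon for token in tokens))
```

If the tweet tokens are stemmed, the lexicon entries should be passed through the same stemmer before matching so that both sides use the same word forms.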
2) Tweet Timing: Chronic insomnia, that is, sleeplessness, is one of
the most common symptoms of anxious depression and stress. Users who
are active through the midnight hours, i.e. from 12 am to 6 am,
evidently show psychological disturbance in their sleep pattern, with
increased restlessness and over-thinking. Therefore, tweet timing is
an important feature, and its value is set to 1 if two or more tweets
are posted after midnight during the odd hours of 12 am to 6 am;
otherwise it is set to 0.

Fig. 2 shows the performance of the proposed AD Model.

Fig. 2 Performance of proposed AD Model

III. CONCLUSION
Social media has revolutionized the way we interact with the world,
allowing us all to stay connected and self-express.
Mixed anxiety depression and social media seem to exist in a vicious
cycle in which one problem often stimulates the other. A supervised
learning-based prediction model is proposed in this research, where
tweets of the first 100 followers of MS India student forum are
analysed using various linguistic, semantic and activity features to
detect anxious depression disorder. The presence of anxiety-related
words was considered as a linguistic marker, whereas the count of
negative tweets and the polarity contrast of tweets are related to
semantic markers. Users' post timing and frequency were also
considered in building a model to efficiently predict anxious
depression in users.

Nearly 85% of predictions are found to be accurate in the preliminary
analysis. As possible future work, fine-grained emotion analysis can
be done to detect anxiety indicators instead of using SentiWordNet,
which categorizes words into three polarities. Further, the model can
be tested on different user bases: geographic, age, profession etc.
Neuro-fuzzy and deep learning models can be explored for superior
prediction performance.

Acknowledgment
We thank Mr. AMJ Muthu Kumaran for assistance with the methodology,
and Mr. S. Saminathan for comments that greatly improved the
manuscript, and we thank 3 “anonymous” reviewers for their so-called
insights. We are also immensely grateful to (List names and positions)
for their comments on an earlier version of the manuscript, although
any errors are our own and should not tarnish the reputations of these
esteemed persons.

REFERENCES
1. Scott J. Social network analysis. Thousand Oaks: Sage; 2018.
2. Serrat O. Social network analysis. In: Knowledge solutions.
Singapore: Springer; 2018. p. 39–43.
3. Mikal J, Hurst S, Conway M. Investigating patient attitudes towards
the use of social media data to augment depression diagnosis and
treatment: a qualitative study. In: Proceedings of the fourth workshop
on computational linguistics and clinical psychology—from linguistic
signal to clinical reality. 2017.
4. Conway M, O’Connor D. Social media, big data, and mental health:
current advances and ethical implications. Curr Opin Psychol.
2016;9:77–82.
5. Ofek N, et al. Sentiment analysis in transcribed utterances. In:
Pacific-Asia conference on knowledge discovery and data mining. 2015.
Cham: Springer.