BHS FinalArticleForConference

Determining the Level of Depression using BDI-II through Voice Recognition
Justin Brian Balano
School of Information Technology, Mapua University
Makati City, Philippines
jbdbalano@mymail.mapua.edu.ph

Vanessa Ley Huerto
School of Information Technology, Mapua University
Makati City, Philippines
vlclhuerto@mymail.mapua.edu.ph

Sigfried Sanchez
School of Information Technology, Mapua University
Makati City, Philippines
smsanchez@mymail.mapua.edu.ph

Aresh Saharkhiz*
School of Information Technology, Mapua University
Makati City, Philippines
asaharkhiz@mapua.edu.ph

Joel De Goma*
School of Information Technology, Mapua University
Makati City, Philippines
jcdegoma@mapua.edu.ph
*Note: Names of Adviser and Supervisor of this research paper

Abstract – Depression is a mental health disorder that poses a growing threat to the mental well-being of every person experiencing it. To help raise awareness among people in need, the researchers developed a model of a system based on the Beck Depression Inventory II (BDI-II), a questionnaire used by psychiatrists for assessing depression. The system is embedded in a small robot named Cozmo. The model can determine the level of depression a person is experiencing from their voice features by using two algorithms, Support Vector Machine (SVM) and Decision Tree (DT). Each participant’s speech was converted to text to obtain the level of depression by computing the score of their answers, while 34 voice features were extracted from every voice recording of each participant. A total of 84 participants, 42 college and 42 senior high school students, took part in this study. The model achieved an accuracy rate of 40.5% with the SVM algorithm and 28.57% with the Decision Tree algorithm, with both algorithms showing that the voice features added by the researchers, combined with the initial voice features, gained a higher accuracy.

Keywords – Level of Depression; Beck Depression Inventory II (BDI-II); Support Vector Machine (SVM); Decision Tree (DT)

I. INTRODUCTION

One of the important aspects of every person’s life is mental health. It is essential to pay attention to one’s mental health because neglecting it may lead to serious health problems that can endanger one’s ability to think properly when making major life decisions [1]. One of the mental health disorders that has been a great problem among the young and adult populations of many countries is depression [2, 3].

One of the factors observed to be connected with depression is how a person talks from day to day. According to a study by researchers from the University of Maryland, a person’s voice becomes gravelly, hoarse and less fluent when experiencing depression [5].

To raise self-awareness about depression, the researchers conducted this study in consultation with a professional expert who validated the interview process. The screening tool used in this study was the BDI-II, which is widely used by professional experts to assess patients during physical examination and determine their level of depression [11].

Accordingly, a model of the system was designed to assess a person’s level of depression from voice data using the BDI-II, a psychiatric questionnaire that ascertains the severity of depression through a scoring system corresponding to 6 different levels of depression. To add to the effectiveness of the model, this study incorporated active interaction with the subject by means of a robot named Cozmo, which was programmed to ask the BDI-II questions during the interview. The researchers used the latest BDI-II, which was revised in 1996 to correspond to the DSM-IV criteria for depression [10]. The Diagnostic and Statistical Manual (DSM) was described by the professional expert as the “bible” that psychiatrists use to diagnose a patient with depression. The voice analysis used a total of 34 voice features (VF), which were extracted from each subject’s voice recording of their answers to the BDI-II questions. The extracted features were the parameters tested with 2 models using different algorithms, SVM and Decision Tree, to show the significance of the voice features in determining the level of depression.

This research aims to determine the effect of the 34 VF on the accuracy of the models in assessing the severity of a subject’s depression. Initially, only 20 VF were tested as the parameters for the voice analysis. These 20 VF were established in different studies [6][7][8] as having the most relevance to depression detection. The researchers then added 14 VF, for a total of 34 VF, to determine whether the new VF attain higher results than the initial 20 VF alone. A total of 84 participants (42 college students and 42 senior high school students) were recruited for this study and gave their consent to participate. The participants’ ages ranged from 15 to 29 years, which is said to be the critical age range in which teenagers and young adults attempt to put their lives in danger as a result of mental health disorders such as depression [6].

II. MODEL DESIGN AND EXPERIMENT

Different researchers have developed different models for depression detection through voice recognition. Jan, Meng, Gaus and Zhang developed an artificial intelligent system that uses a convolutional neural network (CNN) to recognize the level of depression by examining audio and visual cues against the BDI-II scale of depression. Spectral low-level descriptors (LLD) and mel-frequency cepstral coefficients (MFCC) were extracted from the AVEC2014 dataset [7]. MFCC was the voice feature that showed the greatest impact on the results in determining the participant’s level of depression. In the study [8] conducted by Low, Maddage, Lech, Sheeber and Allen, 139 mother and child subjects underwent problem-solving interaction (PSI). The participants’ voices in the interviews were converted to voice features, which were used to train a classification model based on the Gaussian mixture model (GMM) algorithm. It was shown that spectral features such as spectral roll-off, spectral flux and spectral entropy contributed significantly to the result of the model. Long, Guo, Wu, Hu, Liu and Cai developed a model that determines depression by extracting voice features from the audio of oral interviews [9], using SVM on a data set of size 74. Energy, zero-crossing rate (ZCR) and entropy of energy (EOE) were the 3 voice features that most affected their study.

A. Model Design

Figure 1. Model Design

Figure 1 shows the flow and design of the methodology used in this study. The model starts with data gathering, in which the interview is conducted by Cozmo, the small robot that acts as the interviewer. Once all the set questions are finished, the data undergo pre-processing for cleanup and fixing, after which voice feature extraction takes place on the collected data set. These features include Zero Crossing Rate, Energy, Entropy of Energy, Spectral Centroid, Spectral Entropy, Spectral Flux, Spectral Rolloff and MFCCs. These 8 feature types (which together represent the 20 VF, because the MFCCs span multiple coefficients) serve as the initial baseline for comparison with the added features. The voice features added by the researchers are Spectral Spread, Chroma Vector and Chroma Deviation, 3 feature types in total (which represent the 14 VF), included to determine whether they produce higher results when combined with the initial features. Once pre-processing and feature extraction are complete, the training and testing of the data commence. This study used a software tool called Matlab to conduct the training and testing with two algorithms, SVM and Decision Tree.

B. Experiment

To conduct the experiment, the researchers recruited participants for this study. A total of 84 willing participants, 42 college and 42 senior high school students, took the interview. Before starting the interview, the participants were given a consent form assuring the confidentiality of their answers and results.

Figure 2. Sample Data Gathering Setup

As seen in Figure 2, in front of the participant is Cozmo, the small robot, which vocalizes the BDI-II questions. The participant holds a paper copy of the BDI-II to follow the flow of questions that Cozmo asks and to choose from the set of answers provided. A laptop runs the model of the system, with the program written in Python using the PyCharm IDE. A pair of headphones was used as a microphone to record and save the answers chosen by the participants. Lastly, a cellphone connects the model of the system to Cozmo’s programming in order to run the model.

III. METHODS

The methods of this study were drawn from different related literature and combined to fit the study. This section discusses the implemented methodology in detail, from data gathering to data pre-processing and, lastly, data processing with two models that used different algorithms, SVM and DT.
A. Data Gathering

The interview process was validated by a psychiatrist. There were 42 participants from college and another 42 from senior high school, all within the age range of 15 to 29 years, who took part in this research.

The interview was done in a small, quiet room to record the participant’s voice clearly and to keep unnecessary noise out of the recordings.

A small robot named Cozmo was programmed to ask each interviewee the questions of the psychiatric questionnaire, BDI-II. The questionnaire has 21 questions, which the participants answered using the choices provided. The interviewee held a printed copy of the BDI-II questionnaire as a reference for answering the questions. Each interview was supervised by the researchers to ensure that the interviewee finished all the questions.

The participants’ answers were recorded through a headset, which provided a clearer sound recording than the built-in microphone of the laptop. The headset recorder was integrated with an application program that did two things:
1.) Saves every answer in a corresponding wav file and combines them all into a single wav file once all 21 questions have been answered.
2.) Converts the voice recording of each answer from speech to text and appends it to a single text file.

B. Data Pre-processing

Data pre-processing involves cleaning up the recordings by removing dead air and lessening background noise; extraction of voice features; and scoring of the interview based on the rubrics of the psychiatric questionnaire.

1) Noise Reduction

The wav files of every participant’s interview were subjected to noise reduction using Movavi Editor, a mobile application for editing audio clips. This was done to reduce unnecessary noise and remove dead air, enhancing every audio clip for feature extraction.

2) BDI-II Scoring

A text document containing each participant’s complete answers to the whole 21-question interview was input to a system that scores each interviewee based on the rubrics of the BDI-II to determine their level of depression.

The system has a database of the answers to every question at each level. The participant’s answer is looked up in this database of answers, each associated with its equivalent score, as shown in Table 1. If the participant’s answer is not in the database, the researchers manually check the answer and add it to the database. After the new data is added, the text file is run through the scoring system again to score the interview.

Table 1. Rubrics of BDI-II

The accumulated score over all 21 questions determines the subject’s level of depression. This process, according to the psychiatrist who assisted this study, is one of the physical examinations done in their medical field to determine a person’s level of depression.

C. Feature Extraction

The voice of each participant was subjected to voice analysis. The VF based on related literature (RRL) [6][7][8] are Zero Crossing Rate, Energy, Entropy of Energy, Spectral Centroid, Spectral Entropy, Spectral Flux, Spectral Roll-off and MFCCs. Since these VF have been shown to give good results in depression detection, the researchers added 3 more VF to test their effectiveness in determining the level of depression a person is experiencing: Spectral Spread, Chroma Vector (a 12-element vector) and Chroma Deviation. The 12 Chroma Vector elements plus the two single-valued features account for the 14 added VF.

These VF were extracted through a program that uses pyAudioAnalysis. After all the VF were extracted, another program computed the mean, median, variance, average, standard deviation, minimum, maximum and range of the 34 VF. The system compiled the computed 34 VF of the 84 participants in a CSV file. The 8 different combinations of the 14 new VF with the 20 RRL VF are shown in Table 2.

Table 2. Combination of 14 new VF together with 20 RRL VF

These combinations were input to the two models to determine the accuracy of every combination in determining the level of depression.

D. Data Processing

Two models with different algorithms were used in this study to see the effectiveness of different VF combinations in determining the level of depression. A third-party software called Matlab was used to create the models and calculate the results of both.
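As a rough illustration of how 8 combinations can arise from 3 added feature groups, the sketch below enumerates every subset of the added groups on top of the 20 baseline VF. It rests on assumptions that are not stated in this paper: that the 8 combinations are exactly the 2^3 subsets, that the MFCC group holds 13 coefficients (inferred from 20 minus the 7 single-valued features), and that the numbering produced here need not match Table 2. All identifiers are hypothetical.

```python
from itertools import combinations

# Baseline groups from related literature (20 values in total).
# The MFCC count of 13 is an assumption inferred from 20 - 7.
BASELINE = {
    "zero_crossing_rate": 1, "energy": 1, "entropy_of_energy": 1,
    "spectral_centroid": 1, "spectral_entropy": 1, "spectral_flux": 1,
    "spectral_rolloff": 1, "mfcc": 13,
}
# Groups added by the researchers (14 values in total).
ADDED = {"spectral_spread": 1, "chroma_vector": 12, "chroma_deviation": 1}


def feature_combinations():
    """Return (added_groups, total_width) for every subset of the added groups."""
    combos = []
    names = list(ADDED)
    for size in range(len(names) + 1):
        for extra in combinations(names, size):
            width = sum(BASELINE.values()) + sum(ADDED[n] for n in extra)
            combos.append((extra, width))
    return combos


if __name__ == "__main__":
    # Hypothetical numbering; Table 2 may order the combinations differently.
    for number, (extra, width) in enumerate(feature_combinations(), start=1):
        print(number, extra or ("baseline only",), width)
```

Under these assumptions the widths run from 20 values (baseline only) to 34 values (all added groups), which agrees with the feature counts given in this paper.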
Figure 4. Model 1 and 2 Diagram using Decision Tree and Support Vector Machine Algorithms

Figure 4 shows the flow of how the data set was trained and tested. Model 1 used the Decision Tree algorithm. Decision trees fall into two categories, classification or regression models, both in the form of a tree structure. The algorithm separates a data set into smaller and smaller subsets while an associated decision tree is incrementally built. Here, the Decision Tree algorithm works by finding a pattern that maps VF to a level of depression. Once the algorithm learns the specific VF for one level, it proceeds to identify the VF for the next level, and continues until corresponding VF have been found for all the levels. Model 2 used the SVM algorithm, which previous studies reported to be the best algorithm for analyzing voice parameters [7][8]. SVM is a supervised machine learning algorithm, used here for a classification problem. It uses a method called the kernel trick to transform the data and, based on these transformations, finds an optimal boundary between the possible outputs. The algorithm then works out how to separate the data based on the labels, and the results are classified accordingly. Both models used the 8 different combinations of the data set as training and testing sets. For every data set input to the two models, the level of depression was set as the dependent variable and the VF as the independent variables. To avoid bias in the division of the data, the models used 10-fold cross-validation with shuffled sampling. The results were measured by accuracy, precision, recall and classification error.

IV. RESULTS

This study was conducted to see the effect of different voice parameters in determining the level of depression of a person suffering from the mental disorder called depression.

Figure 6. Pie Chart of Number of Participants per Level of Depression

With level 1 at 15% and level 2 at 29%, a combined 44% of the participants (37 of 84) show a normal or mild level of depression. The Moderate and Significant classifications of depression together account for the remaining 56%, which means that 47 out of 84 participants are showing signs of depression.

After processing all 8 data sets with the two systems, the results below show that SVM, the algorithm of Model 2, is the better algorithm for voice analysis. It yielded the best accuracy rate in determining the level of depression across all the data sets.

Table 4. Statistical Table of Model 1 Decision Tree Algorithm

Table 5. Statistical Table of Model 2 Support Vector Machine Algorithm

The output of this study indicates that the VF added by the researchers, namely Spectral Spread (SS), Chroma Vector (CV) and Chroma Deviation (CD), resulted in a lower accuracy rate. Combinations 1, 2, 4, 5 and 7 showed higher accuracy than combination 8, which has all the new VF and the VF from previous related literature, but only combination 5 showed a higher accuracy rate in Model 1. This means that only combination 5 has an effect in determining the level of depression.

Table I. Statistical Table for SVM
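The accuracy figures in the statistical tables come from 10-fold cross-validation with shuffled sampling. The following is a minimal sketch of that protocol in plain Python, not the Matlab workflow used in this study. The function names, the fixed seed, the synthetic data and the majority-class placeholder classifier are all assumptions made for illustration; a real run would substitute an SVM or decision tree for `majority_class`. Only accuracy and classification error are computed here, while precision and recall are left out for brevity.

```python
import random
from collections import Counter


def shuffled_k_fold(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of nearly equal size
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def majority_class(train_x, train_y, test_x):
    """Placeholder classifier: always predicts the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_x)


def cross_validate(features, labels, fit_predict, k=10, seed=0):
    """Return (accuracy, classification_error) pooled over all k test folds."""
    correct = 0
    for train_idx, test_idx in shuffled_k_fold(len(labels), k, seed):
        predicted = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        correct += sum(p == labels[i] for p, i in zip(predicted, test_idx))
    accuracy = correct / len(labels)
    return accuracy, 1.0 - accuracy


if __name__ == "__main__":
    # Synthetic stand-in data: 84 samples, 34 values each, levels 1 to 6.
    rng = random.Random(1)
    X = [[rng.random() for _ in range(34)] for _ in range(84)]
    y = [rng.randint(1, 6) for _ in range(84)]
    acc, err = cross_validate(X, y, majority_class)
    print(f"accuracy={acc:.4f} classification_error={err:.4f}")
```

With 84 samples, an accuracy pooled this way changes in steps of 1/84; for example, 24 correct predictions out of 84 give about 28.57%.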
Table I shows the statistical measurements of the highest accuracy of the model’s voice feature detection, based on Table 5. Combinations 2, 3, 5, 6 and 7, which used SVM as the algorithm with 10-fold cross-validation, reached an accuracy rate of 40.50%.

Table II. Statistical Table for Decision Tree

Table II presents the highest accuracy from Model 1 using the Decision Tree algorithm, also obtained with shuffled-sampling 10-fold cross-validation. The accuracy rate was 28.57%. This result was obtained with combination 5 of the VF, as presented in Table 4.

V. CONCLUSION

This study shows that it was possible to detect the level of depression a person was experiencing through the BDI-II and voice recognition. The researchers developed a model of a system that applies the BDI-II and voice recognition to determine the level of depression of a participant. The highest accuracy was 40.5% for the SVM algorithm and 28.57% for the DT algorithm. In conclusion, the main voice features the model is able to detect are the additional voice features that the researchers added in combination with the voice features from previous works. The additional features have varying results depending on the combination and the algorithm being used.

Future work may be done to improve the model, for example by providing sufficient or dedicated hardware for running the program. The equipment used in this study was not optimized for it: the researchers used a personal computer and not a dedicated research computer. This caused the model to take too long to respond, or to shut down completely and not respond.

Another improvement may be to replace the robot Cozmo with the latest model from Anki, Vector, which is equipped with a built-in microphone that Cozmo does not have. At the start of data gathering, future work may also conduct one interview with the robot and one without it to determine whether the participant’s state is in any way affected by contact with the robot. Additional features might also give higher results; this study shows that adding the Spectral Spread feature gives higher results.
REFERENCES

[1] Mentalhealth.gov (2017, August 29). What Is Mental Health? Retrieved November 21, 2018, from https://www.mentalhealth.gov/basics/what-is-mental-health
[2] World Health Organization (2018, March 22). Retrieved November 21, 2018, from http://www.who.int/news-room/fact-sheets/detail/depression
[3] American Psychiatric Association (2017, January). What is Depression? Retrieved February 12, 2018, from https://www.psychiatry.org/patients-families/depression/what-is-depression
[4] Gilbert, C. (2009, August 10). Recognizing the Subtle Signs of Depression. Retrieved December 16, 2018, from https://www.elementsbehavioralhealth.com/mood-disorders/recognizing-the-subtle-signs-of-depression/
[5] Knapton, S. (2014, November 3). Depression changes how people speak, research finds. Retrieved December 16, 2018, from https://www.telegraph.co.uk/news/health/news/11205892/Depression-changes-how-people-speak-research-finds.html
[6] World Health Organization (WHO). Suicide. Retrieved January 14, 2019, from https://www.who.int/news-room/fact-sheets/detail/suicide
[7] A. Jan, H. Meng, Y. Gaus, F. Zhang, “Artificial Intelligent System for Automatic Depression Level Through Visual and Vocal Expression,” IEEE Transactions on Cognitive and Developmental Systems, 2017.
[8] H. Long, Z. Guo, X. Wu, B. Hu, Z. Liu, H. Cai, “Detecting Depression in Speech: Comparison and Combination between Different Speech Types,” IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2017.
[9] M. Kawado, S. Hinotsu, Y. Matsuyama, T. Yamaguchi, S. Hashimoto, Y. Ohashi, “Single data extraction generated more errors than double data extraction in systematic reviews,” Controlled Clinical Trials 24, 2003.
[10] Canadian Partnership for Stroke Recovery. Beck Depression Inventory (BDI, BDI-II). Retrieved February 1, 2019, from https://www.strokengine.ca/en/indepth/bdi_indepth/
[11] Psych Congress Network. Beck Depression Inventory-II (BDI-II). Retrieved February 1, 2019, from https://www.psychcongress.com/saundras-corner/scales-screenersdepression/beck-depression-inventory-ii-bdi-ii
