Beruflich Dokumente
Kultur Dokumente
Summary: The human voice production system is an intricate biological device capable of modulating pitch
and loudness. Inherent internal and/or external factors often damage the vocal folds and result in some change of
voice. The consequences are reflected in body functioning and emotional standing. Hence, it is paramount to
identify voice changes at an early stage and provide the patient with an opportunity to overcome any ramification
and enhance their quality of life. In this line of work, automatic detection of voice disorders using machine learn-
ing techniques plays a key role, as it is proven to help ease the process of understanding the voice disorder. In
recent years, many researchers have investigated techniques for an automated system that helps clinicians with
early diagnosis of voice disorders. In this paper, we present a survey of research work conducted on automatic
detection of voice disorders and explore how it is able to identify the different types of voice disorders. We also
analyze different databases, feature extraction techniques, and machine learning approaches used in these
research works.
Key Words: Voice disorder−Pathological voice−Machine learning−Automatic detection−Literature survey
−Review.
(vi) psychiatric and psychological disorders, which are func- accuracy is computed. This classification accuracy is taken
tional voice disorder and/or mutism; (vii) neurological con- as a metric for assessing the performance of various AVDD
ditions, for example, adductor palsy, abductor palsy, and systems.
spasmodic dysphonia; (viii) other disorders like muscle ten-
sion dysphonia; and (ix) undiagnosed and otherwise not
VOICE DISORDER DATABASE
specified.5
Voice disorder database is an essential element in AVDD
Among different vocal fold lesions, mass-based patholo-
system. The dataset consists of voice recordings of both nor-
gies are highly prevalent due to the phonotraumatic impact
mal and pathological voices. The recordings can contain
it delivers on the susceptible multilayered vocal folds. Per-
either sustained vowel phonation or continuous speech. In
sistent tissue inflammation and external influences often
our survey, we have observed that most of the researchers
results in vocal nodule and vocal polyp.6 Under these cir-
have used standard databases, such as Massachusetts Eye
cumstances the vocal fold closure is incomplete, the voice
and Ear Infirmary (MEEI), Saarbruecken Voice Database
production is not economical and perceptually hoarse. On
(SVD), and Arabic Voice Pathology Database (AVPD).
the contrary, in nonphonotraumatic voice disorders, like
The more details about these three databases can be found
muscle tension dysphonia and functional voice disorder,
in Al-nasheri et al.8 Some of the researchers have used pri-
there is no vocal fold lesions, but vocal fatigue, degraded
vate dataset developed in collaboration with local hospitals.
voice quality, and increased laryngeal tension are observ-
able.
Multiparametric assessment protocol is considered ideal FEATURE EXTRACTION TECHNIQUES
for voice evaluation.2,7 Traditionally, a structured approach Feature extraction is the first step in any voice disorder
is essential and it encompasses the following: patient inter- detection system. In this step, the given voice signals are
view, laryngeal imaging by stroboscopy and/or laryngos- converted into representative acoustic features using various
copy, basic aerodynamic assessment, perceptual analysis by digital signal processing techniques. We will now discuss the
standardized psychoacoustic methods, acoustical analysis, most popularly used techniques for acoustic analysis and
and subjective voice evaluation. Considering the recent feature extraction in the related area.
technological advancement, voice scientists have been fore-
front in developing tools for acoustical analysis to differen-
Acoustic analysis
tiate normal voice from those with aphonia and/or
Acoustic analysis is the measurement of sound information
dysphonia.
in voice. The results of acoustic analysis could be used for
The aim of such innovations must be to offer economical
measuring the severity of a voice disorder. Some of the
solutions that is sensitive to identify voice changes and
measures associated with acoustic analysis of voice signals
remains reliable across dynamic clinical setups. With due
are as follows:
research in acoustical recording and analysis, the outcome
is automatic detection.
(i) Perturbations in the fundamental pitch period and
peak amplitude.
(ii) Vocal noise included in the signal.
AVDD SYSTEM
(iii) Cycle-to-cycle wave form variations.
Systems for the automatic detection of voice disorders can
(iv) Average frequency characteristics.
be designed and developed using ML algorithms. Here, the
(v) Transition characteristics of the signal.
voice data need to be preprocessed and converted into a set
of features before applying an ML algorithm. The impor-
The application of acoustic analysis in voice disorder
tant steps to be followed are shown in Figure 1:
detection can be found in Kasuya et al9 and Sonu and
Sharma.10 Multidimensional Voice Program (MDVP) is
1 A given set of voice data, in audio files, need to be
standard software for acoustic analysis and it is most popu-
labeled manually by experts as representing healthy or
larly used in AVDD systems. It is reliable and very compre-
a pathological voice.
hensive. Using MDVP, one can estimate 33 voice
2 Then, the raw audio data in each file are divided into
parameters, which measure frequency related, intensity
short frames and each frame is processed to extract fea-
related, noise related, tremor related, and its perturbations
tures from it.
measures among others.11
3 The set of features extracted from all the frames are
considered as input for the ML algorithms.
Mel-frequency cepstral coefficient
The dataset is divided into training and testing sets by Mel-frequency cepstral coefficient (MFCC) is a standard
randomly selecting the observations belonging to both nor- method for feature extraction that makes use of the knowl-
mal and pathological voices. The training set is used to edge of human auditory system. The general steps for
develop the ML model and the testing set is used to evaluate extracting the MFCC features for a single frame12,13 are as
the model. During the evaluation process, the classification follows:
ARTICLE IN PRESS
Sarika Hegde, et al A Survey on Machine Learning Approaches for Automatic Detection 3
In the next section, we discuss the research works carried They illustrated that the important features like voice onset
based on the specific ML algorithms. All the papers are and voice offset attributes, which contributes significantly
summarized in Table 1 with the details as author names, in detection of voice disorders, are ignored in case of sus-
database used, list of features extracted, type of voice disor- tained vowels input. Hence, they worked on continuous
der detected, ML algorithm used, and the accuracy. We dis- speech domain rather than sustained vowels and have pro-
cuss only the significant research works based on accuracy posed a novel feature extraction method, multidirectional
(above 90%) and highlight the same in the table. regression, which takes into account consonant and vowel
locations, formant transitions, and voice onset and offset
distributions for continuous speech. The features are classi-
HMM
fied using GMM with 99% accuracy. Similarly, Ali et al40
HMM is used to model the spectral variability of each
have proposed a method for pathological voice detection
speech sound using a mixture Gaussian distribution.33−35
and pathology classification using GMM classifier for run-
HMM acts as a stochastic finite state machine, which is
ning speech voice dataset from MEEI database. They have
assumed to be built from a finite set of possible states (hid-
computed auditory spectrum and all-pole model-based
den during evaluation) and each of those states are associ-
cepstral coefficients features based on the set of three psy-
ated with a mixture of Gaussian probability density
chophysics conditions of hearing (critical band spectral esti-
function.36
mation, equal loudness haring curve, and the intensity
The study in Gavidia-Ceballos and Hansen37 has proposed
loudness power of law of hearing). The features are visual-
a method for vocal fold cancer detection using HMM. They
ized and analyzed using GMM classifier to differentiate
have discussed the computation of new features characteriz-
between normal and pathological voices and also further
ing pathology, termed enhanced spectral-pathology compo-
classified into types like adductor, keratosis, vocal nodules,
nent, which does not require the estimation of glottal flow
vocal fold polyp, and paralysis. The highest accuracy for
waveform as in other works in the literature. An iterative
pathology detection and pathology classification with audi-
procedure-based estimation−maximization is used to fit the
tory spectrum is 93.33% (adductor), respectively. Accuracy
features into a stochastic model, HMM is evaluated on the
for the same with all-pole model-based cepstral coefficients
speech recordings from healthy and vocal fold cancer
is 99.56% and 89.47%, respectively. The important observa-
patients of sustained vowel sounds. The highest accuracy for
tions made in their work are as follows. Even though the lit-
healthy and pathological voices is 92.8% and 88.7%, respec-
erature works have demonstrated a very good accuracy for
tively, with finer spectral representation of enhanced spec-
database with sustained vowels, the amount of work carried
tral-pathology component. Similarly, HMM is used by
on running speech is less and also it is a challenging task.
Arias-Londo~ no et al38 for feature space transformation of
They have stressed on the importance of research work with
MFCC and short-term noise parameter features. The main
continuous speech database as it is more realistic due to the
drawbacks of conventional feature space transformation
usage in conversations in daily life. The processing of con-
(such as PCA and MDA) are inconsistencies in feature
tinuous speech requires voice activity detection (VAD)
extraction and classification stages, and these methods do
activity, which is most challenging task and hence, results in
not take into account the temporal dependencies among the
low performance. The authors have proposed features that
observations, considering the observations independent. To
can be computed without VAD. Based on the literature sur-
overcome this, new technique is proposed in which transfor-
vey, they have also stated that MFCC is a good feature for
mation and classification stage can be obtained simulta-
pathology detection but not in the case of pathology classifi-
neously and the parameters of the model can be adjusted
cation. Overall, they have claimed to propose a method that
with a criterion that minimizes the classification error. The
outperforms the performance of existing running speech
technique is applied on MEEI database with an accuracy of
database experiments without the need of VAD. They also
96.61%.
claim that the features are visually appealing and could be
used by clinicians directly for assessment, making the use of
GMM GMM classifier as optional.
GMM is used to model the probability distribution of fea- Ali et al41 have proposed a system for voice disorder
ture values, which are continuous in nature. The voice detection using GMM, where they detect the voice disorder
acoustic measures are continuous. Hence, it is suitable to by determining the source signal from the speech through
model the voice characteristics. GMM parameters are the linear prediction analysis. The spectrum computed using
estimated during training using expectation−maximiza- the features obtained from LP analysis provides the distri-
tion iterative algorithm or maximum a posteriori algo- bution of energy in both normal and pathological voices,
rithm. which will be used to differentiate them. They have
Muhammad et al39 have used GMM in their paper for observed that lower frequencies from 1 to 1562 Hz contrib-
classification of voice recordings (Arabic Digits) of patients ute significantly in the detection of voice disorders. The sys-
with six types of voice disorders, namely vocal fold cysts, tem is tested with both sustained vowels and running speech
laryngopharyngeal reflux disease, spasmodic dysphonia, and achieved accuracy of 99.94% and 99.75 § 0.8, respec-
sulcus vocalis, vocal fold nodules, and vocal fold polyps. tively.
6
TABLE 1.
Summary of Research Works Related to Voice Disorder Detection and Classification
Sl. No. Author Database Feature Extraction Voice Disorder Type ML Technique Best Accuracy (%)
Technique
1 Gavidia-Ceballos and Sustained vowel phonation Enhanced spectral- Vocal fold cancer HMM 92.8 for healthy and 88.7
Hansen37 pathology component for pathological
2 Martinez and Rufiner83 Not mentioned Cepstrum, mel-cepstrum, Not mentioned ANN 91.30
delta cepstrum and
delta mel-cepstrum, and
Fast Fourier Transform
(FFT)
3 Dibazar et al84 MEEI MFCC and measures of Adductor spasmodic dys- HMM 99.44%
pitch dynamics phonia, A−P squeezing,
paresis, gastric reflux,
hyper-function paraly-
ARTICLE IN PRESS
sis, keratosis/
leukoplakia
4 Hadjitodorov and MEEI Shimmer, jitter, and sev- Not mentioned K Nearest Neigh- 96.1
Mitev69 eral harmonics-to-noise bor (KNN)
ratios
5 Ritchings et al60 Christie and Withington Short-term + long-term Laryngeal cancer MLP 92
hospitals in Manchester parameters
6 Nayak and Bhat85 Kay Elemetrics Corporation Normalized energy Paralysis Multilayer ANN 90
7 Ananthakrishna et al86 MEEI Energy spectrum Adductor, cyst, leukopla- KNN 89.19
kia, vocal fold polyp,
polyp degenerative,
vocal fold edema, vocal
nodule
8 Godino-Llorente and MEEI MFCC Glottis cancer LVQ MLP 96 with LVQ
Gomez-Vilda61
9 Behroozmand and Local database Wavelet packet sub-band Edema, nodules, and ANN SVM 94.12 with Entropy
ARTICLE IN PRESS
19 Shama and Cholayya70 MEEI Harmonics-to-noise ratio Adductor, paralysis , cyst KNN 94.38 with energy
(HNR) measure and the leukoplakia, vocal fold spectrum
critical-band energy polyp, polyp degenera-
spectrum tive, vocal fold edema,
vocal nodule
20 Aguiar Neto et al95 MEEI LPC, Cepastal (CEP), Mel Vocal fold edema, cysts, Vector-quantizing- 98
cepstral (MEL) features nodules, and paralysis trained distance
classifier
21 Gelzinis et al75 University Hospital of Kau- Pitch and amplitude per- Diffuse and nodular KNN SVM 95.5 with SVM
nas University of Medi- turbation measures,
cine, Lithuania cepstral energy fea-
tures, autocorrelation
features, and linear pre-
diction cosine transform
coefficients
22 Linder et al96 ChariteBerlin, Campus Ben- Jitter, shimmer, standard Not mentioned ANN 80
jamin Franklin deviation of fundamen-
tal frequency, and the
glottal-to-noise excita-
tion ratio
23 Murugesapandian et MEEI Mel-scaled wavelet Vocal fold edema, gastric ANN 95.75%
al97 packet transform reflux, and vocal fold
paralysis
24 Salhi et al98 National Hospital “Rabta- Wavelet transform Neural and vocal patholo- Neural networks Between 70% and 100%.
Tunis” gies (Parkinson, Alz-
heimer, laryngeal,
dyslexia)
25 Ghoraani and MEEI Time−frequency Variety of diseases K means 98.6
Krishnan73 distribution
(Continued)
7
8
TABLE 1. (Continued )
Sl. No. Author Database Feature Extraction Voice Disorder Type ML Technique Best Accuracy (%)
Technique
26 Hariharan et al99 MEEI Mel Frequency Band Variety of diseases LDA and KNN 98.48 with LDA
Energy Coefficients 99.59 with KNN
(MFBECs) combined
with singular value
decomposition
27 Kotropoulos and MEEI LPC Vocal fold paralysis in Linear classifier 68.8
Arce100 men and edema in with reject
women option
28 Markaki and MEEI and Normalized modulation - SVM 94.1 with MEEI
Stylianou101 Pr{ncipe de Asturias (PdA) spectral features
29 Markaki and MEEI Modulation spectral Polyp, adductor, Kerato- SVM 94.1 for pathology detec-
Stylianou102 features sis, nodules tion and 88.6 for pathol-
ogy classification
ARTICLE IN PRESS
30 ~ o et al38
Arias-London MEEI MFCC and short-term Organic, neurological, HMM 96.6
cnica de noise parameters
Universidad Polite and traumatic disorders.
Madrid (UPM) UPM contained nod-
ules, polyps, edemas,
carcinomas
31 Das76 National Centre for Voice Variable selection using Parkinson’s disease Neural networks, 92.9% with neural
and Speech, Denver, regression DMneural, network
Colorado regression, and
decision tree
32 Markaki et al103 MEEI and Pr{ncipe de Astu- Combination of modula- Dysphonia, nodules, pol- SVM 96.37% with MEEI
rias (PdA) tion spectral features yps, and edema
and MFCC
33 Arias-Londono et al77 MEEI Nonlinear features Variety of diseases GMM 98.23
+ SVM
34 Arjmandi et al78 MEEI Long-time features Not mentioned SVM 94.26
ARTICLE IN PRESS
cosine transform
coefficients
42 Arjmandi and Pooyan46 MEEI Short-time Fourier Trans- Vocal fold cyst , vocal fold SVM 100% with WPT features
form, continuous wave- nodules, vocal fold
let transform, and edema, and vocal fold
wavelet packet paralysis
transform
43 Muhammad et al39 Arabic digits Multidirectional Regres- Vocal fold cysts, laryngo- GMM 99
sion (MDR)-based pharyngeal reflux dis-
features ease, spasmodic
dysphonia, sulcus
vocalis, vocal fold nod-
ules, and vocal fold
polyps.
44 Tsanas et al107 National Center for Voice Dysphonia features Parkinson’s disease SVM 99
and Speech (NCVS)
database
45 Ali et al108 King Abdul Aziz University MFCC Cyst, polyp, sulcus GMM 91.66
Hospital, Riyadh, Saudi vocalis, nodules, and
Arabia paralysis
46 Alsulaiman et al109 King Saud University, Relative Spectral trans- Cyst, GERD, Polyp, and Multiclass SVM 100
Riyadh form perceptual linear sulcus
predictive
(Continued)
9
10
TABLE 1. (Continued )
Sl. No. Author Database Feature Extraction Voice Disorder Type ML Technique Best Accuracy (%)
Technique
47 Belalcazar-Bolanos et Universidad de Antioquia in Harmonics-to-noise ratio Parkinson’s disease KNN 66.57
al110 Medellın, Colombia (HNR), normalized noise
energy (NNE), cepstral
HNR (CHNR), and glot-
tal-to-noise excitation
ratio (GNE)
48 Holi111 J.S.S. Hospital, Mysore MFCC Parkinson’s disease, cere- Multilayer neural 92
bellar demyelination, network
stroke and senile
disease
49 Kaleem et al72 MEEI (continuous speech) Empirical mode Not mentioned Linear classifier 95.7
decomposition
50 Majidnezhad and Belarusian Republican Cen- Wavelet packet decompo- Various diseases ANN 91.54
ARTICLE IN PRESS
Kheidorov112 tre of Speech, Voice and sition and MFCC
Hearing Pathologies
51 Saldanha et al113 Kay Elemetrics Corporation MFCC Not mentioned LDA 93.14
52 Vikram and Umarani114 Local database (AIISH Wavelet-based MFCC Parkinson’s disease, Hybrid classifier 96.61
Mysore) vocal fold paralysis, (GMM-universal
nodules, edema background
model and SVM)
53 Akbari and Arjmandi64 MEEI Wavelet-packet-based A−P squeezing, gastric Multilayer neural 96.67 with energy fea-
energy and entropy reflux , and network tures 97.33% with
hyperfunction entropy features
54 Al-nasheri et al11 MEEI SVD Peak and lag features Not mentioned GMM 97 with MEEI
55 Cordeiro et al115 MEEI LPC of 30th order Nodules, edema, paraly- Decision tree 94.2
sis, polyps, keratosis
56 Fezari et al116 German database MFCC with energy Spasmodic dysphonia GMM 79.92 for pathological and
developed by Putzer features 81.90 for normal
ARTICLE IN PRESS
62 Ali et al40 MEEI APS and APCC Variety of voice disorders GMM 99.56
63 Al-nasheri et al11 AVPD MDVP parameters Cyst, nodules, paralysis, SVM 81.33
polyp, and sulcus
64 Cordeiro et al119 MEEI Mel-line spectrum Edema, Nodules, GMM 77.9
frequencies Paralysis
65 pez-de-Ipina et al120
Lo Linear parameters Alzheimer’s disease KNN MLP 87.30
90.9
66 Orozco- Arroyave et Local Stability measures spec- Parkinson’s disease, Gaussian Kernel 81−99
al50 tral−cepstral features laryngeal pathologies, SVM 95−99
cleft lip and plate
67 Rani and Holi121 J.S.S. Hospital, Mysore MFCC Parkinson’s disease, cere- GMM 89.5
bellar demyelination,
and stroke
68 Saidi and Almasganj49 MEEI M-band wavelet Vocal fold paralysis, vocal SVM with a linear 100
fold paresis, nodules, kernel
polyps, and edema
69 Salehi122 MEEI Parameters of lifting Paralysis, Paresis, edema, SVM 98.30
scheme nodules, polyp, and
keratosis
70 squez-Correa et al123 Universidad de Antioquia,
Va Voiced and unvoiced Parkinson’s disease SVM 64−86 (voiced)
in Medellın, Colombia features 78−99 (unvoiced)
71 Agarwal et al124 UCI machine learning Six types of dysphonia Parkinson’s disease ELM 90.76
repository parameters including
frequency (jitter), pulse,
amplitude (shimmer),
voicing, harmonicity,
and pitch parameters
72 Aicha and Ezzine125 SVD Glottal flow parameters Larynx cancer ANN 96.9
(Continued)
11
12
TABLE 1. (Continued )
Sl. No. Author Database Feature Extraction Voice Disorder Type ML Technique Best Accuracy (%)
Technique
73 Ali et al53 King Abdul Aziz University Voice contour Cysts, laryngopharyngeal SVM 100
reflux disease, unilateral
vocal fold paralysis, pol-
yps, and sulcus vocalis
74 Ali et al54 MEEI MDVP parameters with Not mentioned SVM 94.71
fractal dimension
75 Benba et al51 Local Database MFCC Parkinson’s disease SVM 100
76 Benba et al52 Local Database MFCC, perceptual linear Parkinson’s disease and Linear SVM 90
prediction, and ReAli- other neurological
tive SpecTrAl PLP diseases
77 Chen et al65 UCI machine learning Feature reduction using Parkinson’s disease ELM 95.97
repository mRMR, Information
gain, relief
ARTICLE IN PRESS
78 Francis et al126 MEEI and local dataset Modified Mellin trans- Polyps, cyst, vocal fold ANN 96.48
form of log spectrum cancer, presbylarynx,, 95.92
nodules and laryngo
pharyngeal reflux
disorder
79 Hammami et al127 Svd Pitch and three first Acute laryngitis, adductor SVM 86%
formants spasmodic, vocal
fatigue, vocal tremor,
vocal fold edema, laryn-
geal paralysis
80 Hemmerling et al71 SVD Fundamental frequency, Hyperfunctional dyspho- Random Forest 100
jitter and shimmer coef- nia, vocal cord paresis
ficients, energy, 0-, 1-, 2- and other pathologies
, 3-order moment, kurto- mentioned in the
sis, power factor, 1-, 2-, database
ARTICLE IN PRESS
noma and laryngecto-
mized speech
86 Ali et al41 MEEI LPC Variety of voice disorders GMM 99.94
(sustained vowel) 99.75
§ 0.8
(continuous
speech)
87 Al-nasheri et al8 SVD MDVP Cysts, paralysis, polyps ṢVM 99.68%, 88.21%, and
MEEI 72.53% for the three
AVPD mentioned databases,
respectively
88 Al-nasheri et al56 MEEI Peak and lag values are Cysts, paralysis, polyps ṢVM 99.255, 98.941 , and
SVD extracted using correla- 95.188 for the three
AVPD tion functions mentioned databases,
respectively
89 Amami and Smiti58 MEEI Density-based spatial Paralysis, keratosis, vocal SVM with RBF 98
clustering of applica- polyp, adductor kernel
tions with noise
(DBSCAN) clustering
technique to detect
noise
90 Benba et al131 Local database Features extracted from Parkinson’s disease SVM 88% with KNN
the time, frequency, and KNN
cepstral domains
91 Cordeiro et al57 MEEI MFCC, line spectral Vocal fold nodule, edema, Heirarchical 98.7
frequencies paralysis classifier
(Continued)
13
14
TABLE 1. (Continued )
Sl. No. Author Database Feature Extraction Voice Disorder Type ML Technique Best Accuracy (%)
Technique
92 Dahmani and Guerti132 SVD MFCC, jitter, shimmer, Spasmodic dysphonia NBN 90
ARTICLE IN PRESS
and fundamental and paralysis diseases
frequency
93 Muhammad et al133 MEEI, SVD, and AVPD Interlaced derivative Vocal folds cysts, unilat- ṢVM 88.5% for detection and
pattern eral vocal fold paralysis, 90.3% for classification
and vocal fold polyp
94 Shia and Jayasree134 SVD Discrete wavelet Not mentioned Feed forward neu- 93.3
transform ral network
95 Teixeira et al67 SVD Jitter, shimmer, and har- Not mentioned ANN 100 for female voices and
monic-to-noise ratio 90 for male voices
96 Harar et al135 SVD - Not mentioned Deep neural 71.36
networks
97 Holi81 - MFCC fused with time Neurological voice Hybrid classifier 94.3
domain parameters disorder (GM
M and SVM)
98 Al-nasheri et al59 MEEI Frequency bands Vocal fold cysts, unilat- SVM 99.54
than other types of voice recordings. The proposed method using FDR technique. A t test is performed to find out any sig-
has achieved the 100% accuracy with MLP kernel of SVM. nificant differences in means of normal and pathological sam-
Similarly, in Benba et al,52 they have discriminated between ples. They have observed a clear difference in the performance
PD and other neurological disorders by extracting MFCC, of MDVP parameters using three databases. The highly ranked
perceptual linear prediction, and ReAlitive SpecTrAl PLP parameters also changed from one database to another and the
from voice samples. They obtained 90% classification accu- best accuracies obtained are 99.68%, 88.21%, and 72.53% for
racy using the first 11 coefficients of the PLP and linear the SVD, MEEI, and AVPD, respectively. Al-nasheri et al56
SVM kernels. Ali et al53 have proposed a novel method have performed a voice pathology detection and classification
based on the voice intensity of a speech signal of continuous on different frequency regions using correlation functions and
speech using SVM classifier. A voice contour is formed by SVM classifier. The peak and their corresponding lag values are
determining the peaks from the speech signals. The area estimated using correlation functions. These features are experi-
under the voice contour is used to discriminate between nor- mented on different frequency bands to investigate the contribu-
mal and disordered subjects. Generally, the area under the tion of each band on the automatic detection and classification
voice contour of disordered voices is lower than that of nor- process. It is reported that the frequency bands between 100
mal voices. The advantage of the proposed feature is that it and 8000 Hz contributes more in detection and classification of
does not require the estimation of fundamental frequency. voice disorders. The accuracies of detection and classification
They have used the database obtained from the King Abdul varied from one database to another. The best detection rate
Aziz University, which consists of voice recordings of vocal obtained for MEEI, SVD, and AVPD databases are 99.809%,
folds cysts, laryngopharyngeal reflux disease, vocal folds 90.979%, and 91.168% and that of classification rates are
polyps, unilateral vocal folds paralysis, and sulcus vocalist 99.255%, 98.941%, and 95.188%, respectively. Cordeiro et al57
patients as well as normal person with a classification accu- have designed three-level hierarchical SVM classifier using
racy of 100%. Ali et al54 proposed a multiband approach Gaussian kernel that is trained in a way that first level SVM
for the detection of voice disorders suing SVM. This multi- classify physiological larynx pathologies and healthy voices, sec-
band approach is based on a three-level DWT and in each ond classifier was used to compare neuromuscular larynx
band, fractal dimension (FD) of the power spectrum is esti- pathologies and healthy voices, whereas third model was used
mated. Experiments are done by appending MDVP param- to compare physiological larynx pathologies and neuromuscu-
eters with FD. They have used MEEI database, which lar larynx pathologies. Both sustained vowel and continuous
contains 173 pathological and 53 normal voices. Since 5 out speech database are used as input to extract MFCC and line
of 173 pathological voices did not contain MDVP parame- spectral frequencies features. The hierarchical classifier has
ters, they used only 168 pathological voices. The results are achieved an accuracy of 98.7% for voice pathology identifica-
expressed in term of sensitivity (SEN), specificity (SPE), tion, whereas 81.3% for classification of two classes of
accuracy (ACC), and the area under the curve. They unhealthy voices.
observed an improvement of 2.26% in accuracy and 1.45% Amami and Smiti58 have proposed incremental Density-
in area under the curve from their previous work by com- based spatial clustering of applications with noise
bining FD of all levels with 22 MDVP parameters. (DBSCAN)-SVM in order to detect noises, analyze, and
Muhammad et al55 developed an automatic voice pathology classify pathological voice from normal voice. The advan-
detection system based on voice production theory, extracting tages of using DBSCAN algorithm are: it can discover clus-
five measurements of the irregularity, namely average, variance, ters of arbitrary shape, can distinguish noise, uses spatial
and the ratio of the successive tubes, skewness, and kurtosis access methods, and is efficient even for large spatial data-
from the vocal tract area. The features extracted from the vocal bases. They used SVM with radial basis function kernel for
tract area are connected to the glottis. Hence, the features classification and evaluated the method on MEEI database
extracted from the pathological voice samples exhibit irregular with an accuracy of 98%. It is concluded that the proposed
patterns over frames in case of sustained vowel, which contrib- method has the ability to handle incremental and dynamic
utes in discrimination of normal and pathological voices. The voices database, which changes over time. Al-nasheri et
authors have concluded that supraglottic tract has more contri- al,59 in their paper, concentrated on developing an accurate
butions than the remaining part of the vocal tract and the vari- and robust feature extraction for detecting and classifying
ance of vocal tract tubes across the utterance is more important voice pathologies by investigating different frequency bands
than the mean of those tubes. The extracted features are classi- using autocorrelation and entropy. Maximum peak values
fied using SVM and evaluated on MEEI and SVD databases. and their corresponding lag values have been extracted
The proposed method has achieved an accuracy of 99.22 § from each frame of a voiced signal by using autocorrelation,
0.01 for MEEI database and 94.7 § 0.21 for SVD. Al-nasheri and are used as features. The experiments are carried on
et al8 investigated the MDVP parameters in automatic detection MEEI, SVD, and AVPD databases with SVM classifier.
of voice pathologies from multiple databases using SVM. They They have also performed U tests to investigate the differ-
have used three databases such as AVPD, MEEI, and SVD ence between the means of the normal and pathological
and considered three common voice disorders (cyst, paralysis, samples. The variations in the accuracies of both detection
and polyp) from these databases. The MDVP parameters and classification based on frequency band and database is
extracted using a computerized speech lab program are ranked reported. It is mentioned that the most contributive bands
ARTICLE IN PRESS
Sarika Hegde, et al A Survey on Machine Learning Approaches for Automatic Detection 17
in both detection and classification were between 1000 and time domain features are used to classify vocal fold paralysis
8000 Hz. The highest accuracies obtained in case of detec- and edema using probabilistic neural network (PNN). The
tion were 99.69%, 92.79%, and 99.79% for MEEI, SVD, proposed method achieved classification accuracy of 90%.
and AVPD, respectively, and in case of classification were Akbari and Arjmandi64 have proposed a method using dis-
99.54%, 99.53%, and 96.02% for MEEI, SVD, and AVPD, crete WPT as feature extraction technique, multiclass LDA
respectively. as dimensionality reduction technique and multilayer neural
network as classifier. The proposed system showed average
classification accuracy of 96.67% and 97.33% for the struc-
ANN ture composed of wavelet packet-based energy and entropy
ANN algorithm works based on the concepts of biological features, respectively.
neural networks in human. It consists of set of neurons con- Chen et al65 explored the ability of extreme learning
nected to each other and the output of one neuron can be machine (ELM) and kernel ELM in early diagnosis of PD
fed as input to another neuron. The neurons are arranged in from University of California, Irvine ML repository. ELM
layers, ie, input layer, hidden layer, and output layer also is a type of ANN, used for single-hidden layer feed-forward
referred as multilayer perceptron (MLP) where the number neural networks, which was initially proposed by Huang et
of neurons in each layer and the total numbers of layers al.66 It chooses the hidden nodes randomly and tends to pro-
depend on the type of applications. vide good generalization performance at extremely fast
Ritchings et al60 have proposed a system for assessing the learning speed. The dataset used in Chen et al65 contained
voice quality of patient after laryngeal cancer therapy using some MDVP parameters like two measures of ratio of noise
MLP. The quality assessment is ranked into seven levels to tonal components in the voice, two nonlinear dynamical
from 0 (least abnormal) to 6 (most abnormal). The database complexity measures, signal fractal scaling exponent, and
is created with the help of the clinicians by capturing both three nonlinear measures of fundamental frequency varia-
EGG and acoustic data channels synchronously at 20 kHz tion. They have observed an improvement in the perfor-
for up to 3 seconds while the subject phonated the vowel /i/ mance by reducing the feature space using techniques like
as steadily as possible. These recordings are processed to minimum redundancy maximum relevance, Information
compute long term features like mean of fundamental fre- Gain (IG), relief and t test feature selection techniques for
quency, the standard deviation of fundamental frequency, ranking the features. The experiments are conducted on var-
the percentage of voiced data and short-term features ious subsets of features. The performance of ELM mainly
related to the spectral envelope of the first few glottal har- depends on the hidden neurons and activation function,
monics, and the glottal noise. MLP with two layers (20−40 whereas the performance of KELM depends on the con-
neurons) and seven outputs is designed to train and test the stant parameter C and kernel parameter g. The authors
dataset and the highest accuracy reported is 92% with com- have experimented these factors to get the optimal results
bined long-term and short-term features. Godino-Llorente up to 95.97% average accuracy. Teixeira et al67 evaluated
and Gomez-Vilda61 have done experiments to detect the the performance of different features like jitter, shimmer,
laryngeal pathological voices (polyps, nodules, cysts, sulcus, and HNR in assessment of voice disorders using ANN.
edemas, carcinoma, etc.) on MEEI database using two neu- Both male and female voice samples are considered from
ral network classification approaches, namely MLP and SVD database, an accuracy of 100% for female voices and
learning vector quantization with MFCC features. They 90% for male voices is achieved.
have reported an outperformance of learning vector quanti-
zation method over MLP with 96% classification accuracy.
Based on the literature survey and experimental results, K-nearest neighbor classifier
they have stressed that MFCC is a good parameterization K-nearest neighbor (KNN) is one of the simplest ML algo-
technique for detecting the voice diseases compared to other rithms, also called as lazy learner. For a given test data, it uses
feature extraction techniques used in the literature. distance measure to decide the KNNs in the train data. The class
An MLP with 64 input nodes and 1 output node is labels of the KNNs are analyzed to make the final decision.68
designed and evaluated in Crovato and Schuck62 for classifi- KNN algorithm is used in Hadjitodorov and Mitev69 for
cation of voices into five pathological groups (chronic laryn- pathology detection. Acoustic parameters like shimmer, jit-
gitis, degenerative, incorrect mobility, organic alterations, ter, several HNRs along with new parameters for estimation
and organic growth) and one normal group from the data- of the turbulent noise in voice signals (Turbulent Noise
base created at PUCRS Hospital. Wavelet packet coefficients Index) and for the “breathy” voice characterization (Nor-
are used for feature computation and the global success rate malized First Harmonic Energy) are used as features. The
in the range 87.5% to 100% is reported. Hariharan et al63 classification accuracy up to 96.1% is achieved for MEEI
proposed a new feature extraction technique to extract the database. Similarly, KNN algorithm is used by Shama and
time domain features. The main objective of developing this Cholayya70 for detection of laryngeal pathologies like
feature extraction technique is that it does not require the adductor, paralysis, cyst, leukoplakia, vocal fold polyp,
computation of fundamental frequency, which is very diffi- polyp degenerative, vocal fold edema, and vocal nodules.
cult to estimate from the pathological voice samples. These Here, they used HNR measure and the critical-band energy
ARTICLE IN PRESS
18 Journal of Voice, Vol. &&, No. &&, 2018
spectrum as features. HNRs are estimated at four different classification of pathological voices by computing fea-
frequency bands and used as one set of features. The nor- tures using adaptive time−frequency distribution and
malized energies are obtained by filtering the voiced speech non-negative matrix factorization applied on MEEI data-
signals using 21 critical bandpass filters, which will mimic base. In this paper, they have mentioned the shortcom-
the human auditory neurons and used as another set of fea- ings of the feature extraction approaches, which require
tures. The set HNR features achieved 94.28% accuracy, the signal to be segmented into short frames as the patho-
whereas critical energy spectrum features achieved 92.38% logical voices are nonstationary and segmenting it at non-
accuracy with KNN classifier. The results obtained have stationary part may lead to loss of important
shown that these features can be used as a tool to supple- information. To overcome those limitations, they pro-
ment the perceptual evaluation of speech for the detection posed a new approach to extract the TF features from the
of suspected laryngeal pathologies. This method requires speech so that it can capture the dynamic changes in the
only shorter length of speech data for the analysis and com- pathological speech. The extracted features are classified
putationally less expensive as compared to the extraction of using K-mean clustering technique with 98.6% classifica-
fundamental frequency and noise measures. tion accuracy.
92.9% is achieved using neural network. Arias-Londono pathological considered in the database. The 12 LPC and 13
et al77 have highlighted that even though MFCC is a pop- MFCC along with its first and second derivatives are used as
ular feature, it is not able to characterize the nonlinear features. The obtained results have shown that GMM is well
process involved in voice production. Hence, they pro- suited for voice pathology classification (95.74%) and HMM
posed a new approach for speech feature extraction using for pathological voice detection (94.44%).
nonlinear analysis of time series for automatic classifica- Kohler et al15 have used glottal flow parameters of the
tion of normal and pathological voices. The set of 11 vocal folds to classify vocal fold nodules and vocal fold
complexity measures are combined with MFCC (com- paralysis. The parameters of the glottal signals can be
bined with noise parameters) features for better represen- obtained through inverse filtering method. Time domain
tation of voice data. A fusion of generative approach and frequency domain parameters are extracted from the
based on GMM, and discriminative approach based on glottal signals and used in various experiments. They have
SVM is used for classification. The features are fed into used a database composed of voice recordings of patients
two different GMM for training. The testing is done with with nodules, and paralysis as well as normal voices of both
GMM at one level and later fed into SVM for second men and women. This database was obtained from a speech
level of classification. This approach is applied on MEEI therapist in Rio de Janeiro. The authors have used ANN,
database with a classification accuracy of 98.23%. Arj- HMM, and SVM for classification. From the obtained
mandi et al78 performed classification experiment on ran- results, it is concluded that glottal signal parameters are
domly selected 50 normal and 50 pathological voice more relevant to discriminate pathologies of the vocal folds
samples from MEEI database. They extracted 33 acoustic than MFCC. The highest accuracy achieved is 97.2% using
parameters and used only 22 parameters as some of them SVM. Holi81 have developed a hybrid classifier for detec-
did not reflect voice quality. They investigated two fea- tion of neurological disorders. The acoustic characteristics
ture reduction techniques like PCA and LDA and feature of a voice signals with neurological disorders may change
selection techniques like individual, forward, backward, due to incomplete closure of glottis. These changes may
and branch-and-bound. To evaluate the feature vectors, help in detecting the certain neurological diseases. The
authors have used six classifiers such as quadratic dis- wavelet transform and MFCCs are fused with the time-
criminant classifier, nearest mean classifier, Parzen classi- domain features and fed into a hybrid model designed using
fier, KNN, support vector classifier, and neural network. GMM and SVM. LP-coded MFCCs computed for selected
After evaluating the different combinations of feature six-level DWT are given to GMM. The output of which is
reduction/feature selection methods and classifiers, they combined with SVM scores obtained with time-domain fea-
have proposed a combined scheme of LDA and SVM tures as input and given to another SVM, which makes the
with recognition rate of 94.26%. Among feature selection decision to classify the data as normal or neurological-disor-
methods, the individual technique has achieved best accu- dered voice. The proposed system achieved classification
racy of 91.55% with SVM. accuracy of 94.3%.
Hariharan et al79 proposed a new hybrid intelligent system,
which includes feature preprocessing techniques, feature reduc-
tion techniques, and classifiers to classify PD patients form DISCUSSIONS
healthy people. They have used PD dataset from University of Various voice disorders, databases, feature extraction, and
California, Irvine ML database. Model-based clustering ML techniques used in AVDD research so far have been dis-
(GMM) is used as feature preprocessing technique, whereas cussed in this paper. It is observed that SVM classifier is used
PCA, LDA, sequential forward selection, and sequential back- most widely compared to any other ML techniques. It may be
ward selection are used as feature reduction/selection techni- due to the fact that SVM classifier performs better for high-
ques. Three classifiers, namely, least-square SVM, PNN, and dimensional data as well as small sized dataset and it is quite
general regression neural network are used to evaluate the per- difficult to get large database of pathological voices for the
formance of features. This hybrid system achieved 100% accu- research. In contrast, ANN requires a large dataset for train-
racy with combination of GMM based feature weighting, ing the model to give better classification accuracy.
LDA/sequential backward selection, and least-square SVM/ Most of the innovations or novel approaches proposed
PNN/general regression neural network. Jothilakshmi80 has in the literature are mainly focusing on the voice charac-
used database created with the help of Rajah Muthiah Medical terization methods, which is referred to it as feature
College Hospital, Annamalai University, which consists of extraction technique. Many authors have claimed to pro-
voice samples of patients with cerebral palsy, dysarthria, hear- pose new method for feature extraction and have reported
ing impairments, laryngectom, mental retardation, left-side an improvement in performance. The feature extraction
paralysis, quadriparesis, stammering, stroke, and tumour in step can be skipped by using deep learning techniques,
vocal tract. In this work, the author has developed a two-level which extracts features inherently but needs large training
model using HMM and GMM to detect the type of voice examples. Even though Deep Neural Network (DNNs)
pathology. Initially, the voice sample will be classified as either have emerged as a powerful ML technique, there are very
normal or pathological and in second level, it will be classified few works done using deep learning, which may be due to
into one of the voice disorder listed above if the voice is the nonavailability of large databases.
ARTICLE IN PRESS
20 Journal of Voice, Vol. &&, No. &&, 2018
24. Watts CR, Clark R, Early S. Acoustic measures of phonatory discriminant analysis and support vector machine. Biomed Signal Pro-
improvement secondary to treatment by oral corticosteroids in a pro- cess Control. 2012;7:3–19.
fessional singer: a case report. J Voice. 2001;15:115–121. 47. Uloza V, Verikas A, Bacauskiene M, et al. Categorizing normal and
25. Guido RC, Pereira JC, Fonseca E, et al. Trying different wavelets on pathological voices: automated and perceptual categorization. J
the search for voice disorders sorting. In: Proceedings of the Thirty- Voice. 2011;25:700–708.
Seventh Southeastern Symposium on System Theory, 2005 (SSST'05). 48. Muhammad G, Melhem M. Pathological voice detection and binary
IEEE; 2005:495–499. classification using MPEG-7 audio features. Biomed Signal Process
26. Umapathy K, Krishnan S, Parsa V, et al. Discrimination of patholog- Control. 2014;11:1–9.
ical voices using a time-frequency approach. IEEE Trans Biomed 49. Saidi P, Almasganj F. Voice disorder signal classification using m-
Eng. 2005;52:421–430. band wavelets and support vector machine. Circuits Syst Signal Pro-
27. Zhang Y, Jiang JJ, Biazzo L, et al. Perturbation and nonlinear cess. 2015;34:2727–2738.
dynamic analyses of voices from patients with unilateral laryngeal 50. Orozco-Arroyave JR, Belalcazar-Bolanos EA, Arias-Londo~ no JD,
paralysis. J Voice. 2005;19:519–528. et al. Characterization methods for the detection of multiple voice dis-
28. Neto BGA, Fechine JM, Costa SC, et al. Feature estimation for vocal orders: neurological, functional, and laryngeal diseases. IEEE J
fold edema detection using short-term cepstral analysis. In: Proceed- Biomed Health Inf. 2015;19:1820–1828.
ings of the 7th IEEE International Conference on Bioinformatics and 51. Benba A, Jilbab A, Hammouch A. Analysis of multiple types of voice
Bioengineering, 2007 (BIBE 2007). IEEE; 2007:1158–1162. recordings in cepstral domain using MFCC for discriminating
29. Gomez-Vilda P, Fernandez-Baillo R, Nieto A, et al. Evaluation of between patients with Parkinson's disease and healthy people. Int J
voice pathology based on the estimation of vocal fold biomechanical Speech Technol. 2016;19:449–456.
parameters. J Voice. 2007;21:450–476. 52. Benba A, Jilbab A, Hammouch A. Discriminating between patients
30. Gomez-Vilda P, Fernandez-Baillo R, Rodellar-Biarge V, et al. Glot- with Parkinson's and neurological diseases using cepstral analysis.
tal source biometrical signature for voice pathology detection. Speech IEEE Trans Neural Syst Rehabil Eng. 2016;24:1100–1108.
Commun. 2009;51:759–781. 53. Ali Z, Alsulaiman M, Elamvazuthi I, et al. Voice pathology detection
31. Zhang Y, Jiang JJ. Acoustic analyses of sustained and running voices based on the modified voice contour and SVM. Biol Inspired Cognit
from patients with laryngeal pathologies. J Voice. 2008;22:1–9. Archit. 2016;15:10–18.
32. Fontes AI, Souza PT, Neto AD, et al. Classification system of patho- 54. Ali Z, Elamvazuthi I, Alsulaiman M, et al. Detection of voice pathol-
logical voices using correntropy. Math Probl Eng. 2014;2014:1–7. ogy using fractal dimension in a multiresolution analysis of normal
33. Levinson SE, Rabiner LR, Sondhi MM. An introduction to the appli- and disordered speech signals. J Med Syst. 2016;40:20.
cation of the theory of probabilistic functions of a Markov process to 55. Muhammad G, Altuwaijri G, Alsulaiman M, et al. Automatic voice
automatic speech recognition. Bell Syst Tech J. 1983;62:1035–1074. pathology detection and classification using vocal tract area irregular-
34. Poritz AB. Hidden Markov models: a guided tour. In: Proceedings of ity. Biocybern Biomed Eng. 2016;36:309–317.
IEEE International Conference on Acoustics, Speech, and Signal Proc- 56. Al-nasheri A, Muhammad G, Alsulaiman M, et al. Investigation of
essing (ICASSP). 7–13. voice pathology detection and classification on different frequency
35. Rabiner LR, Juang BH. Speech recognition: statistical methods. regions using correlation functions. J Voice. 2017;31:3–15.
Encyclopedia of Language & Linguistics2nd ed. 1–18. 57. Cordeiro H, Fonseca J, Guimar~aes I, et al. Hierarchical classification
36. Rao PVS. VOICE: an integrated speech recognition synthesis system and system combination for automatically identifying physiological
for the Hindi language. Speech Commun. 1993;13:197–205. and neuromuscular laryngeal pathologies. J Voice. 2017;31. 384.e9
37. Gavidia-Ceballos L, Hansen JH. Direct speech feature estimation −384.e14.
using an iterative EM algorithm for vocal fold pathology detection. 58. Amami R, Smiti A. An incremental method combining density clus-
IEEE Trans Biomed Eng. 1996;43:373–383. tering and support vector machines for voice pathology detection.
38. Arias-Londo~ no JD, Godino-Llorente JI, Saenz-Lech on N, et al. An Comput Electr Eng. 2017;57:257–265.
improved method for voice pathology detection by means of a 59. Al-Nasheri A, Muhammad G, Alsulaiman M, et al. Voice pathology
HMM-based feature space transformation. Pattern Recognit. detection and classification using auto- correlation and entropy fea-
2010;43:3100–3112. tures in different frequency regions. IEEE Access. 2018;6:6961–6974.
39. Muhammad G, Mesallam TA, Malki KH, et al. Multidirectional 60. Ritchings RT, McGillion M, Moore CJ. Pathological voice quality
regression (MDR)-based features for automatic voice disorder detec- assessment using artificial neural networks. Med Eng Phys.
tion. J Voice. 2012;26. 817.e19−817.e27. 2002;24:561–564.
40. Ali Z, Elamvazuthi I, Alsulaiman M, et al. Automatic voice pathol- 61. Godino-Llorente JI, Gomez-Vilda P. Automatic detection of voice
ogy detection with running speech by using estimation of auditory impairments by means of short-term cepstral parameters and neural
spectrum and cepstral coefficients based on the all-pole model. J network based detectors. IEEE Trans Biomed Eng. 2004;51:380–384.
Voice. 2015;30(6), 757−e7. 62. Crovato CDP, Schuck A. The use of wavelet packet transform and
41. Ali Z, Muhammad G, Alhamid MF. An automatic health monitoring artificial neural networks in analysis and classification of dysphonic
system for patients suffering from voice complications in smart cities. voices. IEEE Trans Biomed Eng. 2007;54:1898–1900.
IEEE Access. 2017;5:3900–3908. 63. Hariharan M, Paulraj MP, Yaacob S. Detection of vocal fold paraly-
42. Vapnik V. The Nature of Statistical Learning Theory. Springer Sci- sis and edema using time-domain features and probabilistic neural
ence & Business Media; 2013. network. Int J Biomed Eng Technol. 2011;6:46–57.
43. Behroozmand R, Almasganj F. Optimal selection of wavelet-packet- 64. Akbari A, Arjmandi MK. An efficient voice pathology classification
based features using genetic algorithm in pathological assessment of scheme based on applying multi-layer linear discriminant analysis to
patients’ speech signal with unilateral vocal fold paralysis. Comput wavelet packet-based features. Biomed Signal Process Control.
Biol Med. 2007;37:474–485. 2014;10:209–223.
44. Markaki M, Stylianou Y. Voice pathology detection and discrimina- 65. Chen HL, Wang G, Ma C, et al. An efficient hybrid kernel extreme
tion based on modulation spectral features. IEEE Trans Audio Speech learning machine approach for early diagnosis of Parkinson ׳s disease.
Lang Process. 2011;19:1938–1948. Neurocomputing. 2016;184:131–144.
45. Saeedi NE, Almasganj F, Torabinejad F. Support vector wavelet 66. Huang GB, Zhu QY, Siew CK. Extreme learning machine: theory
adaptation for pathological voice assessment. Comput Biol Med. and applications. Neurocomputing. 2006;70:489–501.
2011;41:822–828. 67. Teixeira JP, Fernandes PO, Alves N. Vocal acoustic analysis—classi-
46. Arjmandi MK, Pooyan M. An optimum algorithm in pathological fication of dysphonic voices with artificial neural networks. Proc
voice quality assessment using wavelet-packet-based features, linear Comput Sci. 2017;121:19–26.
ARTICLE IN PRESS
22 Journal of Voice, Vol. &&, No. &&, 2018
68. Alpaydin E. Introduction to Machine Learning. MIT Press; 2014. 91. Moran RJ, Reilly RB, de Chazal P, et al. Telephony-based voice
69. Hadjitodorov S, Mitev P. A computer system for acoustic analysis of pathology assessment using automated speech analysis. IEEE Trans
pathological voices and laryngeal diseases screening. Med Eng Phys. Biomed Eng. 2006;53:468–477.
2002;24:419–429. 92. Schlotthauer G, Torres ME, Jackson-Menaldi MC. Automatic diag-
70. Shama K, Cholayya NU. Study of harmonics-to-noise ratio and crit- nosis of pathological voices. WSEAS Trans Signal Process.
ical-band energy spectrum of speech as acoustic indicators of laryn- 2006;2:1260–1267.
geal and voice pathology. EURASIP J Appl Signal Process. 93. Fonseca ES, Guido RC, Scalassara PR, et al. Wavelet time- frequency
2007;2007:50. analysis and least squares support vector machines for the identifica-
71. Hemmerling D, Skalski A, Gajda J. Voice data mining for laryngeal tion of voice disorders. Comput Biol Med. 2007;37:571–578.
pathology assessment. Comput Biol Med. 2016;69:270–276. 94. Kukharchik P, Martynov D, Kheidorov I, et al. Vocal fold pathology
72. Kaleem M, Ghoraani B, Guergachi A, Krishnan S. Pathological detection using modified wavelet-like features and support vector
speech signal analysis and classification using empirical mode decom- machines. 15th European on Signal Processing Conference. IEEE;
position. Med Biol Eng Comput. 2013;51:811–821. 2007:2214–2218.
73. Ghoraani B, Krishnan S. A joint time-frequency and matrix decom- 95. Aguiar Neto BG, Costa SC, Fechine JM. LPC modelling and cepstral
position feature extraction methodology for pathological voice classi- analysis applied to vocal fold pathology detection. Int J Funct Inf Per-
fication. EURASIP J Adv Signal Process. 2009;2009 928974. sonal Med. 2008;1:156–170.
74. Wozniak M, Gra~ na M, Corchado E. A survey of multiple classifier 96. Linder R, Albers AE, Hess M, et al. Artificial neural network- based
systems as hybrid systems. Inf Fusion. 2014;16:3–17. classification to screen for dysphonia using psychoacoustic scaling of
75. Gelzinis A, Verikas A, Bacauskiene M. Automated speech analysis acoustic voice features. J voice. 2008;22:155–163.
applied to laryngeal disease categorization. Comput Methods Prog 97. Murugesapandian P, Yaacob S, Hariharan M. Feature extraction
Biomed. 2008;91:36–47. based on mel-scaled wavelet packet transform for the diagnosis of
76. Das R. A comparison of multiple classification methods for diagnosis voice disorders. 4th Kuala Lumpur International Conference on Bio-
of Parkinson disease. Expert Syst Appl. 2010;37:1568–1572. medical Engineering. Berlin, Heidelberg: Springer; 2008:790–793.
77. Arias-Londono JD, Godino-Llorente JI, Saenz-Lech on N, et al. 98. Salhi L, Talbi M, Cherif A. Voice disorders identification using
Automatic detection of pathological voices using complexity meas- hybrid approach: wavelet analysis and multilayer neural networks.
ures, noise parameters, and mel-cepstral coefficients. IEEE Trans World Acad Sci Eng Technol. 2008;45:330–339.
Biomed Eng. 2011;58:370–379. 99. Hariharan M, Paulraj MP, Yaacob S. Identification of vocal fold
78. Arjmandi MK, Pooyan M, Mikaili M, et al. Identification of voice pathology basedonmelfrequencyband energy coefficientsand singular
disorders using long-time features and support vector machine with valuedcomposition. 2009 IEEE International Conference on Signal
different feature reduction methods. J Voice. 2011;25:e275–e289. and Image Processing Applications (ICSIPA). IEEE; 2009:514–517.
79. Hariharan M, Polat K, Sindhu R. A new hybrid intelligent system for 100. Kotropoulos C, Arce GR. Linear classifier with reject option for the
accurate detection of Parkinson's disease. Comput Methods Prog detection of vocal fold paralysis and vocal fold edema. EURASIP J
Biomed. 2014;113:904–913. Adv Signal Process. 2009;2009:11.
80. Jothilakshmi S. Automatic system to detect the type of voice pathol- 101. Markaki M, Stylianou Y. Normalized modulation spectral features
ogy. Appl Soft Comput. 2014;21:244–249. for cross-database voice pathology detection. Tenth Annual Confer-
81. Holi MS. Wavelet transform features to hybrid classifier for detection ence of the International Speech Communication Association. .
of neurological-disordered voices. J Clin Eng. 2017;42:89–98. 102. Markaki M, Stylianou Y. Using modulation spectra for voice pathol-
82. Weeks M. Digital Signal Processing Using Matlab and Wavelets. ogy detection and classification. Annual International Conference of
Infinity Science Press LLC; 2006. the IEEE Engineering in Medicine and Biology Society, 2009 (EMBC
83. Martinez CE, Rufiner HL. Acoustic analysis of speech for detection 2009). IEEE; 2009:2514–2517.
of laryngeal pathologies. In: Proceedings of 22nd Annual IEEE Inter- 103. Markaki M, Stylianou Y, Arias-Londo~ no JD, et al. Dysphonia detec-
national Conference on Engineering in Medicine and Biology Society. tion based on modulation spectral features and cepstral coefficients.
3, IEEE; 2000:2369–2372. 2010 IEEE International Conference on Acoustics Speech and Signal
84. Dibazar AA, Narayanan S, Berger TW. Feature analysis for auto- Processing (ICASSP). IEEE; 2010:5162–5165.
matic detection of pathological speech. In: Proceedings of the Second 104. Carvalho RTS, Cavalcante CC, Cortez PC. Wavelet transform and
Joint Engineering in Medicine and Biology, 2002. 24th Annual Confer- artificial neural networks applied to voice disorders identification.
ence and the Annual Fall Meeting of the Biomedical Engineering Soci- 2011 Third World Congress on Nature and Biologically Inspired Com-
ety EMBS/BMES Conference. 1, IEEE; 2002:182–183. puting (NaBIC). IEEE; 2011:371–376.
85. Nayak J, Bhat PS. Identification of voice disorders using speech sam- 105. de Bruijn M, ten Bosch L, Kuik DJ, et al. Artificial neural network
ples. TENCON 2003. Conference on Convergent Technologies for the analysis to assess hypernasality in patients treated for oral or oropha-
Asia-Pacific Region. 3, IEEE; 2003:951–953. ryngeal cancer. Logoped Phoniatr Vocol. 2011;36:168–174.
86. Ananthakrishna T, Shama K, Niranjan UC. k-means nearest neigh- 106. Lee J, Jeong S, Hahn M, et al. An efficient approach using HOS-
bor classifier for voice pathology. In: Proceedings of the IEEE INDI- based parameters in the LPC residual domain to classify breathy and
CON 2004 First India Annual Conference. IEEE; 2004:352–354. rough voices. Biomed Signal Process Control. 2011;6:186–196.
87. Behroozmand R, Almasganj F. Comparison of neural networks and 107. Tsanas A, Little MA, McSharry PE, et al. Novel speech signal proc-
support vector machines applied to optimized features extracted from essing algorithms for high-accuracy classification of Parkinson's dis-
patients' speech signal for classification of vocal fold inflammation. ease. IEEE Trans Biomed Eng. 2012;59:1264–1271.
In: Proceedings of the Fifth IEEE International Symposium on Signal 108. Ali Z, Alsulaiman M, Muhammad G, et al. Vocal fold disorder detec-
Processing and Information Technology. IEEE; 2005:844–849. tion based on continuous speech by using MFCC and GMM. 2013 7th
88. Fonseca ES, Guido RC, Silvestre AC, et al. Discrete wavelet trans- IEEE GCC Conference and Exhibition (GCC). IEEE; 2013:292–297.
form and support vector machine applied to pathological voice signals 109. Alsulaiman M, Muhammad G, Ali Z. Classification of vocal fold dis-
identification. Seventh IEEE International Symposium on Multimedia. eases using RASTA-PLP. In: Proceedings of the International Confer-
IEEE; 2005:5. ence on Bioinformatics & Computational Biology (BIOCOMP). The
89. Godino-Llorente JI, G omez-Vilda P, Saenz-Lechon N, et al. Discrim- Steering Committee of the World Congress in Computer Science,
inative methods for the detection of voice disorders. ISCA Tutorial Computer Engineering and Applied Computing (WorldComp);
and Research Workshop (ITRW) on Non-Linear Speech Processing. . 2013:1.
90. Nayak J, Bhat PS, Acharya R, et al. Classification and analysis of 110. Belalcazar-Bolanos EA, Orozco-Arroyave JR, Arias-Londono JD,
speech abnormalities. ITBM-RBM. 2005;26:319–327. et al. Automatic detection of Parkinson's disease using noise measures
ARTICLE IN PRESS
Sarika Hegde, et al A Survey on Machine Learning Approaches for Automatic Detection 23
of speech. IEEE XVIII Symposium of Image, Signal Processing, and 123. Vasquez-Correa JC, Arias-Vergara T, Orozco-Arroyave JR, et al. Auto-
Artificial Vision (STSIVA). 1–5. matic detection of Parkinson's disease from continuous speech recorded
111. Holi MS. Automatic detection of neurological disordered voices using in non-controlled noise conditions. In: Proceedings of Sixteenth Annual
mel cepstral coefficients and neural networks. Point-of-Care Health- Conference of the International Speech Communication Association. .
care Technologies (PHT), 2013 IEEE. IEEE; 2013:76–79. 124. Agarwal A, Chandrayan S, Sahu SS. Prediction of Parkinson's disease
112. Majidnezhad, V., & Kheidorov, I. An ANN-based method for detect- using speech signal with extreme learning machine. In: Proceedings of
ing vocal fold pathology. arXiv preprint arXiv:1302.1772;2013. IEEE International Conference on Electrical, Electronics, and Opti-
113. Saldanha JC, Ananthakrishna T, Pinto R. Vocal fold pathology mization Techniques (ICEEOT). 3776–3779.
assessment using PCA and LDA. In: Proceedings of 2013 Interna- 125. Aicha AB, Ezzine K. Cancer larynx detection using glottal flow
tional Conference on Intelligent Systems and Signal Processing parameters and statistical tools. International Symposium on Signal,
(ISSP). IEEE; 2013:140–144. Image, Video and Communications (ISIVC). IEEE; 2016:65–70.
114. Vikram CM, Umarani K. Phoneme independent pathological voice 126. Francis CR, Nair VV, Radhika S. A scale invariant technique for detec-
detection using wavelet based MFCCs, GMM-SVM hybrid classifier. tion of voice disorders using modified Mellin transform. IEEE Interna-
In: Proceedings of 2013 International IEEE Conference on Advances tional Conference on Emerging Technological Trends (ICETT). 1–6.
in Computing, Communications and Informatics (ICACCI). IEEE; 127. Hammami I, Salhi L, Labidi S. Pathological voices detection using
2013:929–934. support vector machine. 2016 2nd International Conference on
115. Cordeiro H, Fonseca J, Meneses C. Spectral envelope and periodic Advanced Technologies for Signal and Image Processing (ATSIP).
component in classification trees for pathological voice diagnostic. IEEE; 2016:662–666.
IEEE 36th Annual International Conference on Engineering in Medi- 128. Shahsavari MK, Rashidi H, Bakhsh HR. Efficient classification of
cine and Biology Society (EMBC). 4607–4610. Parkinson's disease using extreme learning machine and hybrid parti-
116. Fezari M, Amara F, El-Emary IM. Acoustic analysis for detection of cle swarm optimization. IEEE 4th International Conference on Con-
voice disorders using adaptive features and classifiers. In: Proceeeings trol, Instrumentation, and Automation (ICCIA). 148–154.
of 2014th International Conference on Circuits, Systems and Control, 129. Sharma RK, Gupta AK. Processing and analysis of human voice for
Switzerland, February 22−24, 2014. . assessment of Parkinson disease. J Med Imaging Health Inf.
117. El Emary IMM, Fezari M, Amara F. Towards developing a voice 2016;6:63–70.
pathologies detection system. J Commun Technol Electron. 130. Wang Z, Yu P, Yan N, et al. Automatic assessment of pathological
2014;59:1280–1288. voice quality using multidimensional acoustic analysis based on the
118. Novotn
y M, Rusz J, Cmejla R, et al. Automatic evaluation of articu- GRBAS scale. J Signal Process Syst. 2016;82:241–251.
latory disorders in Parkinson's disease. IEEE/ACM Trans Audio 131. Benba A, Jilbab A, Hammouch A. Voice assessments for detecting
Speech Lang Process. 2014;22:1366–1378. patients with neurological diseases using PCA and NPCA. Int J
119. Cordeiro H, Fonseca J, Guimar~aes I, et al. Voice pathologies identifi- Speech Technol. 2017;20:673–683.
cation speech signals, features and classifiers evaluation. Signal Proc- 132. Dahmani M, Guerti M. Vocal folds pathologies classification using
essing: Algorithms, Architectures, Arrangements, and Applications Na€ıve Bayes Networks. 2017 6th International Conference on Systems
(SPA). IEEE; 2015:81–86. and Control (ICSC). IEEE; 2017:426–432.
120. Lopez-de-Ipina K, Sole-Casals J, Eguiraun H, et al. Feature selection 133. Muhammad G, Alsulaiman M, Ali Z, et al. Voice pathology detection
for spontaneous speech analysis to aid in Alzheimer's disease diagnosis: using interlaced derivative pattern on glottal source excitation.
a fractal dimension approach. Comput Speech Lang. 2015;30:43–60. Biomed Signal Process Control. 2017;31:156–164.
121. Rani KU, Holi MS. GMM classifier for identification of neurological 134. Shia SE, Jayasree T. Detection of pathological voices using discrete
disordered voices using MFCC features. IOSR J VLSI Signal Pro- wavelet transform and artificial neural networks. 2017 IEEE Interna-
cess. 2015;4:44–51. tional Conference on Intelligent Techniques in Control, Optimization
122. Salehi P. Using patient's speech signal for vocal ford disorders detec- and Signal Processing (INCOS). IEEE; 2017, March:1–6.
tion based on lifting scheme. 2015 2nd International Conference on 135. Harar P, Alonso-Hernandezy JB, Mekyska J, et al. Voice pathology
Knowledge-Based Engineering and Innovation (KBEI). IEEE; detection using deep learning: a preliminary study. IEEE International
2015:561–568. Conference and Workshop on Bioinspired Intelligence (IWOBI). 1–4.