
2009 5th International Colloquium on Signal Processing & Its Applications (CSPA)

An Investigation into Infant Cry and Apgar Score

Using Principle Component Analysis
Rohilah Sahak, Wahidah Mansor, Lee Yoot Khuan, Azlee Zabidi, Farah Yasmin
Faculty of Electrical Engineering, University Technology Mara,
40450 Shah Alam,

Abstract - In this paper, the cry characteristics of newborn infants were investigated and correlated with the Apgar score. The Apgar score is a rapid method to evaluate the physical condition of newborn infants at 1 and 5 minutes after birth, and may be repeated later if the score is and remains low. The cries of premature and mature infants with low and normal Apgar scores were analyzed using principal component analysis (PCA). Pre-processing of the voiced or unvoiced segments of the cry signals includes zero-crossing rate, short-time energy and filtering. Through principal component analysis, the reduced-dimension cry signal is investigated to extract features to be correlated with Apgar scores. This work provides the foundation for the design of an automatic algorithm to replace the manual Apgar scoring system.

The crying of infants is the sign of life after birth and the first tool of communication. It involves characteristic vocalizations, facial expressions and limb movements [1]. Like adult speech, the infant cry is used to communicate needs or problems. Previous researchers state that the infant cry carries useful information regarding the physical, psychological and pathological state of the infant. It has been shown that there exist significant differences among the various types of crying, such as the healthy infant cry, pain cry and pathological cry [2, 3]. Infants at medical risk, such as premature infants or infants with metabolic disturbances, cry at a higher frequency than normal. This indicates that the infant may have a problem and need immediate medical treatment.

Many researchers have analyzed cries to relate them to disease. They used various analysis techniques such as auditory analysis, time-domain analysis, frequency-domain analysis, spectrographic analysis and computer-based analysis. Michelsson et al. [4] defined healthy and unhealthy cry types by spectrography. They introduced a modified spectrogram to analyse the cries of infants with hypothyroidism, asphyxia or meningitis. Schonweiler et al. [1, 5, 6] investigated the cries of hearing-impaired infants. They found differences in the duration of the cry signals between 3 healthy infants and 4 infants with hearing diseases.

The clinical status of a newborn infant is assessed at 1 and 5 minutes after birth using a scoring system known as the Apgar score. This score may be repeated later if the score is and remains low. The method was designed to help doctors or nurses assess the overall physical condition of the newborn so that they can quickly determine whether the baby needs immediate medical care. This scoring system includes 5 components: heart rate, respiratory effort, muscle tone, reflex irritability and color, each of which is given a score of 0, 1 or 2. The Apgar score is the sum of the 5 components. Scores of 3 and below are regarded as critically low, 4 to 6 fairly low, and 7 to 10 generally normal. Premature infants are more likely to have low Apgar scores (i.e. 3 and below) than normal infants [7].

This paper presents principal component analysis (PCA) of mel-frequency cepstral coefficients (MFCC) computed from infant cries with low and high Apgar scores. It reveals the differences in the PCA results between the low and high Apgar scores.

A. Mel Frequency Cepstral Coefficient (MFCC)
MFCC is used to encode the signal [8]. The steps involved in this process are frame blocking, windowing, fast Fourier transform (FFT), mel-frequency wrapping and cepstrum.

In frame blocking, the signal is blocked into frames. Then, a Hamming window is applied to each individual frame so that the signal discontinuities at the beginning and end of each frame are minimized. The next step is the Fast Fourier Transform, which converts each frame from the time domain into the frequency domain. The FFT is a fast algorithm to implement the Discrete Fourier Transform (DFT).

Human perception of the frequency content of sounds for speech signals does not follow a linear scale. Thus, for each tone with an actual frequency, a subjective pitch is measured on a scale called the 'mel' scale. The mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. As a reference point, the pitch of a 1 kHz tone, 40 dB above the perceptual hearing threshold, is defined as 1000 mels. Therefore, equation (1) can be used to compute the mels for a given frequency, f in Hz [9].

$$\mathrm{mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \qquad (1)$$

To simulate the subjective spectrum, a filter bank is used, with one filter for each desired mel-frequency component. The filter bank has a triangular bandpass frequency response, and the spacing as well as the bandwidth is determined by a constant mel-frequency interval [9]. In the final step, the log mel spectrum is converted back to the time domain. The result is called the mel frequency cepstrum coefficients (MFCCs).

978-1-4244-4152-5/09/$25.00 ©2009 IEEE 209

The mel spectrum coefficients can be converted to the time domain using the Discrete Cosine Transform (DCT), since they are real numbers. The MFCCs can be calculated using (2) [9, 10, 11].

$$\tilde{c}_n = \sum_{k=1}^{K} \left(\log \tilde{S}_k\right) \cos\left[ n \left( k - \frac{1}{2} \right) \frac{\pi}{K} \right] \qquad (2)$$

where $\tilde{S}_k$ is the energy output of the $k$th filter (its logarithm, the log energy, enters the sum), and $n = 1, 2, \ldots, M$.

The number of mel spectrum coefficients, K, is typically chosen as 20. The first component, $\tilde{c}_0$, is excluded from the DCT since it represents the mean value of the log energy of the frame, which carries little speaker-specific information.

B. Principal Component Analysis
Principal component analysis is a variable reduction procedure. It is useful when there is some redundancy among the variables. In this case, redundancy means that some of the variables are correlated with one another. Because of this redundancy, it should be possible to reduce the observed variables to a smaller number of principal components that account for most of the variance in the observed variables. This technique has three effects: it orthogonalizes the components of the input vectors, it orders the resulting orthogonal components (principal components, or PCs) so that those with the largest variation come first, and it eliminates those components that contribute the least to the variation in the data set [2].

The most important decision in this analysis is which components or coefficients to eliminate so that no significant information is lost. There are three ways to evaluate the significance of the components: the eigenvalue-one criterion, the scree test and the cumulative percent of variance.

In the eigenvalue-one criterion and the scree test, an eigenvalue, which represents the amount of variance accounted for by a given component, is computed. The eigenvalue-one criterion, also known as the Kaiser criterion [12], states that a component with an eigenvalue greater than 1 is retained, while components with eigenvalues less than 1 are viewed as trivial. The scree test plots the eigenvalue associated with each component. The scree test must be supplemented with additional criteria [12], such as explained variance and the eigenvalue-one criterion. When researchers use the cumulative percent of variance as the criterion for solving the number-of-components problem, they usually retain enough components so that the cumulative percent of variance accounted for is at least 70% (and sometimes 80%) [12]. The percent of variance accounted for can be calculated using:

$$\text{Var percentage} = 100 \times \frac{\text{Variance of the coefficient}}{\text{Total variance}} \qquad (3)$$

III. METHODOLOGY

Two types of infant cries were used in this study: premature cries (with Apgar scores of 3 and below) and healthy cries (with Apgar scores of 7 and above). These infants' cries were obtained from

There are two stages involved in the investigation: the pre-processing and analysis stages (see Fig. 1). In both stages, the algorithms were developed using Matlab 7.6.0 (R2008a).

[Figure: flowchart. Cry Signal; Pre-processing (Filtering Process, STFT analysis, Segmentation); Analysis (MFCC Analysis, Principal Component Analysis); End]
Fig. 1. Overall process for analyzing cries with high and low Apgar scores.

A. Pre-processing
In the pre-processing stage, the cry signals were sampled at 44.1 kHz and filtered using bandpass filters. The passband differed between the two cases since the frequencies of the cries for high and low Apgar scores were different. For high Apgar scores, the passband was 400-2080 Hz, whereas for low Apgar scores it was 500-1550 Hz. The short-time Fourier transform (STFT) was computed to determine the frequency content of the signals before and after filtering. After bandpass filtering, the signals were segmented using short-time energy and zero-crossing rate in order to detect the start and end of each segment.

B. Analysis
The segmented signals were then analyzed using MFCC. Each 1-second segment was first divided into frames with a Hamming window 16 ms wide, with successive frames overlapping by 50%. The energy spectrum of each frame was then calculated using the fast Fourier transform (FFT). This spectrum was converted to a mel spectrum by passing it through triangular filters. The MFCC filter bank used is shown in Fig. 2. The bank consists of 40 filters, 25 linearly spaced between 0 Hz and 4.7 kHz and 15 log-spaced between 5 kHz and 22.5 kHz. Finally, the MFCC was computed and 10 coefficients were extracted.
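The framing and cepstrum arithmetic in the analysis stage can be sketched as follows. This is a minimal illustration in plain Python, not the authors' Matlab code: the function names and the synthetic 500 Hz test tone are assumptions made for the example, and the FFT and the triangular filter bank are left out, so the cepstrum function takes filter energies as given.

```python
import math

SAMPLE_RATE = 44100   # Hz, sampling rate stated for the pre-processing stage
FRAME_SEC = 0.016     # 16 ms Hamming window
# Frame hop is half a frame, i.e. about 50% overlap.


def hz_to_mel(f_hz):
    """Equation (1): mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hamming(n):
    """Hamming window of n samples (standard 0.54 / 0.46 form)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def split_into_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames and window each one."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames


def cepstrum_from_filter_energies(energies, n_coeffs):
    """Equation (2): c_n = sum_k log(S_k) * cos(n * (k - 1/2) * pi / K)."""
    K = len(energies)
    log_e = [math.log(s) for s in energies]
    return [
        sum(log_e[k - 1] * math.cos(n * (k - 0.5) * math.pi / K)
            for k in range(1, K + 1))
        for n in range(1, n_coeffs + 1)
    ]


frame_len = int(SAMPLE_RATE * FRAME_SEC)
hop = frame_len // 2
# Hypothetical 1-second test tone standing in for a cry segment.
segment = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
frames = split_into_frames(segment, frame_len, hop)
# Identical energies in all 40 filters: every coefficient for n >= 1 vanishes.
flat_mfcc = cepstrum_from_filter_energies([2.0] * 40, 10)
print(frame_len, hop, len(frames), round(hz_to_mel(1000.0), 1))
```

The flat-spectrum case is a quick sanity check on equation (2): a constant log energy contributes only to the excluded zeroth coefficient, so coefficients 1 to 10 come out as zero up to rounding.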

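The segmentation step of the pre-processing stage relies on short-time energy and zero-crossing rate. The sketch below shows both measures in plain Python. The paper does not say how the two were combined or which thresholds were used, so the energy-only detector, the frame length, the threshold and the synthetic signal here are all illustrative assumptions.

```python
import math


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))
    return crossings / (len(frame) - 1)


def find_segments(signal, frame_len, energy_threshold):
    """Return (start, end) sample indices of runs of frames above the energy threshold."""
    segments, start = [], None
    n_frames = len(signal) // frame_len
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        active = short_time_energy(frame) > energy_threshold
        if active and start is None:
            start = i * frame_len              # segment begins
        elif not active and start is not None:
            segments.append((start, i * frame_len))  # segment ends
            start = None
    if start is not None:                      # signal ended inside a segment
        segments.append((start, n_frames * frame_len))
    return segments


# Synthetic signal: silence, a 500 Hz burst, silence (values are illustrative only).
rate = 8000
burst = [math.sin(2 * math.pi * 500 * t / rate) for t in range(4000)]
signal = [0.0] * 2000 + burst + [0.0] * 2000
segments = find_segments(signal, frame_len=200, energy_threshold=1.0)
burst_zcr = zero_crossing_rate(burst)
print(segments, round(burst_zcr, 3))
```

A detector closer to the described method would presumably also test the zero-crossing rate of each frame, for example to separate voiced from unvoiced parts, but that rule would have to be chosen by the implementer.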
[Figure: mel-spaced filterbank; x-axis: Frequency (Hz), 0 to 2.5 x 10^4]
Fig. 2. The MFCC filter bank used in the analysis

To eliminate redundancy in the MFCC results, PCA was conducted on all 10 coefficients. The eigenvalue-one criterion and scree test were performed to determine the eigenvalue for each MFCC. In the eigenvalue-one criterion, MFCCs with an eigenvalue greater than one are retained. In the scree test, the principal components of the MFCC were selected based on large eigenvalues. Coefficients with smaller eigenvalues were removed as their significance in the data decreases. To confirm the results obtained from the scree test, the analysis of the cumulative percent of variance was then performed.

A sample cry signal with a low Apgar score and its spectrogram before and after filtering are shown in Fig. 3 and Fig. 4 respectively. After filtering, the frequency components above 1550 Hz have been removed. The fundamental frequency of cries for low Apgar score infants is 577-1168 Hz, whereas for high Apgar score infants it is 500-594 Hz, which is in agreement with the values reported by Wasz-Hockert et al. [13].

[Figure: cry waveform versus Time (s), and spectrogram, Frequency (Hz) versus Time (s)]
Fig. 3. Cry signal with low Apgar score and its spectrogram

[Figure: cry waveform versus Time (s), and spectrogram, Frequency (Hz) versus Time (s)]
Fig. 4. Cry signal with low Apgar score and its spectrogram after filtering

A. Mel Frequency Cepstrum Coefficient (MFCC)
The MFCC features obtained from cries with high and low Apgar scores are shown in Fig. 5 and Fig. 6 respectively. Note that the amplitude of coefficients 2 to 8 for high Apgar scores is below 10 dB. Only coefficients 9 and 10 have amplitudes between 10 and 20 dB. The MFCC features of low Apgar scores show a different pattern from those of high Apgar scores. High energy (amplitude approaching 20 dB) can be observed in the MFCC features from coefficients 6 to 10.

[Figure: MFCCs; axis: Length of Frame]
Fig. 5. MFCC feature of high Apgar score

[Figure: MFCCs; axis: Length of Frame]
Fig. 6. MFCC feature of low Apgar score
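The eigenvalues used in the PCA step can be obtained in outline as shown below. This is a hedged sketch in plain Python, not the authors' Matlab code. It assumes that PCA is run on the correlation matrix of the MFCC columns, which is consistent with eigenvalues that sum to the number of coefficients but is not stated explicitly in the paper. The function names and the small data matrix are invented for illustration, and the Jacobi rotation method is simply one convenient way to get the eigenvalues of a symmetric matrix without external libraries.

```python
import math


def correlation_matrix(rows):
    """Correlation matrix of the columns (one row per frame, one column per coefficient)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(m)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]
    return [[sum(zr[i] * zr[j] for zr in z) / (n - 1) for j in range(m)]
            for i in range(m)]


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):   # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):   # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Made-up data: 5 frames by 3 coefficients, with the first two columns strongly correlated.
toy = [[1.0, 2.1, 0.5], [2.0, 3.9, 0.1], [3.0, 6.2, 0.9], [4.0, 7.8, 0.3], [5.0, 10.1, 0.7]]
eigenvalues = symmetric_eigenvalues(correlation_matrix(toy))
print([round(ev, 4) for ev in eigenvalues], round(sum(eigenvalues), 4))
```

Because every diagonal entry of a correlation matrix is 1, the eigenvalues always add up to the number of variables, which is why a total of 10 is expected for 10 MFCCs.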

B. Principal Component Analysis (PCA)
Tables 1 and 2 show the eigenvalues for high (normal cry) and low (premature cry) Apgar score infants respectively. For high Apgar scores, the eigenvalue for coefficient 1 is 3.461, whereas the eigenvalue for coefficient 2 is 2.465. This variation is consistent with the earlier statement that the first components extracted tend to account for relatively large amounts of variance, while the later components account for relatively smaller amounts. Table 2 shows the same variation. The total of the eigenvalues for both cases is the same as the total number of coefficients, 10, and the total of the proportions is equal to 1.

TABLE 1

Coefficient  Eigenvalue  Difference  Proportion  Cumulative
1            3.4612      0.9962      0.3461      0.3461
2            2.4650      1.1283      0.2465      0.5926
3            1.3367      0.5039      0.1337      0.7263
4            0.8329      0.1003      0.0833      0.8096
5            0.7325      0.3543      0.0733      0.8828
6            0.3783      0.0851      0.0378      0.9207
7            0.2932      0.0829      0.0293      0.9500
8            0.2103      0.0344      0.0210      0.9710
9            0.1759      0.0619      0.0176      0.9886
10           0.1140                  0.0114      1.0000
Total        10.0000     3.3472      1.0000      8.1877

TABLE 2

Coefficient  Eigenvalue  Difference  Proportion  Cumulative
1            2.5051      0.1905      0.2505      0.2505
2            2.3146      0.8052      0.2315      0.4820
3            1.5094      0.5539      0.1509      0.6329
4            0.9555      0.1130      0.0956      0.7285
5            0.8425      0.2129      0.0843      0.8127
6            0.6296      0.1117      0.0630      0.8757
7            0.5180      0.1984      0.0518      0.9275
8            0.3195      0.0760      0.0320      0.9594
9            0.2435      0.0813      0.0244      0.9838
10           0.1622                  0.0162      1.0000
Total        10.0000     2.3429      1.0000      7.6529

i. The eigenvalue-one criterion
Coefficients 1, 2 and 3 for high and low Apgar scores (see Table 1 and Table 2) have eigenvalues greater than 1. Based on the eigenvalue-one criterion, these coefficients are significant and should be retained.

ii. The Scree Test
The plot of eigenvalues against coefficient numbers for high and low Apgar scores is shown in Fig. 7. For both cases, the eigenvalues decrease as the coefficient number increases. This shows that high frequency components of the signals produce similarities in the coefficients.

Infants with high Apgar scores show a relatively large difference in eigenvalues between coefficients 1 and 6. From the sixth coefficient to the tenth, there is only a small difference in eigenvalues. For low Apgar scores, a large difference occurs between coefficients 1 and 8, and there is a relatively small difference between coefficients 8 and 10. These coefficients are regarded as trivial. The differences in eigenvalues for both cases can be observed clearly in Fig. 8. The scree test results suggest that coefficients 1 to 6 for high Apgar scores and coefficients 1 to 8 for low Apgar scores should be retained.

[Figure: The Scree Test; eigenvalue versus Number of MFCC Coefficient (1 to 10), for High Apgar Score and Low Apgar Score]
Fig. 7. The Scree plot of infant cry

[Figure: Difference of Eigenvalue versus Number of MFCC Coefficient (1 to 10), for High Apgar Score and Low Apgar Score]
Fig. 8. The difference of eigenvalue

iii. Cumulative Percent of Variance
The cumulative percent of total variance for high and low Apgar scores is shown in Tables 3 and 4 respectively. In this method, components are retained until the cumulative percent of variance reaches a chosen minimum value. For example, in Table 3, components 1, 2, 3, 4, 5, 6 and 7 produce proportions of total variance of 11.08%, 10.29%, 11.10%, 8.88%, 8.39%, 9.58% and 10.71%. Adding these percentages together gives a cumulative percent of variance of about 70% (70.04%). For low Apgar scores, components 1 to 7 are retained since they produce a cumulative total variance of 76.32% (Table 4). Therefore, for both high and low Apgar scores, coefficients 1 to 7 should be retained.

TABLE 3

Sample PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
Bnormal_1 0.1394 -0.4756 0.3213 -0.1877 0.3991 -0.2369 0.0889 0.0714 0.5999 0.1660
Bnormal_2 -0.2450 0.4180 -0.4017 -0.0907 -0.2563 -0.0466 -0.1115 0.0306 0.6673 0.2629
Bnormal_3 0.4299 0.1146 -0.0442 0.2594 0.3146 0.4460 -0.6246 0.0082 0.1835 -0.1055
Bnormal_4 0.3009 0.2583 0.2335 0.6618 -0.0949 -0.0680 0.3974 0.3084 0.0809 0.2775
Bnormal_5 -0.2226 0.2909 -0.3128 0.0147 0.7747 0.1098 0.3866 0.0015 -0.0622 -0.0466
Cnormal_1 0.4509 0.2289 0.0076 -0.0881 -0.0044 -0.1641 0.1729 -0.8163 0.0234 0.1130
Cnormal_2 -0.0163 0.5017 0.3311 -0.1204 0.1818 -0.6283 -0.3410 0.1697 -0.0955 -0.2048
Cnormal_3 -0.4536 0.0150 0.3390 0.1485 0.1498 0.1138 -0.2846 -0.2566 -0.1837 0.6641
Cnormal_4 -0.3957 -0.1819 0.0136 0.5978 -0.0048 -0.1944 -0.0553 -0.3652 0.2361 -0.4719
Cnormal_5 0.1724 -0.3118 -0.5969 0.2220 0.1066 -0.5037 -0.2279 0.0667 -0.2271 0.3075
Explained Variance 0.1108 0.1029 0.1110 0.0888 0.0839 0.0958 0.1071 0.1043 0.0945 0.1008
Proportion of Tot Variance (%) 11.0826 10.2931 11.0978 8.8788 8.3861 9.5842 10.7125 10.4324 9.4500 10.0825
Cumulative of Tot Variance (%) 11.0826 21.3758 32.4735 41.3524 49.7385 59.3227 70.0352 80.4676 89.9175 100.0000

TABLE 4

Sample PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
Prem2_1 0.0731 -0.5620 0.0651 -0.0915 0.3414 -0.2102 0.1643 -0.1039 0.5176 0.4469
Prem2_2 -0.1594 0.5633 0.1088 -0.1308 -0.2335 0.1500 -0.1557 0.0854 0.3219 0.6451
Prem2_3 0.3810 0.3532 -0.0826 0.1409 -0.2615 -0.3219 0.4834 -0.2904 0.3978 -0.2349
Prem2_4 0.4727 0.0318 -0.0855 0.0781 -0.0273 -0.5279 -0.6837 0.0150 -0.0705 0.1002
Prem2_5 -0.1893 -0.0155 -0.4893 0.6681 0.0958 0.2644 -0.2213 -0.3329 0.1911 0.0533
Prem3_1 -0.3147 0.3517 0.0810 0.1135 0.5259 -0.4578 0.1939 -0.2520 -0.3690 0.1856
Prem3_2 0.0498 0.0379 0.7329 0.0838 0.1122 0.2223 -0.2794 -0.4942 0.1474 -0.2140
Prem3_3 0.5206 -0.0080 -0.1831 -0.2201 0.0444 0.3762 0.1492 -0.4360 -0.4112 0.3493
Prem3_4 -0.3588 -0.0157 -0.3437 -0.6214 -0.1012 -0.1311 -0.2272 -0.4726 0.1401 -0.2112
Prem3_5 -0.2521 -0.3397 0.1871 0.2186 -0.6703 -0.2517 0.0951 -0.2556 -0.2867 0.2628
Explained Variance 0.1106 0.1094 0.1111 0.1105 0.1108 0.1024 0.1085 0.0396 0.1074 0.0899
Proportion of Tot Variance (%) 11.0559 10.9360 11.1110 11.0476 11.0774 10.2353 10.8535 3.9584 10.7395 8.9854
Cumulative of Tot Variance (%) 11.0559 21.9919 33.1030 44.1506 55.2280 65.4633 76.3168 80.2752 91.0146 100.0000
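The retention criteria themselves are simple arithmetic and can be checked with a short worked example. The sketch below is an illustration in plain Python with invented function names. It applies the eigenvalue-one criterion, the successive eigenvalue differences and equation (3) to the eigenvalues of Tables 1 and 2. Since it works from those eigenvalues and not from the per-component variances of Tables 3 and 4, its cumulative-percent counts need not match the counts derived in the text.

```python
# Eigenvalues copied from Table 1 (high Apgar score) and Table 2 (low Apgar score).
HIGH = [3.4612, 2.4650, 1.3367, 0.8329, 0.7325, 0.3783, 0.2932, 0.2103, 0.1759, 0.1140]
LOW = [2.5051, 2.3146, 1.5094, 0.9555, 0.8425, 0.6296, 0.5180, 0.3195, 0.2435, 0.1622]


def kaiser_count(eigenvalues):
    """Eigenvalue-one criterion: number of components with eigenvalue > 1."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def successive_differences(eigenvalues):
    """Drop between neighbouring eigenvalues (the Difference column of the tables)."""
    return [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]


def cumulative_percent(eigenvalues):
    """Equation (3) applied to each eigenvalue, then accumulated."""
    total = sum(eigenvalues)
    running, out = 0.0, []
    for ev in eigenvalues:
        running += 100.0 * ev / total
        out.append(running)
    return out


def components_for(eigenvalues, threshold_percent):
    """Smallest number of leading components whose cumulative percent reaches the threshold."""
    for i, cum in enumerate(cumulative_percent(eigenvalues), start=1):
        if cum >= threshold_percent:
            return i
    return len(eigenvalues)


for name, ev in (("high", HIGH), ("low", LOW)):
    print(name, kaiser_count(ev), components_for(ev, 70.0), components_for(ev, 80.0))
```

On these eigenvalues the eigenvalue-one criterion keeps 3 components in both groups, while the 70% and 80% thresholds are reached after 3 and 4 components for the high group and after 4 and 5 components for the low group.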


The analysis of cry signals with high and low Apgar scores using MFCC and PCA has been described in this paper. In the MFCC analysis, 10 coefficients were extracted for each 1-second signal. The MFCC features of low Apgar scores have higher energy than those of high Apgar scores.

The PCA results show that the eigenvalues for high Apgar scores are larger than those for low Apgar score infants. Based on the results obtained from the eigenvalue-one criterion, scree test and cumulative percent of variance, it is confirmed that coefficients 2 and 3 should be retained for high and low Apgar scores. Even though all methods agree that coefficient 1 should be retained, based on the MFCC analysis results, this coefficient should be ignored since it does not have significant

The scree test and cumulative percent of variance also suggest that coefficients 4 to 6 should be considered. Therefore, further investigation has to be carried out using a larger number of samples. The results obtained in this study are useful since they provide the foundation for the design of an automatic algorithm to replace the manual Apgar scoring system.

[1] Gyorgy Varallyay Jr., “Future Prospects of the Application of the Infant Cry in the Medicine”, Periodica Polytechnica Ser. El. Eng., vol. 50, No 1-2, 2006.
[2] Orozco J. and Garcia C. A. R., “Detecting Pathologies from Infant Cry Applying Scaled Conjugate Gradient Neural Networks”, Proceeding of ESANN, 249 – 354, 2003.
[3] Boukydis, C. F. Z., “Perception of Infant Crying as an Interpersonal Event. In: Infant Crying: Theoretical and Research Perspectives”, ed. Lester, B. M & Boukydis, C. F. Z Plenum Press, New York, 1985.
[4] Michelsson, K.- Michelsson, O., “Phonation in the Newborn, Infant Cry”, Int. J. Pediatr. Otorhinolaryngol., 49/1 pp. S297 – S301 (1999).
[5] Schonweiler, R.- Kaese, S.-Moller, S. Rinscheid, A.-Ptok, M., “Neuronal Networks and Self-organizing Maps: New Computer Techniques in the Acoustic Evaluation of the Infant Cry”, Int. J. Pediatr. Otorhinolaryngol., 38 pp. 1 – 11 (1996).
[6] Schonweiler, R.- Kaese, S.-Moller, S. Rinscheid, A.-Ptok, M., “Classification of Spectrographic Voice Patterns Using Self-organizing Neuronal Networks (Kohonen Maps) in the Evaluation of the Infant Cry with and without Time-delayed Feedback”, Int. J. Pediatr. Otorhinolaryngol., 38/2 pp. 181 (1996).
[7] MD Michael O. Gardner and MD Robert L. Goldenberg, “Predicting Low Apgar Scores of Infants Weighing Less Than 1000 grams: The Effect of Corticosteroids”, Elsevier, Inc., 1995.
[8] Huang, X., Acero, A., Hon, H., “Spoken Language Processing: A Guide to Theory, Algorithm, and System Development”, Prentice Hall, Inc., USA, 2001.
[9] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani, Md. Saifur Rahman, “Speaker Identification using Mel Frequency Cepstral Coefficients”, 3rd International Conference on Electrical & Computer Engineering ICECE 2004, Dhaka, Bangladesh, 28-30 December 2004.

[10] Jr., J. D., Hansen, J., and Proakis, J. “Discrete-Time Processing of
Speech Signals”, second ed. IEEE Press, New York, 2000.
[11] F. Soong, E. Rosenberg, B. Juang, and L. Rabiner, "A Vector
Quantization Approach to Speaker Recognition", AT&T Technical
Journal, vol. 66, March/April 1987, pp. 14-26.
[12] Http:
[13] O. Wasz-Hockert, J. Lind, V.Vuorenkoski, T. Partanen and E. Valanne,
“The Infant Cry – A Spectrographic and Auditory Analysis”, Spastic
International Medical Publications in Association with William
Heinemann Medical Books Ltd., 1968

