
A method for calculation of optimum data size and bin size of histogram features in fault diagnosis of mono-block centrifugal pump

V. Indira, R. Vasanthakumari, N.R. Sakthivel, V. Sugumaran
Department of Mathematics, Sri Manakula Vinayagar Engineering College, Madagadipet, Puducherry, India
Department of Mathematics, Kasthurba College for Women, Villianur, Puducherry, India
Department of Mechanical Engineering, Amrita School of Engineering, Ettimadai, Coimbatore, India
Department of Mechanical Engineering, SRM University, Kattankulathur, Kanchepuram Dt., India
Article info
Keywords:
Centrifugal pump
Fault diagnosis
Histogram features
Machine learning
Minimum sample size
Number of bins
Power analysis
Vibration signals
Abstract
Mono-block centrifugal pumps play a key role in various applications, and any deviation in their
functioning leads to monetary loss. Fault diagnosis and condition monitoring of pumps are therefore
important issues that cannot be ignored. Over the past 25 years, much research has focused on
vibration-based techniques, and the machine learning approach is one of the most widely used
techniques for fault diagnosis from vibration signals. The machine learning approach involves a set
of connected activities, namely data acquisition, feature extraction, feature selection, and feature
classification. Training and testing the classifier are the two important activities in feature
classification. When histogram features are used to represent the vibration signals, no proper
guideline has been proposed so far for choosing the number of bins and the number of samples
required to train the classifier. This paper illustrates a systematic method for choosing the number
of bins and the minimum number of samples required to train the classifier with statistical
stability so as to get the best classification accuracy. In this study, power analysis was employed
to find the minimum number of samples required, and a decision tree algorithm, J48, was used to
validate the results of the power analysis and to find the optimum number of bins.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
Mono-block centrifugal pumps play an important role in industry and are key elements in the food
industry, waste water treatment plants, agriculture, the oil and gas industry, the paper and pulp
industry, etc. The development of faults in centrifugal pumps causes severe problems such as
abnormal noise, leakage, and high vibration. Hence condition monitoring and fault diagnosis have
become essential for centrifugal pump maintenance. Much research has been devoted to the
identification of faults in centrifugal pumps. Over the past ten years, the vibration-based machine
learning approach has drawn considerable attention in the field of fault diagnosis.
In this paper, vibration signals of the good condition and five faulty conditions were considered
for fault diagnosis of the centrifugal pump. The faults considered were cavitation, bearing fault,
impeller fault, seal fault, and bearing and impeller fault together. The characterization of these
signals was achieved by the machine learning approach, whose two important activities are training
and testing the classifier. To model the centrifugal pump fault diagnosis problem as a machine
learning problem, a large number of vibration signals are required for each condition of the pump
considered. It is possible to acquire any number of vibration signals for the good condition;
however, it is very difficult to acquire signals from faulty centrifugal pumps of the same type with
one specific fault alone. Ideally, the signals for a specific fault should be taken from centrifugal
pumps in which the fault occurred naturally during operation. The difficulties involved in doing so
force the fault diagnosis engineer to compromise. One level of compromise in practice is taking many
vibration signals from one specimen having a typical intended fault, for example, taking the
required number of vibration signals from a single centrifugal pump having a seal fault alone.
Another level of compromise is taking vibration signals from a centrifugal pump onto which the
required type of fault has been simulated (Sakthivel, Sugumaran, & Nair Binoy, 2010). To overcome
these problems, one should know how many samples must be used for training to get good classification
accuracy. As will be seen in Section 6, if it is known that good classification accuracy can be
obtained by training on only three samples per class, the vibration signals of faulty centrifugal
pumps need not be taken from pumps in which the fault was simulated. Instead, the signals could be
acquired from centrifugal pumps in which the fault has occurred naturally, and any results obtained
from these signals would be more practical and realistic. Hence knowledge of the optimum number of
samples required for building a model or training a classifier is essential, and a study on the
determination of minimum sample size is highly desirable. Histogram analysis of vibration signals
yields different parameters, which can be used for classifying the different conditions of the pump.
If histograms are plotted using the amplitude of the vibration signals, they look dissimilar for
different classes. However, a drastic change in the height of corresponding bins between classes is
observed only when the amplitude range is divided into a certain number of bins. In other words, the
separability of the classes in the histogram plot is greatest for a particular number of bins. Thus,
it becomes necessary to find the right number of bins to obtain the best classification accuracy.
Expert Systems with Applications 38 (2011) 7708–7717
Journal homepage: www.elsevier.com/locate/eswa
0957-4174/$ - see front matter.
E-mail address: vasunthara1@gmail.com (R. Vasanthakumari).
In machine learning, a model built with a large sample size would be robust. During implementation,
it becomes necessary to answer the question "How large should the sample size be to build a robust
classifier?" As it is difficult to get a large number of samples, the more appropriate question is
"What is the minimum number of samples required to build a classifier that has good prediction
accuracy?" To answer this question, researchers have taken different approaches.
Many works on minimum sample size determination have been reported in the field of bioinformatics
and other clinical studies, for example on microarray data (Hwang, Schmitt, Stephanopoulos, &
Stephanopoulos, 2002), cDNA arrays (Schena, Shalon, Davis, & Brown, 1995), and transcription levels
(Lockhart, 1996). Based on these works, data-driven hypotheses could be developed which in turn
further vibration signal analysis research. Unfortunately, not much work has been reported on
finding the optimum number of samples required for training classifiers using vibration signals.
This might be because acquiring vibration signals is relatively easy compared to acquiring clinical
data, so the study of minimum sample size may look insignificant. Also, no appropriate guideline has
been proposed so far for choosing the minimum sample size of vibration signals for fault diagnosis
using the machine learning approach. Hence, one has to resort to rules of thumb which lack
mathematical reasoning, or blindly follow some previous work as the basis for fixing the sample
size. This is the fundamental motivation for taking up this study. There are a number of ways
available for the determination of sample size, viz. for tests of continuous variables (Day &
Graham, 1991; Fleiss, 1981; Pearson & Hartley, 1970), for tests of proportions (Casagrande, Pike, &
Smith, 1978; Feigl, 1978; Gordon & Watson, 1996; Haseman, 1978; Lakatos & Lan, 1992; Lemeshow,
Hosmer, & Klar, 1988; Lubin & Gail, 1990; O'Neill, 1984; Roebruck & Kuhn, 1995; Thomas & Conlon,
1992; Whitehead, 1993), for time-to-event (survival) data (Hanley & McNeil, 1982; Schoenfeld &
Richter, 1982), for receiver operating characteristic (ROC) curve analysis (Obuchowski, 1994;
Obuchowski & McClish, 1997; Whittemore, 1981), for logistic and Poisson regression (Bull, 1993;
Flack & Eudey, 1993; Hsieh, 1989; Lui & Cumberland, 1992; Signorini, 1991), repeated measurements
(Greenland, 1988; Lipsitz & Fitzmaurice, 1994), precision (Beal, 1989; Buderer, 1996; Dupont, 1988;
Samuels & Lu, 1992; Satten & Kupper, 1990; Streiner, 1994), paired samples (Donner & Eliasziw, 1992;
Lachenbruch, 1992; Lachin, 1992; Lu & Bean, 1995; Nam, 1992; Nam, 1997; Parker & Bregman, 1986;
Royston, 1993), measurement of agreement (Birkett & Day, 1994), and power (Faul, Erdfelder, Lang, &
Buchner, 2007). Studies have also discussed issues surrounding estimating variance, sample size
re-estimation based on interim data (Browne, 1995; Gould, 1995; Shih & Zhao, 1997), studies with
planned interim analyses (Geller & Pocock, 1987; Kim & DeMets, 1992; Lewis, 1993; O'Brien & Fleming,
1979; Pocock, 1977; Whitehead, 1992), and ethical issues (Lantos, 1993). However, certain issues
must be addressed when implementing such techniques in order to obtain better statistical stability.
In the machine learning approach, the vibration signals are typically subjected to analyses such as
hypothesis testing, classification (Sugumaran, Sabareesh, & Ramachandran, 2008), regression, and
clustering that rely on statistical parameters to draw conclusions (Alfayez, Mba, & Dyson, 2005;
guo-hua, yong-zhong, yu, & huang, 2007; Kavuri & Venkatasubramanian, 1993b; Konga & chen, 2004;
Rengaswamy & Venkatasubramanian, 2000; Vaidyanathan & Venkatasubramanian, 1992; wang & hu, 2006;
Wang & McFadden, 1993a, 1993b; widodo & yang, 2007). However, these parameters cannot be reliably
estimated from only a small number of vibration signals. Since the statistical stability of the
conclusions largely depends on the accuracy of the parameters used, a certain minimum number of
vibration signals is required to ensure confidence in the sample distribution and accuracy of the
parameter estimates.
The objective of this paper is to determine the minimum number of samples required to separate the
classes with statistical stability using F-test based statistical power analysis. The methodology is
illustrated with a typical centrifugal pump fault diagnosis case study with six conditions
(classes). The minimum sample size and the optimum number of bins are also determined using an
entropy-based algorithm called J48. The results of the power analysis are compared with those of the
J48 algorithm, and sample size guidelines for centrifugal pump fault diagnosis are presented in the
concluding section.
2. Experimental studies
A 2 HP motor was used to drive the pump. A piezoelectric accelerometer and its accessories, together
with a data acquisition system, formed the core equipment for vibration measurement and recording.
The vibration signals were acquired from the mono-block centrifugal pump working under the normal
condition (Good) and under the five faulty conditions considered in the study, at a constant speed
of 2880 rpm. The specification of the mono-block centrifugal pump is shown in Table 1. The sampling
frequency was 24,000 Hz and each signal (sample) has a length of 1024 data points. For each
condition of the centrifugal pump, 250 samples were taken. A randomly selected signal for each
condition is given in Fig. 1.
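As a quick check on what one sample covers, the acquisition settings stated above imply the following durations. This is only a worked calculation from the sampling frequency, sample length and speed given in the text; the symbols $T$ and $f_{\text{shaft}}$ are introduced here purely for illustration.

```latex
\[
T = \frac{1024}{24\,000\ \mathrm{Hz}} \approx 42.7\ \mathrm{ms}, \qquad
f_{\text{shaft}} = \frac{2880\ \mathrm{rpm}}{60\ \mathrm{s/min}} = 48\ \mathrm{Hz}, \qquad
T \cdot f_{\text{shaft}} \approx 2.05\ \text{revolutions}.
\]
```

Each 1024-point sample therefore spans roughly two shaft revolutions.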
3. Histogram features
When the magnitudes of the signals are measured in the time domain, the range of amplitude differs
between classes. One of the best ways to show this variation in vibration amplitude is the histogram
plot. The histogram provides valuable information for classification, and this information serves
as features for fault diagnosis of the centrifugal pump.
Table 1
Mono-block centrifugal pump specification.
Speed: 2880 rpm       Pump size: 50 mm × 50 mm
Current: 11.5 A       Discharge: 392 litre per second
Head: 20 m            Power: 2 HP
3.1. Selection of number of bins
Two things must be considered when selecting bins, namely the range of the bins and the width of
the bins. The steps involved in choosing the bin range are as follows.
(i) Find the minimum amplitude of each signal in all the six classes.
(ii) Calculate the minimum of the values obtained in step (i).
(iii) Find the maximum amplitude of each signal in all the six classes.
(iv) Calculate the maximum of the values obtained in step (iii).
Thus, the bin range should be from the minimum of the minimum amplitudes (−0.72944) to the maximum
of the maximum amplitudes (1.036422) over all the six classes.
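Steps (i) to (iv) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names `bin_range` and `bin_width` and the toy signals are assumptions made for the example. The last line reuses the range reported above to show that dividing it into the 37 bins chosen later in this section gives the stated bin width of 0.047726.

```python
def bin_range(signals_by_class):
    """Return (lowest, highest) amplitude over every signal of every class."""
    # Steps (i) and (iii): per-signal minimum and maximum amplitude.
    lows = [min(sig) for sigs in signals_by_class.values() for sig in sigs]
    highs = [max(sig) for sigs in signals_by_class.values() for sig in sigs]
    # Steps (ii) and (iv): minimum of the minima, maximum of the maxima.
    return min(lows), max(highs)


def bin_width(low, high, n_bins):
    """Width of each of n_bins equal bins spanning [low, high]."""
    return (high - low) / n_bins


# Toy data, made up for illustration: two classes with two short signals each.
toy = {
    "good": [[-0.10, 0.20, 0.05], [0.00, 0.15, -0.12]],
    "seal fault": [[-0.40, 0.60, 0.10], [0.30, -0.35, 0.55]],
}
low, high = bin_range(toy)

# Range reported in the text, divided into 37 equal bins.
width_37 = bin_width(-0.72944, 1.036422, 37)
```

Using one common range for all classes keeps the bin edges identical across conditions, so that the same bin index refers to the same amplitude interval in every histogram.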
As discussed in Section 1, the number of bins must be chosen for a fault diagnosis problem. This
number was obtained by carrying out a series of experiments with different numbers of bins using a
decision tree algorithm, namely J48. At first, the bin range was divided into two equal parts, that
is, the number of bins was two. The two histogram features, $f_1$ and $f_2$, were extracted and the
corresponding classification accuracy was obtained using the J48 algorithm. The method and procedure
for doing so are explained in Section 5. A set of similar experiments was carried out with the
number of bins set to 3, 4, ..., 50, and the corresponding results are shown in Fig. 2. The results
show that the best classification accuracy, 100%, was obtained when the number of bins was 37. In
the present study, the number of bins was therefore chosen as 37, giving a bin width of 0.047726.
The graph also shows that the oscillation in classification accuracy reduces drastically and tends
to stabilize when the number of bins is greater than 14. Hence one could also choose a number of
bins greater than 14. In this study, the best classification accuracy criterion was used.
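The search over bin counts described above amounts to a simple sweep. The sketch below shows only the selection logic; `score_fn` is a hypothetical callable standing in for "extract the histogram features with this many bins and return the cross-validated J48 accuracy", and `toy_score` is an invented placeholder, not data from the study. Breaking ties in favour of the smaller bin count is also an assumption, since the text states only that the best-accuracy criterion was used.

```python
def choose_bin_count(score_fn, candidates):
    """Score every candidate bin count and return (best_count, all_scores)."""
    scores = {n: score_fn(n) for n in candidates}
    best_score = max(scores.values())
    # Assumption: among equally good bin counts, prefer the smallest one.
    best_count = min(n for n, s in scores.items() if s == best_score)
    return best_count, scores


def toy_score(n_bins):
    """Invented accuracy curve that rises with n_bins and saturates at 1.0."""
    return min(100, 50 + 5 * n_bins) / 100


# Sweep the same candidate range as in the text: 2, 3, ..., 50 bins.
best, scores = choose_bin_count(toy_score, range(2, 51))
```

Keeping the full `scores` mapping makes it possible to plot accuracy against the number of bins and to look for the point beyond which the curve stabilizes.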
3.2. Feature extraction and feature selection
In this paper, the number of bins chosen was 37. A set of thirty-seven measures, $f_1$, $f_2$, ...,
$f_{37}$, was extracted from the vibration signals; these are called histogram features. For further
study, the histogram features extracted from the vibration signals were used instead of the signals
themselves. The less contributing features were eliminated to reduce the dimension of the feature
set, so that only the relevant features are considered further. The process of selecting relevant
features is known as feature selection, and it was carried out using the J48 algorithm. The theory,
methodology and implementation of the J48 algorithm are discussed in detail in Sugumaran,
Muralidharan, and Ramachandran (2007). Five of the thirty-seven features were found to be the best
in the present study. These five features were used to represent the signals for further study. The
selected set of five histogram features, along with their class labels, forms the data set used to
determine the minimum sample size using power analysis.
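A histogram feature vector of this kind can be computed as follows. This is a sketch under the assumption that each feature is the number of data points of one signal falling into the corresponding bin (the "height" of the bin); the function name `histogram_features` and the four-point signal are illustrative only.

```python
def histogram_features(signal, low, high, n_bins):
    """Count how many amplitudes of one signal fall into each of n_bins equal bins."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for x in signal:
        idx = int((x - low) / width)
        # Clamp so that x == high lands in the last bin and x == low in the first.
        idx = max(0, min(idx, n_bins - 1))
        counts[idx] += 1
    return counts  # counts[0] plays the role of f1, counts[1] of f2, and so on


# Bin range taken from Section 3.1; the signal itself is a made-up example.
feats = histogram_features([-0.7, 0.0, 0.01, 1.0], -0.72944, 1.036422, 37)
```

A real signal in this study has 1024 data points, so the 37 counts of one sample would sum to 1024.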
4. Determination of minimum sample size
Determining the minimum number of vibration signals needed to build a classifier is an important
issue, as discussed earlier, and is one of the first things to be considered when attempting to
classify machine conditions from vibration signals. The method of performing power analysis is
discussed in Section 4.1. The results of the power analysis were verified with a functional test
using the J48 algorithm: with J48 used as a classifier under 10-fold cross-validation, the number of
samples was decreased from 250 per class to three per class. The results are presented and discussed
in Section 6.
Fig. 1. Vibration signals of the pump for good and faulty conditions.
Fig. 2. Classification accuracy as a function of the number of bins.
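The functional test needs balanced subsets of decreasing size. The sketch below shows one way such subsets could be drawn; it is not the authors' procedure. The helper name `subsample_per_class`, the use of integer identifiers in place of feature vectors, and the intermediate sizes in `schedule` are all assumptions, since the text fixes only the end points of 250 and 3 samples per class.

```python
import random

# The six pump conditions named in Section 1.
CONDITIONS = ["good", "cavitation", "bearing fault", "impeller fault",
              "seal fault", "bearing and impeller fault"]


def subsample_per_class(samples_by_class, n_per_class, seed=0):
    """Draw n_per_class samples without replacement from every class."""
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    return {label: rng.sample(samples, n_per_class)
            for label, samples in samples_by_class.items()}


# Placeholder data: 250 sample identifiers per condition.
full_set = {label: list(range(250)) for label in CONDITIONS}

# Hypothetical schedule of sample sizes per class.
schedule = [250, 100, 50, 25, 10, 5, 3]
subsets = {n: subsample_per_class(full_set, n) for n in schedule}
```

Each subset would then be passed to the classifier under cross-validation, and the accuracy recorded against the sample size.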
4.1. Power analysis
Performing power analysis and sample size estimation is an important aspect of experimental design,
because without these calculations the sample size may be too high or too low. If the sample size is
too low, the experiment will lack the precision to provide reliable answers to the questions
investigated. If the sample size is too large, time and resources will be wasted, often for minimal
gain. Power analysis has been used in many applications (Cohen, 1988; Kraemer & Thiemann, 1987;
Mace, 1974; Pillai & Mijares, 1959). It is based on two measures of statistical reliability in the
hypothesis test, namely the confidence level (1 − α) and the power (1 − β). The test compares the
null hypothesis ($H_0$) against the alternative hypothesis ($H_1$). The null hypothesis states that
the means of the classes are the same, whereas the alternative hypothesis states that the means of
the classes are not the same. The confidence level of a test is the probability of accepting the
null hypothesis when it is true, and the power of a test is the probability of accepting the
alternative hypothesis when it is true (Hwang et al., 2002). Conversely, the false positive rate α
(Type I error) is the probability of accepting the alternative hypothesis when the null hypothesis
is true, while the false negative rate β (Type II error) is the probability of accepting the null
hypothesis when the alternative hypothesis is true. The sample size in power analysis is estimated
such that the confidence level and the power (the statistical reliability measures) of the
hypothesis test reach predefined values. Typical analyses might require a confidence level of 95%
and a power of 95%.
The confidence level and the power are calculated from the distributions of the test statistic under
the null and alternative hypotheses. Defining these distributions depends on the statistical measure
used in the hypothesis test. For a two-class problem, the method and procedure of computing the
hypothesis test using the t-distribution is explained in Hwang et al. (2002). For a multi-class
problem (more than two classes), instead of the t-statistic, the F-statistic derived from Pillai's V
formula (Cohen, 1969; Olson, 1974) is used for the estimation of sample size. Pillai's V is the
trace of the matrix defined by the ratio of the between-group variance (B) to the total variance
(T). It is a statistical measure often used in multivariate analysis of variance (MANOVA) (Cohen,
1969). Pillai's V trace is given by
\[ V = \operatorname{trace}(BT^{-1}) = \sum_{i=1}^{h} \frac{\lambda_i}{1 + \lambda_i}, \tag{1} \]
where $\lambda_i$ is the ith eigenvalue of $W^{-1}B$, W is the within-group variance, and h is the
number of factors considered in MANOVA, defined by h = c − 1, where c is the number of classes. A
high Pillai's V means a high amount of separation between the samples of the classes, with the
between-group variance being relatively large compared to the total variance. The hypothesis test
can be designed as follows using the F statistic transformed from Pillai's V.
\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_c; \qquad H_1: \text{there exist } i, j \text{ such that } \mu_i - \mu_j \neq 0, \tag{2} \]
\[ H_0: F = \frac{(V/s)/(ph)}{\left(1 - (V/s)\right)/\left[s(N - c - p + s)\right]} \sim F\left(ph,\; s(N - c - p + s)\right), \tag{3} \]
\[ H_1: F = \frac{(V/s)/(ph)}{\left(1 - (V/s)\right)/\left[s(N - c - p + s)\right]} \sim F\left[ph,\; s(N - c - p + s),\; \Delta = s\Delta_e N\right], \tag{4} \]
with $\Delta_e = V/(s - V)$, where p and c are the number of variables and the number of classes,
respectively, N is the total sample size, and s is defined by min(p, h). Using these distributions
of $H_0$ and $H_1$, the confidence level and the power can be calculated for a given sample size and
effect size. The method used for the two-class problem is used here to estimate the minimum sample
size for statistical stability, whereby the sample size is increased until the calculated power
reaches the predefined threshold of 1 − β. However, there is a limitation that the value of p cannot
be larger than N − c + s = N − 1. This analysis might produce a misleading sample size estimate when
the real data set is not consistent with the assumptions (normality and equal variance) underlying
the statistic used in the power analysis. To check the effect of possible violations of these
assumptions on the estimated sample size, the actual power and mean differences between classes were
compared to the predefined values. The actual values in both cases were sufficiently large that the
impact of data which does not perfectly match the normality or equal variance assumptions need not
be a concern.
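The F statistic and its degrees of freedom follow mechanically from Pillai's V and the design parameters, as the sketch below shows. The function name `pillai_f_test` is hypothetical. The example call uses p = 4, which is an assumption: it is the value for which the formula gives the numerator df of 20 and denominator df of 48 reported in Table 2 for N = 18, and it corresponds to the number of repetitions minus one in that table rather than to the five selected features.

```python
def pillai_f_test(V, p, c, N):
    """F statistic and degrees of freedom for Pillai's V under the null hypothesis.

    V: Pillai's trace, p: number of variables, c: number of classes,
    N: total sample size.
    """
    h = c - 1                      # number of factors
    s = min(p, h)
    df1 = p * h                    # numerator degrees of freedom
    df2 = s * (N - c - p + s)      # denominator degrees of freedom
    if df2 <= 0:
        raise ValueError("sample size too small: need N - c - p + s > 0")
    F = ((V / s) / df1) / ((1 - V / s) / df2)
    return F, df1, df2


# Pillai's V reported in Section 6, six classes, 18 samples in total.
F, df1, df2 = pillai_f_test(1.486501, p=4, c=6, N=18)
```

The explicit check on `df2` mirrors the limitation noted above: for a fixed number of variables, the total sample size cannot be made arbitrarily small.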
5. J48 algorithm
Fault diagnosis can be viewed as a data mining problem where one extracts information from the
acquired data through a classification process. A predictive model for classification invokes the
idea of branches and trees identified through a logical process. The classification is done through
a decision tree whose leaves represent the different conditions of the pump. The sequential
branching process ending in the leaves is based on conditional probabilities associated with
individual features. Any good classifier should have the following properties.
(1) It should have good predictive accuracy, that is, the ability of the model to correctly predict
the class label of new or previously unseen data.
(2) It should have good speed.
(3) The computational cost involved in generating and using the model should be as low as possible.
(4) It should be robust; robustness is the ability of the model to make correct predictions given
noisy data or data with missing values.
(5) The level of understanding and insight provided by the classification model should be high
enough.
It is reported that the C4.5 model introduced by J.R. Quinlan satisfies the above criteria, and
hence it is used in the present study. The decision tree algorithm (C4.5) has two phases: building
and pruning. The building phase is also called the growing phase. Both are briefly discussed here.
5.1. Building phase
A training sample set with discrete-valued attributes is recursively partitioned until all the
records in a partition have the same class; this forms the building phase. The tree has a single
root node for the entire training set. For every partition, a new node is added to the decision
tree. For a set of samples in a partition S, a test attribute X is selected for further partitioning
the set into $S_1$, $S_2$, $S_3$, ..., $S_L$. New nodes for $S_1$, $S_2$, $S_3$, ..., $S_L$ are
created and added to the decision tree as children of the node for S. Further, the node for S is
labeled with test X, and the partitions $S_1$, $S_2$, $S_3$, ..., $S_L$ are recursively partitioned.
A partition in which all the records have an identical class label is not partitioned further, and
the leaf corresponding to it is labeled with that class. The construction of the decision tree
depends very much on how the test attribute X is selected. C4.5 uses an information entropy
evaluation function as the selection criterion. The entropy evaluation function is arrived at
through the following steps.
Step 1: Calculate Info(S) to identify the class in the training set S.
\[ \mathrm{Info}(S) = -\sum_{i=1}^{K} \frac{\mathrm{freq}(C_i, S)}{|S|} \log_2 \frac{\mathrm{freq}(C_i, S)}{|S|}, \tag{5} \]
where |S| is the number of cases in the training set, $C_i$ is a class, i = 1, 2, ..., K, K is the
number of classes, and freq($C_i$, S) is the number of cases included in $C_i$.
Step 2: Calculate the expected information value, $\mathrm{Info}_X(S)$, for test X to partition S.
\[ \mathrm{Info}_X(S) = \sum_{i=1}^{L} \frac{|S_i|}{|S|} \, \mathrm{Info}(S_i), \tag{6} \]
where L is the number of outputs for test X, $S_i$ is the subset of S corresponding to the ith
output, and $|S_i|$ is the number of cases in subset $S_i$.
Step 3: Calculate the information gain after partition according to test X.
\[ \mathrm{Gain}(X) = \mathrm{Info}(S) - \mathrm{Info}_X(S). \tag{7} \]
Step 4: Calculate the partition information value Split Info(X) acquired for S partitioned into L
subsets.
\[ \mathrm{Split\ Info}(X) = -\sum_{i=1}^{L} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}. \tag{8} \]
Step 5: Calculate the gain ratio of Gain(X) over Split Info(X).
\[ \mathrm{Gain\ Ratio}(X) = \frac{\mathrm{Gain}(X)}{\mathrm{Split\ Info}(X)}. \tag{9} \]
The Gain Ratio(X) compensates for the weak point of Gain(X), which represents the quantity of
information provided by X in the training set. Therefore, the attribute with the highest
Gain Ratio(X) is taken as the root of the decision tree.
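Steps 1 to 5 translate directly into code. The sketch below is a minimal illustration of the standard C4.5 gain ratio and not the J48 implementation itself; the function names `info` and `gain_ratio` and the eight-sample toy data set are assumptions made for the example.

```python
import math
from collections import Counter


def info(labels):
    """Step 1: entropy of the class labels, in bits."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def gain_ratio(labels, partition):
    """Steps 2 to 5 for a test X that splits `labels` into the lists in `partition`."""
    n = len(labels)
    parts = [part for part in partition if part]      # ignore empty subsets
    info_x = sum(len(part) / n * info(part) for part in parts)          # step 2
    gain = info(labels) - info_x                                         # step 3
    split_info = -sum(len(part) / n * math.log2(len(part) / n)
                      for part in parts)                                 # step 4
    return gain / split_info if split_info > 0 else 0.0                  # step 5


# Toy example: four "good" and four "fault" samples.
labels = ["good"] * 4 + ["fault"] * 4
perfect = gain_ratio(labels, [labels[:4], labels[4:]])          # pure subsets
useless = gain_ratio(labels, [labels[:2] + labels[4:6],
                              labels[2:4] + labels[6:]])        # mixed subsets
```

A test that separates the classes completely scores the highest gain ratio, while a test whose subsets have the same class mix as the parent scores zero.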
5.2. Pruning phase
Usually a training set in the sample space leads to a decision tree which may be too large to be an
accurate model; this is due to over-training or over-fitting. Such a fully grown decision tree needs
to be pruned by removing the less reliable branches to obtain better classification performance over
the whole instance space, even though it may have a higher error over the training set. The C4.5
algorithm uses an error-based post-pruning strategy to deal with the over-training problem. For each
classification node, C4.5 calculates a kind of predicted error rate based on the total aggregate of
misclassifications at that particular node. The error-based pruning technique essentially reduces to
the replacement of vast sub-trees in the classification structure by singleton nodes or simple
branch collections if these actions contribute to a drop in the overall error rate of the root node.
5.3. Application of decision tree for the problem under study
As is customary, the samples are divided into two parts: a training set and a testing set. The
training set is used to train the classifier and the testing set is used to test its validity.
Ten-fold cross-validation is employed to evaluate classification accuracy. The training process of
C4.5 using samples with continuous-valued attributes is as follows.
(1) The tree starts as a single node representing the training samples.
(2) If the samples are all of the same class, then the node becomes a leaf and is labeled with the
class.
(3) Otherwise, the algorithm discretises every attribute to select the optimal threshold and uses
the entropy-based measure called information gain (discussed in Section 5.1) as the heuristic for
selecting the attribute that will best separate the samples into individual classes.
(4) A branch is created for each best discrete interval of the test attribute, and the samples are
partitioned accordingly.
(5) The algorithm uses the same process recursively to form a decision tree for the samples at each
partition.
(6) The recursive partitioning stops only when one of the following conditions is true:
(a) All the samples for a given node belong to the same class.
(b) There are no remaining attributes on which the samples may be further partitioned.
(c) There are no samples for the branch test attribute. In this case, a leaf is created with the
majority class in samples.
(7) A pessimistic error pruning method (discussed in Section 5.2) is used to prune the grown tree to
improve its robustness and accuracy.
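Step (3), the discretisation of a continuous attribute, can be sketched as a search over candidate thresholds. This is a simplified illustration, not the J48 code: it places each candidate threshold midway between two adjacent distinct values and scores it by plain information gain. The function names `entropy` and `best_threshold` and the toy values are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) of the best binary split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between equal attribute values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_t, best_gain = (lo + hi) / 2, gain  # simplification: midpoint
    return best_t, best_gain


# Toy attribute (for instance one bin count) with two well-separated classes.
values = [3, 5, 4, 20, 22, 21]
labels = ["good", "good", "good", "fault", "fault", "fault"]
threshold, gain = best_threshold(values, labels)
```

Repeating this search for every attribute and keeping the attribute with the best score gives the test used at a node, after which the two resulting subsets are processed recursively.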
6. Results and discussion
In this paper, power analysis was applied to a multi-class problem. The null hypothesis ($H_0$) was
that the means of the classes are the same, whereas the alternative hypothesis ($H_1$) was that the
means of the classes are not the same. As a basis for the problem, a data set consisting of 250
samples from each class was considered. The data set was tested for normality, homogeneity and
independence. The expected power level was set to 95% (equivalent to α = 0.05 and β = 0.05). The
F-test was performed to calculate the sample size for a given effect size, α error probability,
power, number of groups, and number of repetitions. The test was an a priori sample size computation
for multivariate analysis of variance (MANOVA) with repeated measures and within-between
interaction. The central and non-central distributions, with the critical F value of the result, are
shown in Fig. 3. From the data set, Pillai's V was computed and its value was found to be 1.486501.
Pillai's V is an indicator of the statistical stability of the data set. If the value is low, the
statistical stability is lower and more data points are required to train the classifier, and vice
versa. At first sight, one could say from Pillai's V that the statistical stability of the data set
under study is reasonably good. Any value above 0.5 would give reasonably good statistical
stability, hence needing fewer data points. Upon calculation, the number of data points required to
have good statistical stability was found to be 18 for all six classes together, that is, 3 per
class at the required power level of 0.95. To sum up, at least three data points per class should be
taken for training the classifier to retain a power level of 95%. At this point, it is natural to
ask how many data points would be required if a power level of 90% is sufficient and the application
can afford an α error probability of 10%. To answer this series of questions for different type I
error (α error probability) and power level (1 − β) values, a set of experiments was carried out to
calculate the minimum sample size required. In one series of experiments, the sample size was
computed for various α error probability values at a power level of 95%. This experiment was
repeated for power levels down to 80% in steps of 5%. The important results of these experiments are
tabulated in Table 2. The resulting curves were drawn in the same graph for easy comparison
(Fig. 4). The topmost curve in Fig. 4 gives the required minimum sample size for various α error
values at a power level of 95%. From the graph (refer Fig. 4), one could find the required sample
size for a given α error probability and power level. Similarly, supposing one could sacrifice some
power level, the variation of the required sample size for various α error probabilities was
computed and plotted (refer Fig. 5). The important results of these experiments are tabulated in
Table 2. The values in Table 2 would be very helpful for those who do not have enough data points to
train their classifiers. When enough data points are not available, the next best option is to make
some compromise in the α error probability and the same amount of compromise in the power level to
get a smaller sample size. One should remember that statistical stability decreases as the α error
probability is relaxed and the power level is lowered. However, in the present study, the sample
size required is found to be just 3 per class (refer Table 2) for the different pairs of α error
probability and power level.
The sample size is not only a function of a error probability and
power level, but also greatly inuenced by a parameter called ef-
fect size, which is a function of Pillais V. For the data set under
study, the Pillais V was found to be 1.486501 and the correspond-
ing effect size was 0.7690296. To study the inuence of effect size
parameter on sample size, a set of hypothetical experiments were
conducted to compute total sample size as a function of effect size
(0.10.5 with a step of 0.05) for a error probability of 5% to 20%
with a step of 5% with power level of 95%. The resulting plot is
shown Fig. 6. A similar study was carried out to compute total sam-
ple size as a function of effect size (0.1 to 0.5 with a step of 0.05) for
power level of 80% to 95% with a step of 5% with a error probability
of 5%. The resulting plot is shown Fig. 7. As effect size increases the
total sample size decreases for a given a error probability. As a er-
ror probability increases, the required sample size decreases with
increase in effect size (Refer Fig. 6). As effect size increases the total
sample size decreases for a given power level. As power level in-
creases, the required sample size also increases with increases in
Fig. 3. F tests MANOVA: Repeated measures, within-between interaction: sample size calculation with effect size f(V) = 0.7690296, α error probability = 0.05, power (1 − β
error probability) = 0.95, number of groups = 6, repetitions = 5 with Pillai's V formula value = 1.486501.
Table 2
Power analysis test results with effect size f(V) = 0.7690296, number of groups = 6, repetitions = 5, Pillai's V formula value = 1.486501, numerator df = 20.0000.

Output parameter                  α = 0.05     α = 0.10     α = 0.15     α = 0.20     α = 0.25
                                  1−β = 0.95   1−β = 0.90   1−β = 0.85   1−β = 0.80   1−β = 0.75
Noncentrality parameter λ         42.581270    35.484342    35.484392    35.484392    35.484392
Critical F                        1.793302     1.625806     1.478985     1.371388     1.285001
Denominator df                    48           36           36           36           36
Total sample size for 6 classes   18           15           15           15           15
Sample size per class             ~3           ~3           ~3           ~3           ~3
Actual power                      0.957877     0.937708     0.963965     0.977564     0.985457
Fig. 4. Total sample size as a function of α error probability.
V. Indira et al. / Expert Systems with Applications 38 (2011) 7708–7717 7713
effect size (refer Fig. 7). As a matter of curiosity, one may be
interested in the variation of effect size with respect to α error
probability. For a total sample size of 18 (all six classes together)
and different power levels (80% to 95% in steps of 5%), the effect
size was plotted as a function of α error probability, as shown in
Fig. 8. For a given power level, the effect size decreases as the α
error probability increases. For a given α error probability, the
effect size increases as the power level increases. It is to be noted
that these effects hold for the fixed sample size.
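The effect size, noncentrality parameter and degrees of freedom reported for this data set appear to be linked by simple relations. The sketch below assumes f(V)² = V / (s − V) with s = min(groups − 1, repetitions − 1), λ = f(V)² · N · s, numerator df = (groups − 1)(repetitions − 1) and denominator df = s · (N − groups). These relations are not stated in the text; they are assumed here only because they reproduce the values in Table 2 to about four decimal places for this design. The function name `manova_rm_parameters` is hypothetical.

```python
import math


def manova_rm_parameters(pillai_v, groups, repetitions, total_n):
    """Derive power-analysis inputs from Pillai's V (assumed relations, see lead-in)."""
    s = min(groups - 1, repetitions - 1)
    f_squared = pillai_v / (s - pillai_v)        # squared effect size f(V)^2
    return {
        "effect_size": math.sqrt(f_squared),
        "noncentrality": f_squared * total_n * s,
        "numerator_df": (groups - 1) * (repetitions - 1),
        "denominator_df": s * (total_n - groups),  # assumed form; matches Table 2
    }


# Values from the study: V = 1.486501, 6 groups, 5 repetitions.
for total_n in (18, 15):
    print(total_n, manova_rm_parameters(1.486501, 6, 5, total_n))
```

With these inputs the sketch returns a numerator df of 20 and denominator df of 48 and 36 for total sample sizes of 18 and 15, in line with Table 2.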
The sample size obtained using power analysis has to be
validated/verified through some established method. One could choose
another statistical method that is well established in the field, or
alternatively resort to a functional test that examines the effect
of sample size on classification accuracy. A pool of classifiers is
available to researchers, from which a suitable classifier has to be
chosen. Often the selection is based on ease of training, general
classification accuracy, computational complexity, etc. In the
present study, the J48 decision tree classifier (a Java
implementation of the C4.5 algorithm) was chosen because it is easy
to train. It can also perform feature selection simultaneously. As
discussed in Section 2, the data set consists of 1500 samples. From
each class, 250 samples were drawn. All 250 samples were used for
training the J48 classifier with the 10-fold cross-validation method.
The classifier's job here consists of two processes: feature
selection and feature classification. Initially, a decision tree was
generated, from which the top five features that contribute most to
classification were selected. The rest of the features were
consciously ignored from then on.
The next step was to find out the effect of sample size on
classification accuracy. To study this effect, the sample size was
progressively reduced in decrements of 5 samples and the
Fig. 5. Total sample size as a function of power level for various α error probabilities.
Fig. 6. Total sample size as a function of effect size for various α error probabilities.
corresponding classification accuracies were noted. The variation
in classification accuracy with respect to the sample size is shown
in Fig. 9. From Fig. 9, one can see that the classification accuracy
falls abruptly when the sample size is reduced below 3 per class.
This means that a classifier could be trained with three samples per
class for the centrifugal pump fault diagnosis problem; however, the
objective of the study goes one step further: what is the minimum
sample size that will have statistical stability? Upon careful
observation, the oscillation of the classification accuracy tends to
stabilize as the sample size increases. The root mean square error
and mean absolute error of classification are plotted as functions
of sample size in Figs. 10 and 11, respectively.
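The sample-size reduction experiment can be outlined as a simple loop. The sketch below only illustrates the procedure and makes several loudly stated substitutions: synthetic Gaussian data stand in for the histogram features, a nearest-centroid classifier stands in for J48, and a held-out set replaces 10-fold cross-validation. All function names and data here are hypothetical.

```python
import random


def make_synthetic_data(n_per_class, n_classes=6, n_features=5, seed=0):
    """Hypothetical stand-in for the histogram features: one Gaussian blob per class."""
    rng = random.Random(seed)
    data = []
    for label in range(n_classes):
        for _ in range(n_per_class):
            features = [rng.gauss(label * 3.0, 1.0) for _ in range(n_features)]
            data.append((features, label))
    return data


def nearest_centroid_accuracy(train, test):
    """Fit a nearest-centroid classifier (a stand-in for J48) and return its accuracy."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in total] for y, total in sums.items()}
    correct = 0
    for x, y in test:
        predicted = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
        correct += predicted == y
    return correct / len(test)


def sample_size_sweep(pool, test, sizes, seed=0):
    """Train on n samples per class for each n in sizes and record the accuracy."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in pool:
        by_class.setdefault(y, []).append((x, y))
    results = []
    for n in sizes:
        train = [item for items in by_class.values() for item in rng.sample(items, n)]
        results.append((n, nearest_centroid_accuracy(train, test)))
    return results


sizes = list(range(250, 0, -5))            # 250, 245, ..., 5 samples per class
pool = make_synthetic_data(250, seed=1)    # 250 samples for each of 6 classes
held_out = make_synthetic_data(50, seed=2)
results = sample_size_sweep(pool, held_out, sizes)
```

Plotting the second element of each pair in `results` against the first would give a curve of the same kind as Fig. 9, although the numbers from this synthetic setup say nothing about the pump data.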
Fig. 7. Total sample size as a function of effect size for various power levels.
Fig. 8. Effect size as a function of α error probability for various power levels.
Fig. 9. Sample size as a function of classification accuracy.
Fig. 10. Sample size as a function of root mean square value.
The maximum classification accuracy of the J48 algorithm for 250
samples per class was found to be 100%. As per the power analysis
result in Table 2, if a 5% α error probability and a 95% power level
can be accommodated, then for the given data set the minimum
required sample size is 3 per class. This means that if 3 samples
per class were used for training the classifier, the maximum α error
probability likely to occur would be 5%. This has to be validated
against the J48 classifier's classification results. From Fig. 11,
it is evident that the mean absolute error is just about 0% (i.e. it
did not exceed 1%) for cases whose sample size was greater than or
equal to 3. The corresponding root mean square error is shown in
Fig. 10.
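For reference, the two error measures can be computed from the class probability estimates of a classifier. The text does not state how the software used in the study defines them, so the sketch below assumes one common convention: each predicted probability is compared with a 0/1 indicator of the true class, and the errors are averaged over all instances and classes. The function name `probability_errors` and the example values are hypothetical.

```python
import math


def probability_errors(predicted, actual, n_classes):
    """Return (mean absolute error, root mean square error).

    predicted: list of per-class probability vectors, one per instance
    actual:    list of true class indices, one per instance
    """
    abs_sum = 0.0
    sq_sum = 0.0
    for probs, label in zip(predicted, actual):
        for k in range(n_classes):
            target = 1.0 if k == label else 0.0   # indicator of the true class
            diff = probs[k] - target
            abs_sum += abs(diff)
            sq_sum += diff * diff
    n_terms = len(actual) * n_classes
    return abs_sum / n_terms, math.sqrt(sq_sum / n_terms)


# Hypothetical example: one confident correct prediction, one uncertain one.
example_predicted = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
example_actual = [0, 1]
mae, rmse = probability_errors(example_predicted, example_actual, n_classes=3)
within_bound = mae <= 0.05   # comparison against the 5% bound from power analysis
```

A check of the form `mae <= 0.05` mirrors the validation described above, where the observed mean absolute error is compared with the α bound obtained from power analysis.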
In a classification problem, the mean absolute error of the
classifier is a measure of type I error (α error probability). Type I
error is an error due to misclassification by the classifier. In
general, type I error is rejecting a null hypothesis when it is true.
The α error probability is a measure of type I error in hypothesis
testing and hence the equivalence is obvious. From the above
discussion, the results of the power analysis hold, and the actual
error did not exceed the upper bound (5%) found in the power
analysis. A similar exercise of validating the results at other
points also confirms the validity of the power analysis test. Thus,
the sample size suggested by power analysis can be confidently used
in the machine learning approach to fault diagnosis of centrifugal
pumps.
7. Conclusion
The machine learning approach to fault diagnosis of a centrifugal
pump with six types of faults was considered with the objective of
determining the minimum number of samples required to separate
faulty conditions with statistical stability. In the present study,
histogram features were extracted from vibration signals. A method to
choose the number of bins using the J48 algorithm and a method for
determining the minimum sample size using power analysis have been
proposed. The results of power analysis were validated using the J48
classifier. From the results and discussion, it was found that the
number of bins should be 37 and the sample size per class should be
at least three to get good classification accuracy. The minimum
sample size that will have statistical stability was found to be
three with a power level of 95% and an α error probability of 5%.
Also, for the other paired values of power level and α error
probability, the required minimum sample size was found to be three
per class (refer Table 2). Thus, for a centrifugal pump
classification problem, the classifier could be trained with three
samples per class to get good classification accuracy with
statistical stability. The effects of α error probability and power
level on sample size were also discussed in detail. These results and
graphs would serve as a guideline for researchers working on fault
diagnosis of centrifugal pumps with the machine learning approach
when choosing the number of bins and fixing the sample size.
The first author V. Indira and the third author N.R. Sakthivel
acknowledge Karpagam University for providing them an opportunity
to pursue Ph.D (part time), in Karpagam University,
Alfayez, L., Mba, D., & Dyson, G. (2005). The application of acoustic emission for
detecting incipient cavitation and the best efciency point of a 60 kW mono-
block centrifugal pump. NDT and E International, 38, 354358.
Beal, S. L. (1989). Sample size determination for condence intervals on the
population mean and on the difference between two population means.
Biometrics, 45, 969977.
Birkett, M. A., & Day, S. J. (1994). Internal pilot studies for estimating sample size.
Statistics in Medicine, 13, 24552463.
Browne, R. H. (1995). On the use of a pilot sample for sample size determination.
Statistics in Medicine, 14, 19331940.
Buderer, N. M. F. (1996). Statistical methodology: I. Incorporating the prevalence of
disease into the sample size calculation for sensitivity and specicity. Academic
Emergency Medicine, 3, 895900.
Bull, S. B. (1993). Sample size and power determination for a binary outcome and an
ordinal exposure when logistic regression analysis is planned. American Journal
of Epidemiology, 137, 676684.
Casagrande, J. T., Pike, M. C., & Smith, P. G. (1978). An improved approximate
formula for calculating sample sizes for comparing two binomial distributions.
Biometrics, 34, 483486.
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York:
Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.).
Hillsdale, NJ: Erlbaum.
Day, S. J., & Graham, D. F. (1991). Sample size estimation for comparing two or more
treatment groups in clinical trials. Statistics in Medicine, 10, 3343.
Donner, A., & Eliasziw, M. (1992). A goodness-of-t approach to inference
procedures for the kappa statistic: Condence interval construction,
signicance-testing and sample size estimation. Statistics in Medicine, 11,
Dupont, W. D. (1988). Power calculations for matched case-control studies.
Biometrics, 44, 11571168.
Faul, Franz, Erdfelder, Edgar, Lang, Albert-Georg, & Buchner, Axel (2007). G

3: A exible statistical power analysis program for the social, behavioral, and
biomedical sciences. Behavior Research Methods, 39(2), 175191.
Feigl, P. (1978). A graphical aid for determining sample size when comparing two
independent proportions. Biometrics, 34, 111122.
Flack, V. F., & Eudey, T. L. (1993). Sample size determinations using logistic
regression with pilot data. Statistics in Medicine, 12, 10791084.
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York:
Wiley & Sons.
Geller, N. L., & Pocock, S. J. (1987). Interim analyses in randomized clinical trials:
Ramications and guidelines for practitioners. Biometrics, 43, 213223.
Gordon, I., & Watson, R. (1996). The myth of continuity-corrected sample size
formulae. Biometrics, 52, 7176.
Gould, A. L. (1995). Planning and revising the sample size for a trial. Statistics in
Medicine, 14, 10391051.
Greenland, S. (1988). On sample -size and power calculations for studies using
condence intervals. American Journal of Epidemiology, 128, 231237.
guo-hua, Gao, yong-zhong, Zhag, yu, Zhu, & huang, Duan guang- (2007). Hybrid
support vector machines based multi-fault classication. Journal of China
University of Mining and Technology, 17, 246250.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver
operating characteristic (ROC) curve. Radiology, 143, 2936.
Haseman, J. K. (1978). Exact sample sizes for use with the FisherIrwin test for 2 2
tables. Biometrics, 34, 106109.
Hsieh, F. Y. (1989). Sample size tables for logistic regression. Statistics in Medicine, 8,
Hwang, D., Schmitt, W. A., Stephanopoulos, G., & Stephanopoulos, G. (2002).
Determination of sample size and discriminatory expression patterns in
microarray data. Bioinformatics, 18, 11841193.
Kavuri, S. N., & Venkatasubramanian, V. (1993b). Using fuzzy clustering with
ellipsoidal units in neural networks for robust fault classication. Computers
and Chemical Engineering, 17(8), 765784.
Kim, K., & DeMets, D. L. (1992). Sample size determination for group sequential
clinical trials with immediate response. Statistics in Medicine, 11, 13911399.
Konga, Fansen, & chen, Ruheng (2004). A combined method for triplex pump fault
diagnosis based on wavelet transform, fuzzy logic and neural-networks.
Mechanical Systems and Signal Processing, 18, 161168.
Kraemer, H. C., & Thiemann, S. (1987). How many subjects? Newbury Park, CA: Sage.
Fig. 11. Sample size as a function of mean absolute error.
7716 V. Indira et al. / Expert Systems with Applications 38 (2011) 77087717
Lachenbruch, P. A. (1992). On the sample size for studies based upon McNemars
test. Statistics in Medicine, 11, 15211525.
Lachin, J. M. (1992). Power and sample size evaluation for the McNemar test with
application to matched case-control studies. Statistics in Medicine, 11,
Lakatos, E., & Lan, K. K. G. (1992). A comparison of sample size methods for the
Logrank statistic. Statistics in Medicine, 11, 179191.
Lantos, J. D. (1993). Sample size: Profound implications of mundane calculations.
Pediatrics, 91, 155157.
Lemeshow, S., Hosmer, D. W., & Klar, J. (1988). Sample size requirements for studies
estimating odds ratios or relative risks. Statistics in Medicine, 7, 759764.
Lewis, R. J. (1993). An introduction to the use of interim data analyses in clinical
trials. Annals of Emergency Medicine, 22, 14631469.
Lipsitz, S. R., & Fitzmaurice, G. M. (1994). Sample size for repeated measures studies
with binary responses. Statistics in Medicine, 13, 12331239.
Lockhart, D. J. et al. (1996). Expression monitoring by hybridization to high density
oligonucleotide arrays. Nature Biotechnology, 14, 16751680.
Lu, Y., & Bean, J. A. (1995). On the sample size for one-sided equivalence of
sensitivities based upon McNemars test. Statistics in Medicine, 14, 18311839.
Lubin, J. H., & Gail, M. H. (1990). On power and sample size for studying features of
the relative odds of disease. American Journal of Epidemiology, 131, 552566.
Lui, K. J., & Cumberland, W. G. (1992). Sample size requirement for repeated
measurements in continuous data. Statistics in Medicine, 11, 633641.
Mace, A. E. (1974). Sample size determination. Huntington.NY: Krieger.
Nam, J. M. (1992). Sample size determination for case-control studies and the
comparison of stratied and unstratied analyses. Biometrics, 48, 389395.
Nam, J. M. (1997). Establishing equivalence of two treatments and sample size
requirements in matched-pairs design. Biometrics, 53, 14221430.
OBrien, P. C., & Fleming, T. R. (1979). A multiple testing procedure for clinical trials.
Biometrics, 35, 549556.
Obuchowski, N. A. (1994). Computing sample size for receiver operating
characteristic studies. Investigative Radiology, 29, 238243.
Obuchowski, N. A., & McClish, D. K. (1997). Sample size determination for diagnostic
accuracy studies involving binormal ROC curve indices. Statistics in Medicine, 16,
Olson, C. L. (1974). Comparative roubustness of six tests in multivariate analysis of
variance. Journal of American Statistical Association, 69, 894908.
ONeill, R. T. (1984). Sample sizes for estimation of the odds ratio in unmatched
case-control studies. American Journal of Epidemiology, 120, 145153.
Parker, R. A., & Bregman, D. J. (1986). Sample size for individually matched case-
control studies. Biometrics, 42, 919926.
Pearson, E. S., & Hartley, H. O. (1970). Biometrika tables for statisticians (3rd ed.).
Cambridge: Cambridge University Press. Vol. I.
Pillai, K. C. S., & Mijares, T. A. (1959). On the moments of the trace of a matrix and
approximations to its distribution. Annals of Mathematical Statistics, 30,
Pocock, S. J. (1977). Group sequential methods in the design and analysis of clinical
trials. Biometrika, 64, 191199.
Rengaswamy, R., & Venkatasubramanian, V. (2000). A fast training neural network
and its updation for incipient fault detection and diagnosis. Computers and
Chemical Engineering, 24(2/7), 431437.
Roebruck, P., & Kuhn, A. (1995). Comparison of tests and sample size formulae for
proving therapeutic equivalence based on the difference of binomial
probabilities. Statistics in Medicine, 14, 15831594.
Royston, P. (1993). Exact conditional and unconditional sample size for pair-
matched studies with binary outcome: A practical guide. Statistics in Medicine,
12, 699712.
Sakthivel, N. R., Sugumaran, V., & Nair Binoy, B. (2010). Application of support
vector machine and proximal support vector machine for fault classication of
mono-block centrifugal pump. International Journal of Data Analyses Techniques
and Strategies, 1, 3861.
Samuels, M. L., & Lu, T. F. C. (1992). Sample size requirement for the back-of-the-
envelope binomial condence interval. American Statistician, 46, 228231.
Satten, G. A., & Kupper, L. L. (1990). Sample size requirements for interval
estimation of the odds ratio. American Journal of Epidemiology, 131, 177184.
Schena, M., Shalon, D., Davis, R. W., & Brown, P. O. (1995). Quantitative monitoring
of gene-expression patterns with a complementary-DNA microarray. Science,
270, 467470.
Schoenfeld, D. A., & Richter, J. R. (1982). Nomograms for calculating the number of
patients needed for a clinical trial with survival as an endpoint. Biometrics, 38,
Shih, W. J., & Zhao, P. L. (1997). Design for sample size re-estimation with interim
data for double blind clinical trials with binary outcomes. Statistics in Medicine,
16, 19131923.
Signorini, D. F. (1991). Sample size for poisson regression. Biometrika, 78, 446450.
Streiner, D. L. (1994). Sample-size formulae for parameter estimation. Perceptual
and Motor Skills, 78, 275284.
Sugumaran, V., Muralidharan, V., & Ramachandran, K. I. (2007). Feature selection
using decision tree and classication through proximal support vector machine
for fault diagnostics of roller bearings. Mechanical System and Signal Processing,
21, 930942.
Sugumaran, V., Sabareesh, G. R., & Ramachandran, K. I. (2008). Fault diagnostics of
roller bearing using kernel based neighborhood score multi-class support
vector machine. Expert Systems With Applications, 34, 30903098.
Thomas, R. G., & Conlon, M. (1992). Sample size determination based on shers
exact test for use in 2 2 comparative trials with low event rates. Controlled
Clinical Trials, 13, 134147.
Vaidyanathan, R., & Venkatasubramanian, V. (1992). Representing and diagnosing
dynamic process data using neural networks. Engineering Applications of
Articial Intelligence, 5(1), 1121.
Wang, Jiangping, & hu, Hongtao (2006). Vibration based fault diagnosis of pump
using fuzzy technic. Measurement, 39, 176185.
Wang, W. J., & McFadden, P. D. (1993a). Early detection of gear failure by vibration
analysis I. Calculation of the timefrequency distribution. Mechanical Systems
and Signal Processing, 7(3), 193203.
Wang, W. J., & McFadden, P. D. (1993b). Early detection of gear failure by vibration
analysis II. Interpretation of the timefrequency distribution using image
processing techniques. Mechanical Systems and Signal Processing, 7(3), 205215.
Whitehead, J. (1992). The design and analysis of sequential clinical trials (2nd ed.). Chi
Chester: Ellis Horwood.
Whitehead, J. (1993). Sample size calculations for ordered categorical data. Statistics
in Medicine, 12, 22572271.
Whittemore, A. S. (1981). Sample size for logistic regression with small response
probability. Journal of the American Statistical Association, 76, 2732.
Widodo, A., & Bo-suk, Y. (2007). Support vector machine in machine condition
monitoring and fault diagnosis. Mechanical System and Signal Processing, 21,
V. Indira et al. / Expert Systems with Applications 38 (2011) 77087717 7717