G.A. Di Lucca, M. Di Penta, S. Gradara
dilucca@unina.it, dipenta@unisannio.it, gradara@unisannio.it
University of Naples “Federico II”, DIS - Via Claudio 21, I-80125 Naples, Italy
RCOST - Research Centre on Software Technology
University of Sannio, Department of Engineering
Palazzo ex Poste, Via Traiano, I-82100 Benevento, Italy
In this paper we adopted a common similarity measure, known as the cosine measure, which determines the angle between the ticket vectors and the query vector when they are represented in an n-dimensional Euclidean space.

2.3 Support Vector Machine

Support Vector Machine (SVM) is a learning algorithm based on the idea of structural risk minimization (SRM) [18] from computational learning theory. The main purpose of classification is to obtain a good generalization performance, that is, a minimal error when classifying unseen test samples. SRM shows that the generalization error is bounded by the sum of the training set error and a term depending on the Vapnik-Chervonenkis (VC) dimension of the learning machine.

Given a labeled set of training samples {(x_i, y_i)}, i = 1, ..., N, where x_i is the i-th input vector and y_i in {-1, +1} is the associated label, a SVM classifier finds the optimal hyperplane separating the samples of the two classes with the maximal margin.

The Classification and Regression Trees (CART) method, introduced by Breiman et al. [4], is a binary recursive method that yields a class of models called tree-based models. CART is an alternative to linear and additive models for regression and classification problems. It is interesting due to its capability of handling both continuous and discrete variables, its inherent ability to model interactions among predictors, and its hierarchical structure. CART can select those variables and their interactions that are most important in determining an outcome or dependent variable. If the dependent variable is continuous, CART produces regression trees; if the dependent variable is categorical, CART produces classification trees. In both classification and regression trees, CART's major goal is to produce an accurate set of data classifiers by uncovering the predictive structure of the problem under consideration.

CART generates binary decision trees: the recursive partitioning method can be described on a conceptual level as a process of reducing a measure of “impurity”. For a classification problem, this impurity is measured by the deviance (or the sum of squares) of a particular node i:

    D_i = 1 - sum_{k=1}^{K} p_ik^2    (3)

where D_i, known as the Gini index, is the deviance for the node i, K is the number of classes in the node, and p_ik is the proportion of observations that are in the class k.

The classification approach, and the content of the knowledge database, change according to the classification method adopted:

Vector Space model: the knowledge database contains, for each request of the training set, the frequency of each word and the class the request belongs to; the incoming request is then compared with each request of the training set, and it is then associated to the class of the training set request having the smallest Euclidean distance from it;
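As a sanity check on the deviance above, the node impurity can be computed in a few lines. This is a minimal sketch assuming the standard Gini form of the node deviance; the function name and the toy class proportions are ours, not taken from the paper:

```python
def gini(proportions):
    """Node deviance in the Gini form: D = 1 - sum_k p_k^2 (cf. Eq. (3)).

    `proportions` holds p_k, the fraction of the node's observations
    falling in class k; the values must sum to 1.
    """
    assert abs(sum(proportions) - 1.0) < 1e-9
    return 1.0 - sum(p * p for p in proportions)

print(gini([1.0, 0.0]))   # pure node: impurity 0.0
print(gini([0.5, 0.5]))   # evenly mixed two-class node: impurity 0.5
```

A pure node (all observations in one class) has zero impurity; recursive partitioning then amounts to choosing, at each node, the binary split that most reduces this quantity.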
[Figure 1: The classification process. Incoming requests are labeled by an automatic classifier, backed by a knowledge database built from a classified training set, and dispatched to the maintenance process; request acceptance returns positive or negative feedback to the knowledge database.]
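The dispatch-and-feedback loop of Figure 1 (later evaluated in Section 5.2, where a rejected match is excluded and the request reclassified) can be sketched as follows. This is a minimal sketch under our own naming, not the paper's implementation; the nearest-request rule below stands in for whichever classifier is plugged in:

```python
def classify_with_feedback(request, candidates, nearest, accepted):
    """Propose classes for a request until the maintainer accepts one.

    candidates: pool of (features, label) training requests;
    nearest:    function choosing the best-matching training request;
    accepted:   callback standing in for the maintainer's feedback.
    On negative feedback the matched request is excluded and the
    classification is retried.
    """
    pool = list(candidates)
    while pool:
        match = nearest(request, pool)
        if accepted(match[1]):
            return match[1]          # positive feedback: done
        pool.remove(match)           # negative feedback: exclude, retry
    return None                      # every proposal was rejected

# Toy usage: nearest request by smallest squared Euclidean distance.
def nearest(req, pool):
    return min(pool, key=lambda t: sum((a - b) ** 2 for a, b in zip(req, t[0])))

training = [([1, 0], "CICS"), ([1, 1], "User interfaces")]
# The maintainer rejects "CICS" and accepts the next proposal.
print(classify_with_feedback([1, 0], training, nearest, lambda c: c != "CICS"))
```

The loop terminates either with an accepted class or with an empty pool, mirroring the positive/negative feedback arrows of the figure.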
and verb conjugations. Finally, the encoder keeps a dictionary of all the words that occurred in the arrived requests, adding a new word whenever it is not already present. Each word is then associated to a numeric code, used to build the training set matrix.

Different tools were used to implement the proposed classifiers:

Vector Space model and Probabilistic model: Perl scripts implement the classification methods, as explained above; in particular, the probabilistic classification method made use of the beta-shifting approach [15] to avoid the zero-frequency problem [19];

SVM, CART and k-nearest neighbor classification: the classification was performed using the freely available R statistical tool (http://www.r-project.org); in particular, the packages e1071, tree and class were used for SVM, CART and k-nearest neighbor classification, respectively.

4. Case Study

As a case study, maintenance requests (i.e., tickets) for a large, distributed telecommunication system were analyzed. Each request contains, among other fields, the following information:

The ID of the request;
The site where the problem occurred (we analyzed data of requests coming from two different telephone plants);
The date and time when the user generated the request;
A short description (maximum 12 words) of the problem that occurred.

When a request arrives from the customer to the software maintainer, it is classified, according to its description and to the assessment team's experience, to be dispatched to the most suitable maintenance team. In particular, this organization distributes the work among eight different maintenance teams, each one devoted to maintaining a particular portion of the system, or to performing a specific task (e.g., testing some specific functionalities):

Database interface and administration;
CICS;
Local subsystems;
User interfaces;
Internal modules;
Communication protocols (optimization);
Communication protocols (problem solving);
Generic problem solving.
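Before any classifier sees a ticket, its description is turned into numbers by the encoder described in Section 3: a dictionary of all words seen so far, each mapped to a numeric code, used to build the training set matrix. A minimal sketch with our own naming (the original implementation was in Perl):

```python
class Encoder:
    """Keeps a dictionary of all words seen in arrived requests,
    adding a new word (with a fresh numeric code) when missing."""

    def __init__(self):
        self.codes = {}

    def encode(self, description):
        """Map a ticket description to a list of numeric word codes."""
        row = []
        for word in description.lower().split():
            if word not in self.codes:
                self.codes[word] = len(self.codes)  # assign a new code
            row.append(self.codes[word])
        return row

enc = Encoder()
print(enc.encode("data not present in archive"))  # [0, 1, 2, 3, 4]
print(enc.encode("data not modified"))            # [0, 1, 5]
```

Words already in the dictionary reuse their code ("data", "not" above), so rows built from different tickets are directly comparable.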
For instance, a request having the description “Data modified by another terminal while booking” belongs to category “User interfaces”, while “Data not present in permutator archive” belongs to category “Local subsystems”. A total of 6000 requests, arrived from the beginning of 1998 to June 2000, were analyzed.

5. Experimental Results

Maintenance request tickets from the case study described in Section 4 were classified following the process explained in Section 3. We compared all the methods described in Section 2, also analyzing the performances obtained when the maintainer's feedback is available.

5.1. Comparison of the Classification Methods

In order to compare the performances of the different methods, a 10-fold cross-validation [17] for the five models was performed on the entire pool of tickets (6000). The percentages of correct classification (Table 1) and the paired t-test results (Table 2) show how the Probabilistic model, k-nearest neighbor classification (for which we experimentally determined the best choice of k) and SVM perform significantly better than the other methods.

Method                               % of correct classifications
Vector Space model                   70%
Probabilistic model                  80%
k-nearest neighbor classification    82%
CART                                 71%
SVM                                  84%

Table 1. Percentage of correct classifications (10-fold cross validation)

                     Vector Spaces   Prob. model   k-nearest neighbor   CART   SVM
Vector Spaces        X               0             0                    0      0
Prob. model          1               X             0.47                 1      0.26
k-nearest neighbor   1               0.53          X                    1      0.18
CART                 1               0             0                    X      0
SVM                  1               0.74          0.82                 1      X

Table 2. Paired t-test p-values (Hyp.: one method performs better than another)

In particular, Table 2 shows, in each position (i, j), the p-value for a paired t-test, where the hypothesis tested is that the method in row i performs better than the method in column j.

Successively, different kinds of experiments were performed:

1. In the first case, the training set and the test set were randomly extracted from the entire request set; the experiment was repeated a high number of times, to ensure convergence (i.e., a standard deviation of the results for different iterations below 5%), and the average results are reported in Figure 2 where, for the different classification methods, the percentage of successfully classified requests is shown. The size of the training set varied from 1000 tickets (with a test set of 5000 tickets) to 5000 tickets (with a test set of 1000 tickets).

2. In the second case, the training set and the test set were built considering the requests in the same order they arrived to the maintainer. The sizes of the training sets considered were the same as for the randomly generated sets. This experiment aimed to simulate how the arriving requests were, firstly, classified according to the previously arrived ones, and then, in turn, put in the knowledge database (Figure 1) to better classify successive arrivals. Results are shown in Figure 3. Similar results were obtained increasing the training set every six months.

Further details should be given for SVM, k-nearest neighbor and CART classification. One may decide to use all the available information (i.e., all the words of each request) to perform a classification. However, we experienced that, using few words, the results substantially improved. In particular, two words for SVM and k-nearest neighbor, and three for CART, ensured the best performances. Analysis performed on the tickets revealed that, usually, the first words of each ticket description sufficed to perform a ticket classification (the remaining part of the ticket is a detailed description of the problem that occurred). A similar analysis performed for the other methods (Probabilistic model and Vector Space model) revealed that, reducing the set of words used for classification, performances decreased.

Let us now analyze the results obtained. Figures 2 and 3 show how the Probabilistic model, SVM and k-nearest neighbor perform substantially better than the other methods, although SVM and k-nearest neighbor require a slightly larger training set (i.e., their plots start lower) to reach high performances.

It is worth noting how, sometimes, the Vector Space model and CART tend to get worse as the training set grows. This non-intuitive behavior is due to the fact that, since for the Vector Space model the classification is performed by comparing a request with all the requests in the training set, and searching for the one having the smallest distance, it may
[Figure 2: Percentage of correctly classified requests vs. training set size (1000 to 5000 tickets, randomly extracted training sets) for the Vector Space model, Probabilistic model, k-nearest neighbor, SVM and CART.]
happen that a training set request, very similar to the one to be classified, belongs to a different class. This may lead to a classification failure. It is evident how the probability that this phenomenon occurs increases with the size of the training set. Instead, the Probabilistic model aims to build a word distribution for each class of requests, so such a kind of request does not substantially affect the results.

[Figure 3: Percentage of correctly classified requests vs. training set size (1000 to 5000 tickets), with the training and test sets built in arrival order, for the five classification methods.]

5.2. Classification with Feedback

This step aims to analyze how, when a classification fails, a maintainer's negative feedback can be used to perform a new, possibly better, classification. Results, for the Probabilistic model and the Vector Space model, are shown in Figure 4. As in Figure 3, the training set and the test set were extracted following a time sequence. For the Probabilistic model the performance increase is, on average, about 7%, and 30% of the misclassified requests are successfully classified using the added negative information. For the Vector Space model the performance increase is about 25% and, in 72% of the cases, fed-back requests are correctly classified. A 10-fold cross-validation reported 86% of correct classifications for the Probabilistic model and 93% for the Vector Space model. The paired t-test also confirms the significance of the results, in both cases for the hypothesis that the distributions are equal. Finally, we experienced that the behavior of SVM, CART and k-nearest neighbor is similar to that of the Probabilistic model.

The analysis performed revealed that the phenomenon is essentially due to the fact that, often, a misclassification occurs because an incoming ticket can be very similar to one in the training set belonging to a different class, and it is therefore associated to it. Excluding that request from the training set usually grants a correct classification. For the Probabilistic model, classification is performed using as queries the word distributions of entire classes of requests; therefore, if a request badly fits a distribution (or fits a wrong one), after removing that class the request may again be wrongly associated to another distribution.

[Figure 4: Classification performance of the Probabilistic and Vector Space models, with and without feedback, for increasing training set sizes.]

5.3. Incremental Training Set

Finally, we tried to simulate the process exactly as described in Figure 1: each request was classified according to the previously arrived ones, i.e., considering all of them as the training set. Results, computed applying the Probabilistic model to the chronologically ordered requests, are shown in Figure 5.

6. Lesson Learned

Results reported in Section 5 showed how the proposed approach allows maintenance request tickets to be effectively classified based on their description.

Thus, such a tool, properly integrated in a workflow management system, can substantially improve the performance of the maintenance process. Moreover, it may be considered as a part of a queueing network composed of multiple, parallel queues, one for each type of request (thus refining the model proposed in [2], where a single-queue model was adopted to dispatch requests to maintenance teams independently from the type of intervention and the expertise of each team). The automatic dispatcher may also be integrated in a bug-reporting tool like Bug-Buddy, or in an environment such as that described in [7], where maintenance requests for a widely used commercial or open-source product are posted via Internet.

A comparison of the different classification methods highlights that the Probabilistic model performs, on average, better than the other methods and, moreover, ensures that growing the training set almost always increases the classification precision (this is not guaranteed by the other methods, especially the Vector Space model, as explained in Section 5). Good results (as highlighted by the 10-fold cross-validation, see Table 1) were also obtained adopting SVM or k-nearest neighbor. In the opinion of the authors, the Probabilistic model classification may be further improved by considering the words appearing inside each request as not independent from one another, i.e., modeling conditional word probabilities such as P(w_i | w_{i-1}).

Classification using SVM, k-nearest neighbor and CART confirmed that, as also stated by Votta and Mockus in [14], a request can be successfully classified using few, discriminating words. However, our approach differs from that of [14], in that we could not apply a keyword-based criterion: even if we experienced that often the first words of a ticket contain most of the useful information needed for classification, these words are, in our case study (and could be in general), uniformly distributed with respect to the whole dictionary of terms.

Figure 4 shows that feedback improves performances differently for the different methods. Since the Probabilistic model performs better in the first classification, while the Vector Space model takes maximum advantage from the feedback, a hybrid strategy ensures the best performances. Further investigations, particularly devoted to applying reinforcement learning, should be performed.

Finally, Figure 5 highlights how the system “learns” as the knowledge database grows. It is worth noting how, after an initial transient exhibiting oscillations and a slow rise time, performances tend to reach a steady-state value.

7. Related Work

Several similarities exist between this paper and the work of Mockus and Votta [14]. In [14] maintenance requests were classified according to maintenance categories as in IEEE Std. 1219 [1], then rated on a fault-severity scale using a frequency-based method. On the other hand, we adopted a different classification schema, where the severity rate is actually imposed by customers, while the matching of the ticket with the maintenance team is under the responsibility of the maintenance company.

Burch and Kung [5] investigated the changes of maintenance requests, during the lifetime of a large software application, by modeling the requests themselves.

The necessity of an automatic dispatcher for maintenance requests was highlighted in [7], where web-based maintenance centers were modeled by queueing theory, and in the more general queueing process described in [2]. In particular, in [7] it was demonstrated that, for moderate levels of misclassification, the overall system performance is only somewhat degraded and, nevertheless, the majority of customers still experiences a reduction of the average service time.

In recent years there has been a great deal of interest in applying, besides traditional statistical approaches, machine-learning and neural network methods to a variety of problems related to classifying and extracting information from text. The literature reports some relevant research in the area of automated text categorization: in 1994 Lewis and Ringuette [12] presented empirical results about the performance of a Bayesian classifier and a decision tree learning algorithm on two text categorization data sets, finding that both algorithms achieve reasonable performance. In 1996 Goldberg [9] empirically tested a new machine learning algorithm, CDM, designed specifically for text categorization and having better performances than Bayesian classification and decision tree learners. In 1998, Dumais et al. [8] compared different automatic learning algorithms for text categorization in terms of learning speed and classification accuracy; in particular, they highlighted the SVM advantages in terms of accuracy and performance. Joachims [11] performed a set of binary text classification experiments using the SVM and comparing it with other classification techniques.

8. Conclusions

We proposed an approach, inspired by intelligent active agents, for the automatic classification of incoming maintenance tickets, in order to route them to the most appropriate maintenance teams. Our final goal is to develop a router, working around the clock, that, without human intervention, routes incoming tickets with the lowest misclassification error, ensuring the shortest possible mean time to repair.

We compared, in classifying real-world maintenance tasks, different approaches derived from pattern recognition (classification trees, support vectors, Bayesian classifiers), information retrieval (distance classification) and clustering (k-nearest neighbor classification). Tickets were classified into eight categories representative both of the different skills required to maintain the system and of the available team composition; in other words, each category matches a dedicated pool of maintenance teams. Empirical evidence showed that, especially using the Probabilistic model, k-nearest neighbor and SVM, the automatic classification resulted effective, and thus helpful for the maintenance process.

Future work will be devoted to applying more sophisticated Probabilistic models (i.e., modeling P(w_i | w_{i-1})), as well as reinforcement learning strategies. Furthermore, classification will be used for the entire maintenance process, also to determine the specific category of intervention to be executed, according to the description of maintenance activities previously performed on similar requests. Automatic assignment of priorities will also be introduced, to allow immediate dispatching of the most critical requests.

9. Acknowledgments

This research is supported by the project “Virtual Software Factory”, funded by Ministero della Ricerca Scientifica e Tecnologica (MURST), and jointly carried out by EDS Italia Software, University of Sannio, University of Naples “Federico II” and University of Bari.

The authors would also like to thank the anonymous reviewers for giving precious inputs to prepare the camera-ready copy of the paper.
[Figure 5: Percentage of correctly classified requests, with and without feedback, as the training set grows incrementally from 0 to 5000 chronologically ordered tickets (Probabilistic model).]