Yu Jin∗ , Nick Duffield† , Alexandre Gerber† , Patrick Haffner† , Subhabrata Sen† , Zhi-Li Zhang∗
∗ Computer Science Dept., University of Minnesota † AT&T Labs–Research
∗ {yjin,zhzhang}@cs.umn.edu † {duffield,gerber,haffner,sen}@research.att.com
Figure 4: Top 20K average precision for different types of derived features. (a) History and customer features; (b) Quadratic features; (c) Product features. [Plots omitted; x-axis: top 20K average precision, y-axis: number of features.]
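The derived feature types evaluated in Figure 4, quadratic features (squares of base features) and product features (pairwise products), can be sketched as follows. The base feature matrix here is a toy stand-in, and the exact feature definitions in the paper may differ:

```python
import numpy as np

def derived_features(X):
    """Augment a base feature matrix with derived quadratic features
    (x_i^2) and product features (x_i * x_j for i < j). A sketch of
    the derived-feature construction compared in Figure 4.

    X: (n_samples, n_features) array of base line-measurement features.
    """
    n, d = X.shape
    quad = X ** 2                         # quadratic features x_i^2
    iu, ju = np.triu_indices(d, k=1)      # index pairs (i, j) with i < j
    prod = X[:, iu] * X[:, ju]            # product features x_i * x_j
    return np.hstack([X, quad, prod])

X = np.arange(6.0).reshape(2, 3)          # toy matrix with 3 base features
Xd = derived_features(X)
print(Xd.shape)                           # (2, 9): 3 base + 3 quadratic + 3 product
```

Product features let a linear classifier such as BStump exploit pairwise feature interactions it could not otherwise represent.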
to these problems are mislabeled as negative examples rather than positive examples. Because of the existence of such noise in the training data, sophisticated non-linear models overfit easily; we hence choose a linear model for f, which is more robust against noise. The Adaboost algorithm with decision stumps (i.e., one-level decision trees) [16], which we refer to as BStump in this paper, is selected to construct the model. The linear model from BStump was shown to be the most scalable while having an accuracy comparable to sophisticated non-linear classifiers [7]. We use Boostexter [16] as the BStump implementation. The number of iterations is set to 800 based on cross-validation. The decision stumps g_t produced by boosting are summed up as the prediction f(x) := \sum_{t=1}^{T} g_t(x). The score f is then converted to the posterior probability P(Tkt(u)|x) using logistic calibration.

5. EXPERIMENTAL RESULTS

We use line measurement records from 08/01/09 to 09/31/09 as our training data, and the data in the four contiguous weeks starting from 10/31/09 as our test data. The line measurements from 01/01/09 to 07/31/09 are history records for computing time-series features and customer-related features.
Figure 6: Feature selection methods. Figure 7: Ticket prediction results. Figure 8: CDF of ticket coming time. [Plots omitted; y-axes show accuracy (%) in Figs. 6-7 and CDF in Fig. 8.]
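The top-N AP criterion compared in Fig. 6 can be read as average precision computed over only the N highest-scored predictions. A minimal sketch under that reading (the exact normalization used in the paper may differ, and the scores/labels below are hypothetical):

```python
def top_n_average_precision(scores, labels, n):
    """Average precision restricted to the top-n scored predictions:
    the mean of precision@k taken at each true positive ranked within
    the top n. A sketch of the top-N AP criterion, not the paper's
    exact definition."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])[:n]
    hits, precisions = 0, []
    for k, (_, label) in enumerate(ranked, start=1):
        if label:                        # a true ticket at rank k
            hits += 1
            precisions.append(hits / k)  # precision@k at each hit
    return sum(precisions) / max(hits, 1)

# toy usage: five scored lines, evaluated within the top 3
ap = top_n_average_precision([0.9, 0.8, 0.7, 0.6, 0.5],
                             [1, 0, 1, 0, 1], n=3)
print(ap)
```

Restricting the metric to the top N is what aligns feature selection with the fixed 20K ticket-handling budget discussed below.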
We illustrate the accuracy of different feature selection methods in Fig. 6, where the x-axis represents the number of top predictions selected and the y-axis stands for the accuracy of these top x predictions. We observe that the proposed top-N AP method outperforms all the other methods significantly when fewer than 20K predictions are chosen. However, as more predictions are selected, the accuracy of the top-N AP method drops below that of the popular AUC-based method. This phenomenon implies that, even though the boosting algorithm focuses on maximizing the overall accuracy, the features chosen by the top-N AP method effectively place more true predictions within the top 20K. In this way, we can make better use of the extra ticket handling capacity provided by the ATDS system to proactively resolve more potential customer edge problems. We note that in our top-N AP method, N is a tunable parameter, which can be enlarged when more predictions can be accommodated by ATDS.

We next study the performance of the ticket predictor on the different line measurement features in Table 3. Fig. 7 shows the prediction results under different feature sets, where the x-axis stands for the number of predictions selected and the y-axis represents the accuracy or precision of the top x predictions. The dotted curve corresponds to the accuracy when only history and customer features are used, while the solid curve represents the accuracy when all the features in Table 3 are used. The top-20K AP method is applied to select features from both feature sets.

From Fig. 7, we achieve good accuracy (37.8%) with history and customer features for the top 20K predictions. The addition of derived features further boosts the accuracy to 40%. More specifically, when N reaches the maximum capacity of 20K, we achieve 2 true predictions for every 3 incorrect predictions (a ratio of 1:1.5). This means that for more than 8K of the predicted lines, we observe at least one ticket within four weeks after our prediction. This demonstrates the effectiveness of incorporating the covariance of features into the classifier by encoding it explicitly using derived quadratic features and product features.

From the operations perspective, the effectiveness of a proactive solution is affected by two factors: 1) the number of actual customer edge problems within the 20K predictions; 2) the time allowed for resolving predicted problems before customers complain.

From the evaluation result in Section 5.1, we achieve an accuracy of around 40% for the top 20K predictions, i.e., around 12K incorrect predictions. Note that in such an offline evaluation, we rely on actual customer tickets to evaluate the accuracy of the proposed ticket predictor, i.e., a prediction is considered correct if and only if a ticket associated with a customer edge problem is observed within one month after the prediction. We use one month here to give the customers enough time to report problems involving intermittent connections or slow browsing, as such problems are hard to perceive or more tolerable. Still, the problems reported through customer tickets comprise only a subset of all the customer edge problems. In the following, we explore two scenarios in which a real problem will not be reported by the customer.

Outage problems and IVR. The first scenario is related to outage problems. When an outage problem is detected in a certain area (e.g., from customer tickets or system alarms), an interactive voice response (IVR) system will be set up to automatically inform customers who call from the same area about the outage problem. In this case, the customer does report a problem (caused by the outage), but no ticket is issued since the phone call is handled by the IVR instead of customer agents. We explore this scenario as follows. For each incorrect prediction, we first identify the corresponding DSLAM that it is associated with. We assume the incorrect prediction belongs to the IVR scenario if at least one outage problem is observed in the tickets from the customers belonging to the same DSLAM within T weeks from the prediction time.

                              1 week   2 weeks  3 weeks  4 weeks
% of incorrect predictions    12.7     18.4     26.4     31.5
Coef. for outage prediction   0.0733   0.0752   0.0787   0.0784
P-value                       0.0021   0.0013   0.0027   0.0011