Abstract
We present a manhole profiling tool, developed as part
of the Columbia/Con Edison machine learning project on
manhole event prediction, and discuss its role in evaluating our machine learning model in three important ways:
elimination of outliers, elimination of falsely predictive features, and assessment of the quality of the model. The model
produces a ranked list of tens of thousands of manholes in
Manhattan, where the ranking criterion is vulnerability to
serious events such as fires, explosions and smoking manholes. Con Edison set two goals for the model, namely
accuracy and intuitiveness, and this tool made it possible
for us to address both of these goals. The tool automatically assembles a report card or profile highlighting
data associated with a given manhole. Prior to the processing work that underlies the profiling tool, case studies of a
single manhole took several days and resulted in an incomplete study; locating manholes such as those we present in
this work would have been extremely difficult. The model is
currently assisting Con Edison in determining repair priorities for the secondary electrical grid.
1. Introduction
In March of 2007, we began a collaboration with NYC's
power utility, Con Edison, with the goal of prioritizing manholes (or, more generally, manholes and service boxes, or
"structures") in the secondary network for future upgrade
and repair. Con Edison started a comprehensive inspection program in 2004; before this point, repairs were made
mainly in response to an event (for instance, a power outage). The inspection program has generated a large number of possible future repairs that Con Edison would like
to make pre-emptively; however, this list is long (and growing), with the consequence that repairs need to be prioritized
ticket: ME031019735
house no:
cp: W
street: 24
arty: ST
trouble type substring: ACB
recvd datetime: 2003-12-27
cross st: 7 AV
remarks:
FRANK CURTIS REPORTS IN M544466 S/W/C W.24ST &
7TH AVE CREW
FOUND AC B/OS.............................FH
12/27/03 00:33 MDEHINOJOS DISPATCHED BY 11511
12/27/03 01:15 MDEHINOJOS ARRIVED BY 55988
12/27/03 02:50 HINOJOSA/#9 REPORT ALL BURNOUTS
CLEARED .
12/27/03 02:53 MDEHINOJOS COMPLETE BY 55988
ria in the late 1980s (see footnote 1). Each ECS ticket includes a collection of notes about an event (called the "remarks"), consisting partly of free text recorded by Con Edison dispatchers and
partly of automated log entries. The free text may include
the name of the customer reporting the call, the type of
equipment missing an electrical phase, whether Con Edison
crews were able to park their vehicle in the area, whether
someone reported smoke, etc. The ticket system is mainly
used for logistics and record keeping, so a ticket
may not contain a description of the event it represents.
Tickets can exceed several hundred lines and may contain
misspellings, irregular shorthand notations, and formatting
problems. For instance, there are over 40 different ways that
"service box" is written, e.g., "S/B", "BOX", "SERV BX".
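One way to cope with such spelling variation is to map known variants onto a single canonical token before any further processing. The following is a minimal sketch of that idea, not the method used in the project; the pattern covers only the three spellings named above plus the full form, and the canonical token "SB" and the function name are assumptions made for illustration.

```python
import re

# Illustrative subset only: the text names S/B, BOX and SERV BX
# out of more than 40 spellings found in the tickets.
SERVICE_BOX_PATTERN = re.compile(
    r"\b(?:S/B|SERV(?:ICE)?\s*BO?X|BOX)\b",
    re.IGNORECASE,
)


def normalize_service_box(text: str) -> str:
    """Replace known service-box spellings with the canonical token 'SB'."""
    return SERVICE_BOX_PATTERN.sub("SB", text)
```

A real normalizer would need a much longer variant list, and a bare "BOX" may be ambiguous in some remarks, so the matches would have to be checked against the surrounding context.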
Part of an example ticket is provided in Figure 1 (see footnote 2). The ticket
contains a collection of key-value pairs holding important information such as the location of the event as a (noisy
and imprecise) street address, the date of the event, and the trouble type
as a 2-3 letter code (such as ACB for AC burnout, SMH
for smoking manhole, or MHF for a manhole fire).
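The key-value part of a ticket can be turned into a typed record with a few lines of code. The sketch below is illustrative only: the class name, the function names, and the choice of fields are assumptions, the key strings follow the labels visible in the example ticket, and the code table lists only the three trouble types named above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Mapping

# Codes and meanings as given in the text; real tickets use more codes.
TROUBLE_TYPES = {
    "ACB": "AC burnout",
    "SMH": "smoking manhole",
    "MHF": "manhole fire",
}


@dataclass
class TicketHeader:
    ticket_id: str
    trouble_type: str
    received: date
    street: str
    cross_street: str


def parse_header(fields: Mapping[str, str]) -> TicketHeader:
    """Build a typed header from raw key-value pairs (key names are assumed)."""
    return TicketHeader(
        ticket_id=fields["ticket"].strip(),
        trouble_type=fields["trouble type"].strip().upper(),
        received=date.fromisoformat(fields["recvd datetime"].strip()),
        street=fields.get("street", "").strip(),
        cross_street=fields.get("cross st", "").strip(),
    )


def describe(header: TicketHeader) -> str:
    """Return the plain-language meaning of the trouble type, if known."""
    return TROUBLE_TYPES.get(header.trouble_type, "unknown trouble type")


# Values taken from the example ticket in Figure 1.
example = parse_header({
    "ticket": "ME031019735",
    "trouble type": "ACB",
    "recvd datetime": "2003-12-27",
    "street": "24",
    "cross st": "7 AV",
})
```

Because the address fields are noisy and imprecise, a record like this is only a starting point; matching a ticket to a structure still requires further cleaning.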
We were also given cable data and inspections data.
The inspections data required only some interpretation and
cleaning. The cable data, on the other hand, were very
noisy. We went through several iterations of requesting cable data from different departments within Con Edison; important fields were often missing, and cables often could not
be matched to the corresponding structures.
1 Several of the Con Edison engineers still refer to ECS tickets as "B-Tickets", in reference to the time when tickets were written directly onto
carbon copy and the "B" copy was the one that was filed.
2 Ticket identifiers and structure identifiers have been made anonymous,
and street addresses have been altered.
4. Structure Evaluation
The Structure Profiling Tool produces a report for a single structure based on the stages of the modeling process,
containing: a summary of all ECS tickets within 60 meters
of the structure and all additional tickets in which the structure
was mentioned, a roster of all cables entering the structure, a list of the structure's past inspection results, and the
cover type. This profile allows a domain expert to approximately judge the relative vulnerability of a structure at a
4.1 Elimination of Outliers
Type  Date        Line
FLT   2005-08-28  37
LV    2005-07-22  32
ACB   2005-07-20  105
SO    2004-04-29  31
ACB   2004-01-29  4
ACB   2003-12-27  5
SO    2003-08-05  29
ACB   2000-06-20  55
UDC   2000-01-26  30
TH: * on 5 of the 9 tickets; Sh: * on 7 of the 9 tickets (row positions not recoverable)
was easier to satisfy the criteria for a positive label; statistically, the model yielded very good results. However, the
most highly ranked structures were those near "hotspots,"
or areas with many nearby events. Some of the highly prioritized structures would not be considered vulnerable (and
would not be ranked highly in our current model). For instance, there is another structure at the same intersection as
MH 544466 that has only 20 cables and has never been involved in an event. The initial model would have ranked
both structures highly. The current model ranks the first
structure highly, and the second structure at 34K/51K, or
about two-thirds of the way down the list.
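The position of a structure in the ranked list can be expressed as a fraction of the list length. The snippet below is a small worked example of that arithmetic; the function name is an assumption, and because the figures 34K and 51K are rounded in the text, the result is approximate.

```python
def relative_rank(rank: int, total: int) -> float:
    """Fraction of the ranked list at or above the given 1-based rank."""
    if total <= 0 or not 1 <= rank <= total:
        raise ValueError("rank must lie between 1 and total")
    return rank / total


# Rounded figures from the text: rank 34K in a list of 51K structures.
position = relative_rank(34_000, 51_000)
print(f"{position:.2f}")  # prints 0.67
```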
4.2
4.3
An important goal for us was to be able to pinpoint vulnerable structures that had not already been repaired. Con
Type  Date        TH  Line
SMH   2002-06-30  *   13
ACB   1999-05-07  *   4
ACB   1999-04-15  *   7
SMH   1999-02-22  *   13
Sh: * on 2 of the 4 tickets (row positions not recoverable)
T1: 3
T2: 0
Cable information:
From      To        #Cable  Installed  Mtrl  M/S
SB137521  MH37518   1       1929       BB    Main
SB137521  MH37518   3       1929       RL    Main
SB137521  MH37518   1       1934       BB    Main
SB137521  MH37518   6       1934       RL    Main
SB37522   SB137521  6       1934       RL    Main
SB37522   SB137521  1       1929       BB    Main
SB37522   SB137521  3       1929       RL    Main
SB37522   SB137521  1       1934       BB    Main
SB137521  SB37523   3       1999       RN    Main
SB137521  SB37523   2       1949       BB    Main
SB137521  SB37523   6       1949       RN    Main
SB137521  SB37523   2       1999       BB    Main
SB137521            7       1929       RL    Service
SB137521            4       1919       RL    StLight
SB137521            2       1919       RL    StLight
ture, SB 3412, that was the trouble hole for a serious manhole fire in 2006. This manhole fire would have been difficult to predict, since the structure had no prior involvement
in events whatsoever, no pending repairs, only 23 cables,
none of which are particularly old, and no aluminum cables currently
present. In fact, there was only one (non-serious)
ticket within 60 meters of the structure within the 12-year
span of our data. The ticket for the manhole fire notes
that the cables replaced were not aluminum and that
the suspected cause of the incident was insulation breakdown; it is exactly the type of event that our model was
designed to predict, but there was no obvious way to predict
it. Each year there are several structures that are trouble
holes for serious events but show no clear predictive
factors. In 2007, for instance, the trouble holes for 20% of the serious events
(9/44) had fewer than 20 cables.
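The factors examined in this case study (nearby tickets, cable count, cable age, presence of aluminum) can be gathered into a small per-structure summary. The sketch below is a hypothetical illustration and not the profiling tool itself: the class and function names are invented, coordinates are assumed to be planar and measured in meters, and only the 60-meter radius comes from the text.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Tuple

RADIUS_M = 60.0  # radius named in the text


@dataclass
class Cable:
    count: int       # number of cables in this bundle
    installed: int   # installation year
    aluminum: bool   # True if the bundle is aluminum


@dataclass
class Ticket:
    trouble_type: str
    xy: Tuple[float, float]  # assumed planar coordinates in meters


def nearby_tickets(structure_xy: Tuple[float, float],
                   tickets: List[Ticket],
                   radius: float = RADIUS_M) -> List[Ticket]:
    """Return the tickets whose location lies within `radius` of the structure."""
    sx, sy = structure_xy
    return [t for t in tickets
            if hypot(t.xy[0] - sx, t.xy[1] - sy) <= radius]


def summarize(structure_xy: Tuple[float, float],
              tickets: List[Ticket],
              cables: List[Cable]) -> Dict[str, object]:
    """Collect a few simple factors for one structure."""
    return {
        "tickets_within_radius": len(nearby_tickets(structure_xy, tickets)),
        "total_cables": sum(c.count for c in cables),
        "oldest_install_year": min((c.installed for c in cables), default=None),
        "has_aluminum": any(c.aluminum for c in cables),
    }


# Made-up data for illustration; these are not values from the study.
profile = summarize(
    (0.0, 0.0),
    [Ticket("ACB", (30.0, 40.0)), Ticket("SMH", (60.0, 60.0))],
    [Cable(20, 1995, False), Cable(3, 2001, False)],
)
```

A structure such as the one described above would produce an unremarkable summary of this kind, which illustrates why some serious events have no clear predictive factors.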
Final Remarks
Acknowledgment
This work was supported by a grant from Con Edison. Thanks
to our collaborator Steve Ierome. Also thanks to Dave Waltz,
Roger Anderson, Serena Lee, Robert Schimmenti, Artie Kressner, Maggie Chow, Ashish Tomar, Nandini Bhardwaj, Zhi An Liu,
Charles Collins, and Diego Fulgueira.