
Process Mining Manifesto

A manifesto is a "public declaration of principles and intentions" by a group of people. This manifesto is written by members and supporters of the IEEE Task Force on Process Mining. The goal of this task force is to promote the research, development, education, implementation, evolution, and understanding of process mining.

Process mining is a relatively young research discipline that sits between computational intelligence and data mining on the one hand, and process modeling and analysis on the other hand. The idea of process mining is to discover, monitor and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs readily available in today's (information) systems. Process mining includes (automated) process discovery (i.e., extracting process models from an event log), conformance checking (i.e., monitoring deviations by comparing model and log), social network/organizational mining, automated construction of simulation models, model extension, model repair, case prediction, and history-based recommendations.

Contents: Process Mining – State of the Art, Guiding Principles, Challenges, Epilogue, Glossary

Process mining techniques are able to extract knowledge from event logs commonly available in today's information
systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application
domains. There are two main drivers for the growing interest in process mining. On the one hand, more and more events
are being recorded, thus providing detailed information about the history of processes. On the other hand, there is a need
to improve and support business processes in competitive and rapidly changing environments. This manifesto is created by
the IEEE Task Force on Process Mining and aims to promote the topic of process mining. Moreover, by defining a set of
guiding principles and listing important challenges, this manifesto hopes to serve as a guide for software developers,
scientists, consultants, business managers, and end-users. The goal is to increase the maturity of process mining as a new
tool to improve the (re)design, control, and support of operational business processes.

1. IEEE Task Force on Process Mining

Concrete objectives of the task force:
1) Make end-users, developers, consultants, business managers, and researchers aware of the state-of-the-art in process mining.
2) Promote the use of process mining techniques and tools and stimulate new applications.
3) Play a role in standardization efforts for logging event data.
4) Organize tutorials, special sessions, workshops, and panels.
5) Publish articles, books, videos, and special issues of journals.

Figure 1: Process mining techniques extract knowledge from event logs in order to discover, monitor and improve processes. Starting point is an event log. Each event refers to a process instance (case) and an activity. Events are ordered and additional properties (e.g., timestamp or resource data) may be present. The event log can be used to discover roles in the organization (e.g., groups of people with similar work patterns); these roles can be used to relate individuals and activities. Discovery techniques can be used to find a control-flow model (in this case in terms of a BPMN model) that describes the observed behavior best. Performance information (e.g., the average time between two subsequent activities) can be extracted from the event log and visualized on top of the model. Decision rules (e.g., a decision tree based on data known at the time a particular choice was made) can be learned from the event log and used to annotate decisions.

Over the last decade, event data have become readily available and process mining techniques have matured. Moreover, as just mentioned, management trends related to process improvement (e.g., Six Sigma, TQM, CPI, and CPM) and compliance (SOX, BAM, etc.) can benefit from process mining. Fortunately, process mining algorithms have been implemented in various academic and commercial systems. Today, there is an active group of researchers working on process mining and it has become one of the "hot topics" in Business Process Management (BPM) research. Moreover, there is a huge interest from industry in process mining. More and more software vendors are adding process mining functionality to their tools. Examples of software products with process mining capabilities are: ARIS Process Performance Manager (Software AG), Comprehend (Open Connect), Discovery Analyst (StereoLOGIC), Flow (Fourspark), Futura Reflect (Futura Process Intelligence), Interstage Automated Process Discovery (Fujitsu), OKT Process Mining suite (Exeura), Process Discovery Focus (Iontas/Verint), ProcessAnalyzer (QPR), ProM (TU/e), Rbminer/Dbminer (UPC), and Reflect|one (Pallas Athena).

Process mining provides an important bridge between data mining and business process modeling and analysis. Under the Business Intelligence (BI) umbrella many buzzwords have been introduced to refer to rather simple reporting and dashboard tools. Business Activity Monitoring (BAM) refers to technologies enabling the real-time monitoring of business processes. Complex Event Processing (CEP) refers to technologies to process large amounts of events, utilizing them to monitor, steer and optimize the business in real time. Corporate Performance Management (CPM) is another buzzword for measuring the performance of a process or organization. Also related are management approaches such as Continuous Process Improvement (CPI), Business Process Improvement (BPI), Total Quality Management (TQM), and Six Sigma. These approaches have in common that processes are "put under a microscope" to see whether further improvements are possible. Process mining is an enabling technology for CPM, BPI, TQM, Six Sigma, and the like.

Whereas BI tools and management approaches such as Six Sigma and TQM aim to improve operational performance, e.g., reducing flow time and defects, organizations are also putting more emphasis on corporate governance, risks, and compliance. Legislations such as the Sarbanes-Oxley Act (SOX) and the Basel II Accord illustrate the focus on compliance issues. Process mining techniques offer a means to more rigorously check compliance and ascertain the validity and reliability of information about an organization's core processes.

The growing interest in log-based process analysis motivated the establishment of a Task Force on Process Mining. The task force was established in 2009 in the context of the Data Mining Technical Committee (DMTC) of the Computational Intelligence Society (CIS) of the Institute of Electrical and Electronic Engineers (IEEE). The current task force has members representing software vendors (e.g., Pallas Athena, Software AG, Futura Process Intelligence, HP, IBM, Infosys, Fluxicon, Businesscape, Iontas/Verint, Fujitsu, Fujitsu Laboratories, Business Process Mining, Stereologic), consultancy firms/end users (e.g., ProcessGold, Business Process Trends, Gartner, Deloitte, Process Sphere, Siav SpA, BPM Chile, BWI Systeme GmbH, Excellentia BPM, Rabobank), and research institutes (e.g., TU/e, University of Padua, Universitat Politècnica de Catalunya, New Mexico State University, IST - Technical University of Lisbon, University of Calabria, Penn State University, University of Bari, Humboldt-Universität zu Berlin, Queensland University of Technology, Vienna University of Economics and Business, Stevens Institute of Technology, University of Haifa, University of Bologna, Ulsan National Institute of Science and Technology, Cranfield University, K.U. Leuven, Tsinghua University, University of Innsbruck, University of Tartu).

Process Mining Book: W.M.P. van der Aalst. Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer-Verlag, Berlin, 2011. See www.processmining.org/book/.

Since its establishment in 2009 there have been various activities related to the above objectives. For example, several workshops and special tracks were (co-)organized by the task force, e.g., the workshops on Business Process Intelligence (BPI'09, BPI'10, and BPI'11) and special tracks at main IEEE conferences (e.g. CIDM'11). Knowledge was disseminated via tutorials (e.g. WCCI'10 and PMPM'09), summer schools (ESSCaSS'09, ACPN'10, CICH'10, etc.), videos (cf. www.processmining.org), and several publications including the first book on process mining recently published by Springer. The task force also (co-)organized the first Business Process Intelligence Challenge (BPIC'11): a competition where participants had to extract meaningful knowledge from a large and complex event log. In 2010, the task force also standardized XES (www.xes-standard.org), a standard logging format that is extensible and supported by the OpenXES library (www.openxes.org) and by tools such as ProM, XESame, Nitro, etc. The reader is invited to visit www.win.tue.nl/ieeetfpm/ for more information about the activities of the task force.

2. Process Mining: State of the Art

The expanding capabilities of information systems and other systems that depend on computing are well characterized by Moore's law. Gordon Moore, the co-founder of Intel, predicted in 1965 that the number of components in integrated circuits would double every year. During the last fifty years the growth has indeed been exponential, albeit at a slightly slower pace. These advancements resulted in a spectacular growth of the "digital universe" (i.e., all data stored and/or exchanged electronically). Moreover, the digital and the real universe continue to become more and more aligned.

The growth of a digital universe that is well-aligned with processes in organizations makes it possible to record and analyze events. Events may range from the withdrawal of cash from an ATM, a doctor adjusting an X-ray machine, a citizen applying for a driver license, the submission of a tax declaration, and the receipt of an e-ticket number by a traveler. The challenge is to exploit event data in a meaningful way, for example, to provide insights, identify bottlenecks, anticipate problems, record policy violations, recommend countermeasures, and streamline processes. Process mining aims to do exactly that.

Figure 2: Positioning of the three main types of process mining: (a) discovery, (b) conformance checking, and (c) enhancement. The "world" (business processes, people, machines, organizations) is supported/controlled by software systems whose records of events (messages, transactions, etc.) form event logs; process models are used to specify, configure, implement, and analyze this world, while discovery, conformance checking, and enhancement relate event logs and (process) models.

Starting point for process mining is an event log. All process mining techniques assume that it is possible to sequentially record events such that each event refers to an activity (i.e., a well-defined step in some process) and is related to a particular case (i.e., a process instance). Event logs may store additional information about events. In fact, whenever possible, process mining techniques use extra information such as the resource (i.e., person or device) executing or initiating the activity, the timestamp of the event, or data elements recorded with the event (e.g., the size of an order).
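To make this description concrete, the following sketch shows one way such an event log could be represented in Python. The activity and resource names are taken from the example in Fig. 1, but the data structure and the traces helper are illustrative assumptions only and not part of any standard such as XES.

```python
from collections import defaultdict
from datetime import datetime

# A minimal, illustrative event log: each event refers to a case (process instance)
# and an activity, and may carry extra attributes such as a timestamp and a resource.
events = [
    {"case": "req-1", "activity": "register request", "timestamp": datetime(2011, 9, 28, 11, 2),  "resource": "Pete"},
    {"case": "req-1", "activity": "examine casually", "timestamp": datetime(2011, 9, 28, 13, 40), "resource": "Mike"},
    {"case": "req-1", "activity": "decide",           "timestamp": datetime(2011, 9, 29, 9, 15),  "resource": "Sara"},
    {"case": "req-2", "activity": "register request", "timestamp": datetime(2011, 9, 28, 14, 30), "resource": "Ellen"},
    {"case": "req-2", "activity": "check ticket",     "timestamp": datetime(2011, 9, 29, 10, 5),  "resource": "Sue"},
    {"case": "req-2", "activity": "decide",           "timestamp": datetime(2011, 9, 29, 16, 20), "resource": "Sara"},
]

def traces(event_log):
    """Group events by case and order them by timestamp: one trace per process instance."""
    by_case = defaultdict(list)
    for event in event_log:
        by_case[event["case"]].append(event)
    return {case: sorted(es, key=lambda e: e["timestamp"]) for case, es in by_case.items()}

for case, es in traces(events).items():
    print(case, [e["activity"] for e in es])
```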

As shown in Fig. 2, event logs can be used to conduct three types of process mining. The first type of process mining is discovery. A discovery technique takes an event log and produces a model without using any a-priori information. Process discovery is the most prominent process mining technique. For many organizations it is surprising to see that existing techniques are indeed able to discover real processes merely based on example executions in event logs. The second type of process mining is conformance. Here, an existing process model is compared with an event log of the same process. Conformance checking can be used to check if reality, as recorded in the log, conforms to the model and vice versa. Note that different types of models can be considered: conformance checking can be applied to procedural models, organizational models, declarative process models, business rules/policies, laws, etc. The third type of process mining is enhancement. Here, the idea is to extend or improve an existing process model using information about the actual process recorded in some event log. Whereas conformance checking measures the alignment between model and reality, this third type of process mining aims at changing or extending the a-priori model. For instance, by using timestamps in the event log one can extend the model to show bottlenecks, service levels, throughput times, and frequencies.

Figure 3: The three basic types of process mining explained in terms of input and output: (a) discovery, (b) conformance checking, and (c) enhancement.

Figure 3 describes the three types of process mining in terms of input and output. Techniques for discovery take an event log and produce a model. The discovered model is typically a process model (e.g., a Petri net, BPMN, EPC, or UML activity diagram), however, the model may also describe other perspectives (e.g., a social network). Conformance checking techniques need an event log and a model as input. The output consists of diagnostic information showing differences and commonalities between model and log. Techniques for model enhancement (repair or extension) also need an event log and a model as input. The output is an improved or extended model.

Characteristics:
1. Process mining is not limited to control-flow discovery. The discovery of process models from event logs fuels the imagination of both practitioners and academics. Therefore, control-flow discovery is often seen as the most exciting part of process mining. However, process mining is not limited to control-flow discovery. On the one hand, discovery is just one of the three basic forms of process mining (discovery, conformance, and enhancement). On the other hand, the scope is not limited to control-flow; the organizational, case and time perspectives also play an important role.
2. Process mining is not just a specific type of data mining. Process mining can be seen as the "missing link" between data mining and traditional model-driven BPM. Most data mining techniques are not process-centric at all. Process models potentially exhibiting concurrency are incomparable to simple data mining structures such as decision trees and association rules. Therefore, completely new types of representations and algorithms are needed.
3. Process mining is not limited to offline analysis. Process mining techniques extract knowledge from historical event data. Although "post mortem" data is used, the results can be applied to running cases. For example, the completion time of a partially handled customer order can be predicted using a discovered process model.
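As a toy illustration of the discovery step described above (an event log as input, a model as output), the snippet below derives the directly-follows relation from the traces of the earlier example. Real discovery algorithms generalize such abstractions into BPMN models or Petri nets; here the counted relation itself stands in for the "model". The events list and traces helper are the hypothetical ones introduced before.

```python
from collections import Counter

def directly_follows(trace_dict):
    """Count how often one activity is directly followed by another across all cases."""
    df = Counter()
    for es in trace_dict.values():
        activities = [e["activity"] for e in es]
        for a, b in zip(activities, activities[1:]):
            df[(a, b)] += 1
    return df

# A discovery algorithm would turn these counts into a process model;
# here we simply print the observed directly-follows pairs.
for (a, b), n in sorted(directly_follows(traces(events)).items()):
    print(f"{a} -> {b}: {n}")
```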

Process mining may cover different perspectives. The control-flow perspective focuses on the control-flow, i.e., the ordering of activities. The goal of mining this perspective is to find a good characterization of all possible paths. The result is typically expressed in terms of a Petri net or some other process notation (e.g., EPCs, BPMN, or UML activity diagrams). The organizational perspective focuses on information about resources hidden in the log, i.e., which actors (e.g., people, systems, roles, or departments) are involved and how are they related. The goal is to either structure the organization by classifying people in terms of roles and organizational units or to show the social network. The case perspective focuses on properties of cases. Obviously, a case can be characterized by its path in the process or by the actors working on it. However, cases can also be characterized by the values of the corresponding data elements. For example, if a case represents a replenishment order, it may be interesting to know the supplier or the number of products ordered. The time perspective is concerned with the timing and frequency of events. When events bear timestamps it is possible to discover bottlenecks, measure service levels, monitor the utilization of resources, and predict the remaining processing time of running cases.

There are some common misconceptions related to process mining. Some vendors, analysts, and researchers limit the scope of process mining to a special data mining technique for process discovery that can only be used for offline analysis. This is not the case; therefore, we emphasize the three characteristics in the accompanying box.

To position process mining, we use the Business Process Management (BPM) life-cycle shown in Fig. 4. The BPM life-cycle shows seven phases of a business process and its corresponding information system(s). In the (re)design phase a new process model is created or an existing process model is adapted. In the analysis phase a candidate model and its alternatives are analyzed. After the (re)design phase, the model is implemented (implementation phase) or an existing system is (re)configured (reconfiguration phase). In the execution phase the designed model is enacted. During the execution phase the process is monitored. Moreover, smaller adjustments may be made without redesigning the process (adjustment phase). In the diagnosis phase the enacted process is analyzed and the output of this phase may trigger a new process redesign phase. Process mining is a valuable tool for most of the phases shown in Fig. 4. Obviously, the diagnosis phase can benefit from process mining. However, process mining is not limited to the diagnosis phase. For example, in the execution phase, process mining techniques can be used for operational support. Predictions and recommendations based on models learned using historic information can be used to influence running cases. Similar forms of decision support can be used to adjust processes and to guide process (re)configuration.

Figure 4: The BPM life-cycle identifying the various phases of a business process and its corresponding information system(s); process mining (potentially) plays a role in all phases (except for the implementation phase).

Whereas Fig. 4 shows the overall BPM life-cycle, Fig. 5 focuses on the concrete process mining activities and artifacts. Figure 5 describes the possible stages in a process mining project. Any process mining project starts with a planning and a justification for this planning (Stage 0). After initiating the project, event data, models, objectives, and questions need to be extracted from systems, domain experts, and management (Stage 1). This requires an understanding of the available data ("What data can be used for analysis?") and an understanding of the domain ("What are the important questions?") and results in the artifacts shown in Fig. 5 (i.e., historical data, handmade models, objectives, and questions). In Stage 2 the control-flow model is constructed and linked to the event log. Here automated process discovery techniques can be used. The discovered process model may already provide answers to some of the questions and trigger redesign or adjustment actions. Moreover, the event log may be filtered or adapted using the model (e.g., removing rare activities or outlier cases, and inserting missing events). Sometimes significant efforts are needed to correlate events belonging to the same process instance. The remaining events are related to entities of the process model. When the process is relatively structured, the control-flow model may be extended with other perspectives (e.g., data, time, and resources) during Stage 3. The relation between the event log and the model established in Stage 2 is used to extend the model (e.g., timestamps of associated events are used to estimate waiting times for activities). This may be used to answer additional questions and may trigger additional actions. Ultimately, the models constructed in Stage 3 may be used for operational support (Stage 4). Knowledge extracted from historical event data is combined with information about running cases. This may be used to intervene, predict, and recommend. Stages 3 and 4 can only be reached if the process is sufficiently stable and structured.

Figure 5: The L* life-cycle model describing a process mining project consisting of five stages: plan and justify (Stage 0), extract (Stage 1), create a control-flow model and connect it to the event log (Stage 2), create an integrated process model (Stage 3), and provide operational support (Stage 4).
Currently, there are techniques and tools that can support all stages shown in Fig. 5. However, process mining is a relatively new paradigm and most of the currently available tools are still rather immature. Moreover, prospective users are often not aware of the potential and the limitations of process mining. Therefore, this manifesto catalogs some guiding principles (cf. the next section) and challenges (cf. Sect. 4) for users of process mining techniques as well as researchers and developers that are interested in advancing the state-of-the-art.

3. Guiding Principles

As with any new technology, there are obvious mistakes that can be made when applying process mining in real-life settings. Therefore, we list six guiding principles to prevent users/analysts from making such mistakes.

Guiding Principles:
GP1: Event Data Should Be Treated as First-Class Citizens
GP2: Log Extraction Should Be Driven by Questions
GP3: Concurrency, Choice and Other Basic Control-Flow Constructs Should Be Supported
GP4: Events Should Be Related to Model Elements
GP5: Models Should Be Treated as Purposeful Abstractions of Reality
GP6: Process Mining Should Be a Continuous Process

GP1: Event Data Should Be Treated as First-Class Citizens

Starting point for any process mining activity are the events recorded. We refer to collections of events as event logs; however, this does not imply that events need to be stored in dedicated log files. Events may be stored in database tables, message logs, mail archives, transaction logs, and other data sources. More important than the storage format is the quality of such event logs. The quality of a process mining result heavily depends on the input. Therefore, event logs should be treated as first-class citizens in the information systems supporting the processes to be analyzed. Unfortunately, event logs are often merely a "by-product" used for debugging or profiling. For example, the medical devices of Philips Healthcare record events simply because software developers have inserted "print statements" in the code. Although there are some informal guidelines for adding such statements to the code, a more systematic approach is needed to improve the quality of event logs. Event data should be viewed as first-class citizens (rather than second-class citizens).
Level ★★★★★ (highest): The event log is of excellent quality (i.e., trustworthy and complete) and events are well-defined. Events are recorded in an automatic, systematic, reliable, and safe manner. Privacy and security considerations are addressed adequately. Moreover, the events recorded (and all of their attributes) have clear semantics. This implies the existence of one or more ontologies; events and their attributes point to this ontology. Examples: semantically annotated logs of BPM systems.

Level ★★★★: Events are recorded automatically and in a systematic and reliable manner, i.e., logs are trustworthy and complete. Unlike the systems operating at level ★★★, notions such as process instance (case) and activity are supported in an explicit manner. Examples: event logs of traditional BPM/workflow systems.

Level ★★★: Events are recorded automatically, but no systematic approach is followed to record events. However, unlike logs at level ★★, there is some level of guarantee that the events recorded match reality (i.e., the event log is trustworthy but not necessarily complete). Consider, for example, the events recorded by an ERP system. Although events need to be extracted from a variety of tables, the information can be assumed to be correct (e.g., it is safe to assume that a payment recorded by the ERP actually exists and vice versa). Examples: tables in ERP systems, event logs of CRM systems, transaction logs of messaging systems, event logs of high-tech systems, etc.

Level ★★: Events are recorded automatically, i.e., as a by-product of some information system. Coverage varies, i.e., no systematic approach is followed to decide which events are recorded. Moreover, it is possible to bypass the information system. Hence, events may be missing or not recorded properly. Examples: event logs of document and product management systems, error logs of embedded systems, worksheets of service engineers, etc.

Level ★ (lowest): Event logs are of poor quality. Recorded events may not correspond to reality and events may be missing. Event logs for which events are recorded by hand typically have such characteristics. Examples: trails left in paper documents routed through the organization ("yellow notes"), paper-based medical records, etc.

Table 1: Maturity levels for event logs.

There are several criteria to judge the quality of event data. Events should be trustworthy, i.e., it should be safe to assume that the recorded events actually happened and that the attributes of events are correct. Event logs should be complete, i.e., given a particular scope, no events may be missing. Any recorded event should have well-defined semantics. Moreover, the event data should be safe in the sense that privacy and security concerns are addressed when recording the events. For example, actors should be aware of the kind of events being recorded and the way they are used.

Table 1 defines five event log maturity levels ranging from excellent quality (★★★★★) to poor quality (★). For example, the event logs of Philips Healthcare reside at level ★★★, i.e., events are recorded automatically and the recorded behavior matches reality, but no systematic approach is used to assign semantics to events and to ensure coverage at a particular level. Process mining techniques can be applied to logs at levels ★★★★★, ★★★★, and ★★★. In principle, it is also possible to apply process mining using event logs at level ★★ or ★. However, the analysis of such logs is typically problematic and the results are not trustworthy. In fact, it does not make much sense to apply process mining to logs at level ★. In order to benefit from process mining, organizations should aim at event logs at the highest possible quality level.

GP2: Log Extraction Should Be Driven by Questions

As shown in Fig. 5, process mining activities need to be driven by questions. Without concrete questions it is very difficult to extract meaningful event data. Consider, for example, the thousands of tables in the database of an ERP system like SAP. Without concrete questions it is impossible to select the tables relevant for data extraction.

A process model such as the one shown in Fig. 1 describes the life-cycle of cases (i.e., process instances) of a particular type. Hence, before applying any process mining technique one needs to choose the type of cases to be analyzed. This choice should be driven by the questions that need to be answered and this may be non-trivial.
Consider, for example, the handling of customer orders. Each customer order may consist of multiple order lines as the customer may order multiple products in one order. One customer order may result in multiple deliveries. One delivery may refer to order lines of multiple orders. Hence, there is a many-to-many relationship between orders and deliveries and a one-to-many relationship between orders and order lines. Given a database with event data related to orders, order lines, and deliveries, there are different process models that can be discovered. One can extract data with the goal to describe the life-cycle of individual orders. However, it is also possible to extract data with the goal to discover the life-cycle of individual order lines or the life-cycle of individual deliveries.
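The following sketch illustrates why this choice matters: the same (hypothetical) event table produces different event logs, and hence different discovered life-cycles, depending on whether the order or the order line is taken as the case notion. All column and activity names are invented for the example.

```python
from collections import defaultdict

# Hypothetical raw events carrying both an order id and an order-line id.
raw_events = [
    {"order": "o1", "order_line": "o1-1", "activity": "create order line"},
    {"order": "o1", "order_line": "o1-2", "activity": "create order line"},
    {"order": "o1", "order_line": "o1-1", "activity": "pick item"},
    {"order": "o1", "order_line": "o1-2", "activity": "pick item"},
    {"order": "o1", "order_line": None,   "activity": "send invoice"},
]

def extract_log(events, case_notion):
    """Flatten the event table into traces, using the given attribute as the case id."""
    log = defaultdict(list)
    for event in events:
        case = event[case_notion]
        if case is not None:          # events without this notion are simply dropped here
            log[case].append(event["activity"])
    return dict(log)

print(extract_log(raw_events, "order"))       # one trace describing the whole order
print(extract_log(raw_events, "order_line"))  # one (shorter) trace per order line
```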
GP3: Concurrency, Choice and Other Basic Control-Flow Constructs Should Be Supported

A plethora of process modeling languages exists (e.g., BPMN, EPCs, Petri nets, BPEL, and UML activity diagrams). Some of these languages provide many modeling elements (e.g., BPMN offers more than 50 distinct graphical elements) whereas others are very basic (e.g., Petri nets are composed of only three different elements: places, transitions, and arcs). The control-flow description is the backbone of any process model. Basic workflow constructs (also known as patterns) supported by all mainstream languages are sequence, parallel routing (AND-splits/joins), choice (XOR-splits/joins), and loops. Obviously, these patterns should be supported by process mining techniques. However, some techniques are not able to deal with concurrency and support only Markov chains/transition systems.

Figure 6 shows the effect of using process mining techniques unable to discover concurrency (no AND-splits/joins). Consider an event log L = {〈A, B, C, D, E〉, 〈A, B, D, C, E〉, 〈A, C, B, D, E〉, 〈A, C, D, B, E〉, 〈A, D, B, C, E〉, 〈A, D, C, B, E〉}. L contains cases that start with A and end with E. Activities B, C, and D occur in any order in-between A and E. The BPMN model in Fig. 6(a) shows a compact representation of the underlying process using two AND gateways. Suppose that the process mining technique does not support AND gateways. In this case, the other two BPMN models in Fig. 6 are obvious candidates. The BPMN model in Fig. 6(b) is compact but allows for too much behavior (e.g., cases such as 〈A, B, B, B, E〉 are possible according to the model but are not likely according to the event log). The BPMN model in Fig. 6(c) allows for the cases in L, but encodes all sequences explicitly, so it is not a compact representation of the log. The example shows that for real-life models having dozens of potentially concurrent activities the resulting models are severely underfitting (i.e., allow for too much behavior) and/or extremely complex if concurrency is not supported. As is illustrated by Fig. 6, it is important to support at least the basic workflow patterns. Besides the basic patterns mentioned it is also desirable to support OR-splits/joins, because these provide a compact representation of inclusive decisions and partial synchronizations.

Figure 6: Example illustrating problems when concurrency (i.e., AND-splits/joins) cannot be expressed directly: (a) B, C, and D can be executed in any order; (b) B, C, and D can be executed in any order but also multiple times; (c) B, C, and D can be executed in any order, but activities need to be duplicated to model all observed sequences. In the example just three activities (B, C, and D) are concurrent. Imagine the resulting process models when there are 10 concurrent activities (2^10 = 1,024 states and 10! = 3,628,800 possible execution sequences).
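To make the role of concurrency concrete, the sketch below takes the event log L from the discussion of Fig. 6 and derives which activity pairs are ordered and which are concurrent, in the spirit of footprint-based discovery algorithms such as the alpha algorithm. A miner restricted to transition systems could not mark B, C, and D as parallel. The code is a simplified illustration, not a complete discovery technique.

```python
from itertools import combinations

# The event log L used in the discussion of Fig. 6.
L = [list("ABCDE"), list("ABDCE"), list("ACBDE"),
     list("ACDBE"), list("ADBCE"), list("ADCBE")]

# Directly-follows relation: (a, b) is present if a is immediately followed by b in some trace.
follows = {(a, b) for trace in L for a, b in zip(trace, trace[1:])}

activities = sorted({a for trace in L for a in trace})
for a, b in combinations(activities, 2):
    if (a, b) in follows and (b, a) in follows:
        relation = "parallel"                  # observed in both orders: likely concurrent
    elif (a, b) in follows:
        relation = f"{a} before {b}"           # only one order observed: causal/sequential
    elif (b, a) in follows:
        relation = f"{b} before {a}"
    else:
        relation = "never directly adjacent"
    print(a, b, "->", relation)
```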

GP4: Events Should Be Related to Model Elements

As indicated in Sect. 2, it is a misconception that process mining is limited to control-flow discovery. As shown in Fig. 1, the discovered process model may cover various perspectives (organizational perspective, time perspective, data perspective, etc.). Moreover, discovery is just one of the three types of process mining shown in Fig. 3. The other two types of process mining (conformance checking and enhancement) heavily rely on the relationship between elements in the model and events in the log. This relationship may be used to "replay" the event log on the model. Replay may be used to reveal discrepancies between an event log and a model, e.g., some events in the log are not possible according to the model. Techniques for conformance checking quantify and diagnose such discrepancies. Timestamps in the event log can be used to analyze the temporal behavior during replay. Time differences between causally related activities can be used to add expected waiting times to the model. These examples show that the relation between events in the log and elements in the model serves as a starting point for different types of analysis.

In some cases it may be non-trivial to establish such a relationship. For example, an event may refer to two different activities or it is unclear to which activity it refers. Such ambiguities need to be removed in order to interpret process mining results properly. Besides the problem of relating events to activities, there is the problem of relating events to process instances. This is commonly referred to as event correlation.
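As a small illustration of time-based enhancement, the sketch below computes the average time between an activity and the activity that directly follows it within a case; in a real setting the pairs would come from replaying the log on the model, so that only causally related activities are compared. The events list and traces helper are the illustrative ones introduced earlier.

```python
from collections import defaultdict

def average_delays(trace_dict):
    """Average time (in hours) between directly following events within the same case."""
    gaps = defaultdict(list)
    for es in trace_dict.values():
        for first, second in zip(es, es[1:]):
            hours = (second["timestamp"] - first["timestamp"]).total_seconds() / 3600.0
            gaps[(first["activity"], second["activity"])].append(hours)
    return {pair: sum(values) / len(values) for pair, values in gaps.items()}

for (a, b), hours in average_delays(traces(events)).items():
    print(f"{a} -> {b}: {hours:.1f} h on average")
```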
GP5: Models Should Be Treated as Purposeful Abstractions of Reality

A model derived from event data provides a view on reality. Such a view should provide a purposeful abstraction of the behavior captured in the event log. Given an event log, there may be multiple views that are useful. Moreover, the various stakeholders may require different views. In fact, models discovered from event logs should be seen as "maps" (like geographic maps). This guiding principle provides important insights, two of which are described in the remainder.

First of all, it is important to note that there is no such thing as "the map" for a particular geographic area. Depending on the intended use there are different maps: road maps, hiking maps, cycling maps, etc. All of these maps show a view on the same reality and it would be absurd to assume that there would be such a thing as "the perfect map". The same holds for process models: the model should emphasize the things relevant for a particular type of user. Discovered models may focus on different perspectives (control-flow, data flow, time, resources, costs, etc.) and show these at different levels of granularity and precision, e.g., a manager may want to see a coarse informal process model focusing on costs whereas a process analyst may want to see a detailed process model focusing on deviations from the normal flow. Also note that different stakeholders may want to view a process at different levels: strategic level (decisions at this level have long-term effects and are based on aggregate event data over a longer period), tactical level (decisions at this level have medium-term effects and are mostly based on recent data), and operational level (decisions at this level have immediate effects and are based on event data related to running cases).

Second, it is useful to adopt ideas from cartography when it comes to producing understandable maps. For example, road maps abstract from less significant roads and cities. Less significant things are either left out or dynamically clustered into aggregate shapes (e.g., streets and suburbs amalgamate into cities). Cartographers not only eliminate irrelevant details, but also use colors to highlight important features. Moreover, graphical elements have a particular size to indicate their significance (e.g., the sizes of lines and dots may vary). Geographical maps also have a clear interpretation of the x-axis and y-axis, i.e., the layout of a map is not arbitrary as the coordinates of elements have a meaning. All of this is in stark contrast with mainstream process models which are typically not using color, size, and location features to make models more understandable. However, ideas from cartography can easily be incorporated in the construction of discovered process maps. For example, the size of an activity can be used to reflect its frequency or some other property indicating its significance (e.g., costs or resource use). The width of an arc can reflect the importance of the corresponding causal dependency, and the coloring of arcs can be used to highlight bottlenecks. The above observations show that it is important to select the right representation and fine-tune it for the intended audience. This is important for visualizing results to end users and for guiding discovery algorithms towards suitable models (see also Challenge C5).

GP6: Process Mining Should Be a Continuous Process

Process mining can help to provide meaningful "maps" that are directly connected to event data. Both historical event data and current data can be projected onto such models. Moreover, processes change while they are being analyzed. Given the dynamic nature of processes, it is not advisable to see process mining as a one-time activity. The goal should not be to create a fixed model, but to breathe life into process models so that users and analysts are encouraged to look at them on a daily basis.

Compare this to the use of mashups using geo-tagging. There are thousands of mashups using Google Maps (e.g., applications projecting information about traffic conditions, real estate, fastfood restaurants, or movie showtimes onto a selected map). People can seamlessly zoom in and out using such maps and interact with them (e.g., traffic jams are projected onto the map and the user can select a particular problem to see details). It should also be possible to conduct process mining based on real-time event data. Using the "map metaphor", we can think of events having GPS coordinates that can be projected on maps in real time. Analogous to car navigation systems, process mining tools can help end users (a) by navigating through processes, (b) by projecting dynamic information onto process maps (e.g., showing "traffic jams" in business processes), and (c) by providing predictions regarding running cases (e.g., estimating the "arrival time" of a case that is delayed). These examples demonstrate that it is a pity to not use process models more actively. Therefore, process mining should be viewed as a continuous process providing actionable information according to various time scales (minutes, hours, days, weeks, and months).
4. Challenges

Process mining is an important tool for modern organizations that need to manage non-trivial operational processes. On the one hand, there is an incredible growth of event data. On the other hand, processes and information need to be aligned perfectly in order to meet requirements related to compliance, efficiency, and customer service. Despite the applicability of process mining there are still important challenges that need to be addressed; these illustrate that process mining is an emerging discipline. In the remainder, we list some of these challenges. This list is not intended to be complete and, over time, new challenges may emerge or existing challenges may disappear due to advances in process mining.

Challenges:
C1: Finding, Merging, and Cleaning Event Data
C2: Dealing with Complex Event Logs Having Diverse Characteristics
C3: Creating Representative Benchmarks
C4: Dealing with Concept Drift
C5: Improving the Representational Bias Used for Process Discovery
C6: Balancing Between Quality Criteria such as Fitness, Simplicity, Precision, and Generalization
C7: Cross-Organizational Mining
C8: Providing Operational Support
C9: Combining Process Mining With Other Types of Analysis
C10: Improving Usability for Non-Experts
C11: Improving Understandability for Non-Experts

C1: Finding, Merging, and Cleaning Event Data

It still takes considerable efforts to extract event data suitable for process mining. Typically, several hurdles need to be overcome:
• Data may be distributed over a variety of sources. This information needs to be merged. This tends to be problematic when different identifiers are used in the different data sources. For example, one system uses name and birthdate to identify a person whereas another system uses the person's social security number.
• Event data are often "object centric" rather than "process centric". For example, individual products, pallets, and containers may have RFID tags and recorded events refer to these tags. However, to monitor a particular customer order such object-centric events need to be merged and preprocessed.
• Event data may be incomplete. A common problem is that events do not explicitly point to process instances. Often it is possible to derive this information, but this may take considerable efforts. Also time information may be missing for some events. One may need to interpolate timestamps in order to still use the timing information available.
• An event log may contain outliers, i.e., exceptional behavior also referred to as noise. How to define outliers? How to detect such outliers? These questions need to be answered to clean event data.
• Logs may contain events at different levels of granularity. In the event log of a hospital information system events may refer to simple blood tests or to complex surgical procedures. Also timestamps may have different levels of granularity ranging from milliseconds precision (28-9-2011:h11m28s32ms342) to coarse date information (28-9-2011).
• Events occur in a particular context (weather, workload, day of the week, etc.). This context may explain certain phenomena, e.g., the response time is longer than usual because of work-in-progress or holidays. For analysis, it is desirable to incorporate this context. This implies the merging of event data with contextual data. Here the "curse of dimensionality" kicks in as analysis becomes intractable when adding too many variables.

Better tools and methodologies are needed to address the above problems. Moreover, as indicated earlier, organizations need to treat event logs as first-class citizens rather than some by-product. The goal is to obtain ★★★★★ event logs (see Table 1). Here, the lessons learned in the context of datawarehousing are useful to ensure high-quality event logs. For example, simple checks during data entry can help to reduce the proportion of incorrect event data significantly.

C2: Dealing with Complex Event Logs Having Diverse Characteristics

Event logs may have very different characteristics. Some event logs may be extremely large making it difficult to handle them whereas other event logs are so small that not enough data is available to make reliable conclusions. In some domains, mind-boggling quantities of events are recorded. Therefore, additional efforts are needed to improve performance and scalability. For example, ASML is continuously monitoring all of its wafer scanners. These wafer scanners are used by various organizations (e.g., Samsung and Texas Instruments) to produce chips (approx. 70% of chips are produced using ASML's wafer scanners). Existing tools have difficulties dealing with the petabytes of data collected in such domains. Besides the number of events recorded there are other characteristics such as the average number of events per case, similarity among cases, the number of unique events, and the number of unique paths. Consider an event log L1 with the following characteristics: 1000 cases, on average 10 events per case, and little variation (e.g., several cases follow the same or very similar paths). Event log L2 contains just 100 cases, but on average there are 100 events per case and all cases follow a unique path. Clearly, L2 is much more difficult to analyze than L1 even though the two logs have similar sizes (approximately 10,000 events).

As event logs contain only sample behavior, they should not be assumed to be complete. Process mining techniques need to deal with incompleteness by using an "open world assumption": the fact that something did not happen does not mean that it cannot happen. This makes it challenging to deal with small event logs with a lot of variability.
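The characteristics mentioned above are easy to compute once a log is available as traces. The sketch below does so for a log given as a mapping from case id to a list of activity names; the toy log and the trace format are illustrative assumptions.

```python
def log_characteristics(trace_dict):
    """Simple statistics that indicate how hard an event log may be to analyze."""
    n_cases = len(trace_dict)
    n_events = sum(len(trace) for trace in trace_dict.values())
    activities = {a for trace in trace_dict.values() for a in trace}
    unique_paths = {tuple(trace) for trace in trace_dict.values()}
    return {
        "cases": n_cases,
        "events": n_events,
        "avg events per case": n_events / n_cases if n_cases else 0,
        "distinct activities": len(activities),
        "unique paths": len(unique_paths),
    }

# Toy example: several cases follow the same path, so there are few unique paths.
toy_log = {"c1": ["A", "B", "E"], "c2": ["A", "B", "E"], "c3": ["A", "C", "E"]}
print(log_characteristics(toy_log))
```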
As mentioned before, some logs contain events at a very low abstraction level. These logs tend to be extremely large and the individual low-level events are of little interest to the stakeholders. Therefore, one would like to aggregate low-level events into high-level events. For example, when analyzing the diagnostic and treatment processes of a particular group of patients one may not be interested in the individual tests recorded in the information system of the hospital's laboratory.

At this point in time, organizations need to use a trial-and-error approach to see whether an event log is suitable for process mining. Therefore, tools should allow for a quick feasibility test given a particular data set. Such a test should indicate potential performance problems and warn for logs that are far from complete or too detailed.

C3: Creating Representative Benchmarks

Process mining is an emerging technology. This explains why good benchmarks are still missing. For example, dozens of process discovery techniques are available and different vendors offer different products, but there is no consensus on the quality of these techniques. Although there are huge differences in functionality and performance, it is difficult to compare the different techniques and tools. Therefore, good benchmarks consisting of example data sets and representative quality criteria need to be developed.

For classical data mining techniques, many good benchmarks are available. These benchmarks have stimulated tool providers and researchers to improve the performance of their techniques. In the case of process mining this is more challenging. For example, the relational model introduced by Codd in 1969 is simple and widely supported. As a result it takes little effort to convert data from one database to another and there are no interpretation problems. For processes such a simple model is missing. Standards proposed for process modeling are much more complicated and few vendors support exactly the same set of concepts. Processes are simply more complex than tabular data.

Nevertheless, it is important to create representative benchmarks for process mining. Some initial work is already available. For example, there are various metrics for measuring the quality of process mining results (fitness, simplicity, precision, and generalization). Moreover, several event logs are publicly available (cf. www.processmining.org). See for example the event log used for the first Business Process Intelligence Challenge (BPIC'11) organized by the task force (cf. doi:10.4121/uuid:d9769f3d-0ab0-4fb8-803b-0d1120ffcf54).

On the one hand, there should be benchmarks based on real-life data sets. On the other hand, there is the need to create synthetic datasets for capturing particular characteristics. Such synthetic datasets help to develop process mining techniques that are tailored towards incomplete event logs, noisy event logs, or specific populations of processes.

Besides the creation of representative benchmarks, there also needs to be more consensus on the criteria used to judge the quality of process mining results (also see Challenge C6). Moreover, cross-validation techniques from data mining can be adapted to judge the result. Consider for example k-fold checking. One can split the event log in k parts. k−1 parts can be used to learn a process model and conformance checking techniques can be used to judge the result with respect to the remaining part. This can be repeated k times, thus providing some insights into the quality of the model.
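The k-fold check described above can be sketched as follows. The cases are split into k parts; a "model" is learned from k−1 parts and the held-out part is checked against it. For illustration the model is simply the set of directly-follows pairs seen in the training traces, and a trace "fits" if all of its steps are allowed; real discovery and conformance-checking techniques are, of course, far more refined.

```python
def discover(traces):
    """Toy 'model': the set of directly-follows pairs observed in the training traces."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}

def fits(model, trace):
    """A trace fits the toy model if every step it takes is allowed by the model."""
    return all((a, b) in model for a, b in zip(trace, trace[1:]))

def k_fold_check(traces, k=3):
    folds = [traces[i::k] for i in range(k)]                  # split the log into k parts
    scores = []
    for i in range(k):
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        test = folds[i]
        model = discover(train)                               # learn from k-1 parts
        scores.append(sum(fits(model, t) for t in test) / len(test))
    return scores                                             # one fitness score per fold

log = [["A", "B", "C"], ["A", "C", "B"], ["A", "B", "C"],
       ["A", "B", "C"], ["A", "C", "B"], ["A", "B", "B", "C"]]
print(k_fold_check(log, k=3))
```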
C4: Dealing with Concept Drift

The term concept drift refers to the situation in which the process is changing while being analyzed. For instance, in the beginning of the event log two activities may be concurrent whereas later in the log these activities become sequential. Processes may change due to periodic/seasonal changes (e.g., "in December there is more demand" or "on Friday afternoon there are fewer employees available") or due to changing conditions (e.g., "the market is getting more competitive"). Such changes impact processes and it is vital to detect and analyze them. Concept drift in a process can be discovered by splitting the event log into smaller logs and analyzing the "footprints" of the smaller logs. Such "second order" analysis requires much more event data. Nevertheless, few processes are in steady state and understanding concept drift is of prime importance for the management of processes. Therefore, additional research and tool support are needed to adequately analyze concept drift.
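The "second order" analysis mentioned above can be sketched by splitting a log into consecutive sub-logs and comparing their footprints, here simply the directly-follows pairs produced by the toy discover helper from the previous sketch. The example log is fabricated so that B and C occur in both orders early on but only in one order later, hinting at drift.

```python
def footprint_drift(traces, chunks=2):
    """Split the log into consecutive sub-logs and return one footprint per sub-log."""
    size = max(1, len(traces) // chunks)
    return [discover(traces[i:i + size]) for i in range(0, len(traces), size)]

drifting_log = [["A", "B", "C", "D"], ["A", "C", "B", "D"],   # early: B and C in both orders
                ["A", "B", "C", "D"], ["A", "B", "C", "D"]]   # later: only B before C
early, late = footprint_drift(drifting_log, chunks=2)
print("pairs that disappeared:", early - late)
print("pairs that appeared:   ", late - early)
```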
C5: Improving the Representational Bias Used for Process Discovery

A process discovery technique produces a model using a particular language (e.g., BPMN or Petri nets). However, it is important to separate the visualization of the result from the representation used during the actual discovery process. The selection of a target language often encompasses several implicit assumptions. It limits the search space; processes that cannot be represented by the target language cannot be discovered. This so-called "representational bias" used during the discovery process should be a conscious choice and should not be (only) driven by the preferred graphical representation.

Consider for example Fig. 6: whether the target language allows for concurrency or not may have an effect on both the visualization of the discovered model and the class of models considered by the algorithm. If the representational bias does not allow for concurrency (Fig. 6(a) is not possible) and does not allow for multiple activities having the same label (Fig. 6(c) is not possible), then only problematic models such as the one shown in Fig. 6(b) are possible. This example shows that a more careful and refined selection of the representational bias is needed.

C6: Balancing Between Quality Criteria such as Fitness, Simplicity, Precision, and Generalization

Event logs are often far from being complete, i.e., only example behavior is given. Process models typically allow for an exponential or even infinite number of different traces (in case of loops). Moreover, some traces may have a much lower probability than others. Therefore, it is unrealistic to assume that every possible trace is present in the event log. To illustrate that it is impractical to take complete logs for granted, consider a process consisting of 10 activities that can be executed in parallel and a corresponding log that contains information about 10,000 cases. The total number of possible interleavings in the model with 10 concurrent activities is 10! = 3,628,800. Hence, it is impossible that each interleaving is present in the log as there are fewer cases (10,000) than potential traces (3,628,800). Even if there are millions of cases in the log, it is extremely unlikely that all possible variations are present. An additional complication is that some alternatives are less frequent than others. These may be considered as "noise". It is impossible to build a reasonable model for such noisy behaviors. The discovered model needs to abstract from this; it is better to investigate low frequency behavior using conformance checking.

Noise and incompleteness make process discovery a challenging problem. In fact, there are four competing quality dimensions: (a) fitness, (b) simplicity, (c) precision, and (d) generalization. A model with good fitness allows for most of the behavior seen in the event log. A model has a perfect fitness if all traces in the log can be replayed by the model from beginning to end. The simplest model that can explain the behavior seen in the log is the best model. This principle is known as Occam's Razor. Fitness and simplicity alone are not sufficient to judge the quality of a discovered process model. For example, it is very easy to construct an extremely simple Petri net ("flower model") that is able to replay all traces in an event log (but also any other event log referring to the same set of activities). Similarly, it is undesirable to have a model that only allows for the exact behavior seen in the event log. Remember that the log contains only example behavior and that many traces that are possible may not have been seen yet. A model is precise if it does not allow for "too much" behavior. Clearly, the "flower model" lacks precision. A model that is not precise is "underfitting". Underfitting is the problem that the model over-generalizes the example behavior in the log (i.e., the model allows for behaviors very different from what was seen in the log). A model should generalize and not restrict behavior to just the examples seen in the log. A model that does not generalize is "overfitting". Overfitting is the problem that a very specific model is generated whereas it is obvious that the log only holds example behavior (i.e., the model explains the particular sample log, but a next sample log of the same process may produce a completely different process model).

Balancing fitness, simplicity, precision and generalization is challenging. This is the reason that most of the more powerful process discovery techniques provide various parameters. Improved algorithms need to be developed to better balance the four competing quality dimensions. Moreover, any parameters used should be understandable by end-users.
Often one would like to generate such notifications immediately (to still be able to influence things) and not in an off-line fashion. Historical data can be used to build predictive models. These can be used to guide running process instances. For example, it is possible to predict the remaining processing time of a case. Based on such predictions, one can also build recommender systems that propose particular actions to reduce costs or shorten the flow time. Applying process mining techniques in such an online setting creates additional challenges in terms of computing power and data quality.
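The three activities can be caricatured in a few lines of code. In the sketch below, the allowed activity successions and the remaining-time statistics stand in for a model and predictions learned from historical data; all names and numbers are assumptions made for the example.

# Illustrative operational-support sketch: detect, predict, recommend.
allowed_next = {                       # control flow assumed to be discovered earlier
    "register": {"examine", "check ticket"},
    "examine": {"decide"},
    "check ticket": {"decide"},
    "decide": {"pay", "reject", "reinitiate"},
}
avg_remaining_hours = {                # assumed to be learned from completed cases
    "register": 40.0, "examine": 25.0, "check ticket": 24.0, "decide": 6.0,
}

def detect(prev_activity: str, new_activity: str) -> bool:
    # Generate an alert the moment a running case deviates from the model.
    deviates = new_activity not in allowed_next.get(prev_activity, set())
    if deviates:
        print(f"ALERT: '{new_activity}' is not expected after '{prev_activity}'")
    return deviates

def predict(current_activity: str) -> float:
    # Predict the remaining processing time of the running case (in hours).
    return avg_remaining_hours.get(current_activity, 0.0)

def recommend(current_activity: str) -> str:
    # Suggest the allowed next step with the shortest expected remaining time.
    candidates = allowed_next.get(current_activity, set())
    return min(candidates, key=lambda a: avg_remaining_hours.get(a, 0.0), default="")

detect("register", "pay")      # deviation, so an alert is printed
print(predict("examine"))      # 25.0 hours remaining on average
print(recommend("register"))   # "check ticket"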

C9: Combining Process Mining With Other Types of Analysis

Operations management, and in particular operations research, is a branch of management science relying heavily on modeling. Here a variety of mathematical models, ranging from linear programming and project planning to queueing models, Markov chains, and simulation, are used. Data mining can be defined as "the analysis of (often large) data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner". A wide variety of techniques have been developed: classification (e.g., decision tree learning), regression, clustering (e.g., k-means clustering), and pattern discovery (e.g., association rule learning).

Both fields (operations management and data mining) provide valuable analysis techniques. The challenge is to combine the techniques in these fields with process mining. Consider, for example, simulation. Process mining techniques can be used to learn a simulation model based on historical data. Subsequently, the simulation model can be used to provide operational support. Because of the close connection between event log and model, the model can be used to replay history, and one can start simulations from the current state, thus providing a "fast forward button" into the future based on live data.
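This "fast forward button" can be illustrated with a deliberately minimal Monte-Carlo sketch: routing probabilities and mean activity durations are assumed to have been learned from the historical log, and each running case is simulated forward from its current state. A real simulation model would, of course, also cover resources, queues, and arrival processes.

# Minimal "fast forward" sketch: estimate remaining time by simulating a case forward.
import random

routing = {                      # P(next activity | current activity), assumed learned from the log
    "register": [("examine", 0.8), ("check ticket", 0.2)],
    "examine": [("decide", 1.0)],
    "check ticket": [("decide", 1.0)],
    "decide": [("pay", 0.6), ("reject", 0.3), ("reinitiate", 0.1)],
    "reinitiate": [("examine", 1.0)],
}
mean_duration_h = {"register": 1, "examine": 16, "check ticket": 8,
                   "decide": 4, "pay": 2, "reject": 1, "reinitiate": 1}
FINAL = {"pay", "reject"}

def choose_next(activity: str) -> str:
    activities, weights = zip(*routing[activity])
    return random.choices(activities, weights=weights)[0]

def fast_forward(current_activity: str, runs: int = 1000) -> float:
    # Expected remaining time of a case, estimated from its current state.
    total = 0.0
    for _ in range(runs):
        act, elapsed = current_activity, 0.0
        while act not in FINAL:
            act = choose_next(act)
            elapsed += mean_duration_h[act]
        total += elapsed
    return total / runs

print(round(fast_forward("register"), 1), "hours remaining (estimate)")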
Similarly, it is desirable to combine process mining with visual analytics. Visual analytics combines automated analysis with interactive visualizations for a better understanding of large and complex data sets. Visual analytics exploits the amazing capabilities of humans to see patterns in unstructured data. By combining automated process mining techniques with interactive visual analytics, it is possible to extract more insights from event data.

C10: Improving Usability for Non-Experts

One of the goals of process mining is to create "living process models", i.e., process models that are used on a daily basis rather than static models that end up in some archive. New event data can be used to discover emerging behavior. The link between event data and process models allows for the projection of the current state and recent activities onto up-to-date models. Hence, end-users can interact with the results of process mining on a day-to-day basis. Such interactions are very valuable, but they also require intuitive user interfaces. The challenge is to hide the sophisticated process mining algorithms behind user-friendly interfaces that automatically set parameters and suggest suitable types of analysis.

C11: Improving Understandability for Non-Experts

Even if it is easy to generate process mining results, this does not mean that the results are actually useful. The user may have problems understanding the output or may be tempted to infer incorrect conclusions. To avoid such problems, the results should be presented using a suitable representation (see also GP5). Moreover, the trustworthiness of the results should always be clearly indicated. There may be too little data to justify particular conclusions. In fact, existing process discovery techniques typically do not warn for low fitness or for overfitting: they always show a model, even when it is clear that there is too little data to justify any conclusions.

Epilogue

The IEEE Task Force on Process Mining aims to (a) promote the application of process mining, (b) guide software developers, consultants, business managers, and end-users when using state-of-the-art techniques, and (c) stimulate research on process mining. This manifesto states the main principles and intentions of the task force. After introducing the topic of process mining, the manifesto catalogs guiding principles (Section 3) and challenges (Section 4). The guiding principles can be used to avoid obvious mistakes, and the list of challenges is intended to direct research and development efforts. Both aim to increase the maturity level of process mining.

To conclude, a few words on terminology. The following terms are used in the process mining space: workflow mining, (business) process mining, automated (business) process
discovery, and (business) process intelligence. Different organizations seem to use different terms for overlapping concepts. For example, Gartner is promoting the term "Automated Business Process Discovery" (ABPD), and Software AG is using "Process Intelligence" to refer to their controlling platform. The term "workflow mining" seems less suitable, as the creation of workflow models is just one of the many possible applications of process mining. Similarly, the addition of the term "business" narrows the scope to certain applications of process mining. There are numerous applications of process mining (e.g., analyzing the use of high-tech systems or analyzing websites) where this addition seems inappropriate. Although process discovery is an important part of the process mining spectrum, it is only one of many use cases. Conformance checking, prediction, organizational mining, social network analysis, etc. are other use cases that extend beyond process discovery.

[Figure 7: Relating the different terms. Nested view, from outermost to innermost: business intelligence, process intelligence, process mining; within process mining: (automated business) process discovery, conformance checking, and model enhancement.]

Figure 7 relates some of the terms just mentioned. All technologies and methods that aim at providing actionable information that can be used to support decision making can be positioned under the umbrella of Business Intelligence (BI). (Business) process intelligence can be seen as the combination of BI and BPM, i.e., BI techniques are used to analyze and improve processes and their management. Process mining can be seen as a concretization of process intelligence taking event logs as a starting point. (Automated business) process discovery is just one of the three basic types of process mining. Figure 7 may be a bit misleading in the sense that most BI tools do not provide process mining functionality as described in this document. The term BI is often conveniently skewed towards a particular tool or method covering only a small part of the broader BI spectrum.

There may be commercial reasons for using alternative terms. Some vendors may also want to emphasize a particular aspect (e.g., discovery or intelligence). However, to avoid confusion, it is better to use the term "process mining" for the discipline covered by this manifesto.

This manifesto was originally published in Business Process Management Workshops 2011, Lecture Notes in Business Information Processing, Vol. 99, Springer-Verlag, 2011, and has been translated into various languages. See the home page of the IEEE Task Force on Process Mining, http://www.win.tue.nl/ieeetfpm/, for more information.

Glossary

Activity: a well-defined step in the process. Events may refer to the start, completion, cancelation, etc. of an activity for a specific process instance.

Automated Business Process Discovery: see Process Discovery.

Business Intelligence (BI): broad collection of tools and methods that use data to support decision making.

Business Process Intelligence: see Process Intelligence.

Business Process Management (BPM): the discipline that combines knowledge from information technology and knowledge from management sciences and applies both to operational business processes.

Case: see Process Instance.

Concept Drift: the phenomenon that processes often change over time. The observed process may gradually (or suddenly) change due to seasonal changes or increased competition, thus complicating analysis.

Conformance Checking: analyzing whether reality, as recorded in a log, conforms to the model and vice versa. The goal is to detect discrepancies and to measure their severity. Conformance checking is one of the three basic types of process mining.

Cross-Organizational Process Mining: the application of process mining techniques to event logs originating from different organizations.

Data Mining: the analysis of (often large) data sets to find unexpected relationships and to summarize the data in ways that provide new insights.

Event: an action recorded in the log, e.g., the start, completion, or cancelation of an activity for a particular process instance.

Event Log: collection of events used as input for process mining. Events do not need to be stored in a separate log file (e.g., events may be scattered over different database tables).

Fitness: a measure determining how well a given model allows for the behavior seen in the event log. A model has a perfect fitness if all traces in the log can be replayed by the model from beginning to end.

Generalization: a measure determining how well the model is able to allow for unseen behavior. An "overfitting" model is not able to generalize enough.

Model Enhancement: one of the three basic types of process mining. A process model is extended or improved using information extracted from some log. For example, bottlenecks can be identified by replaying an event log on a process model while examining the timestamps.

MXML: an XML-based format for exchanging event logs. XES replaces MXML as the new tool-independent process mining format.

Operational Support: on-line analysis of event data with the aim to monitor and influence running process instances. Three operational support activities can be identified: detect (generate an alert if the observed behavior deviates from the modeled behavior), predict (predict future behavior based on past behavior, e.g., the remaining processing time), and recommend (suggest appropriate actions to realize a particular goal, e.g., to minimize costs).

Precision: a measure determining whether the model prohibits behavior very different from the behavior seen in the event log. A model with low precision is "underfitting".

Process Discovery: one of the three basic types of process mining. Based on an event log, a process model is learned. For example, the α-algorithm is able to discover a Petri net by identifying process patterns in collections of events.

Process Instance: the entity being handled by the process that is analyzed. Events refer to process instances. Examples of process instances are customer orders, insurance claims, and loan applications.

Process Intelligence: a branch of Business Intelligence focusing on Business Process Management.
Process Mining: techniques, tools, and methods to discover, monitor, and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs commonly available in today's (information) systems.

Representational Bias: the selected target language for presenting and constructing process mining results.

Simplicity: a measure operationalizing Occam's Razor, i.e., the simplest model that can explain the behavior seen in the log is the best model. Simplicity can be quantified in various ways, e.g., by the number of nodes and arcs in the model.

XES: an XML-based standard for event logs. The standard has been adopted by the IEEE Task Force on Process Mining as the default interchange format for event logs (cf. www.xes-standard.org).
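For readers who have never seen an event log serialized, the following sketch writes a small XES-style log using only Python's standard library. The log/trace/event nesting and the concept:name and time:timestamp attribute keys reflect common XES usage, but the snippet is schematic rather than schema-complete; www.xes-standard.org remains the authoritative reference.

# Schematic XES-style serialization of a tiny event log (one case, two events).
import xml.etree.ElementTree as ET

def add_string(parent, key, value):
    ET.SubElement(parent, "string", {"key": key, "value": value})

log = ET.Element("log")                                    # the event log
trace = ET.SubElement(log, "trace")                        # one process instance (case)
add_string(trace, "concept:name", "order-4711")            # illustrative case identifier

for activity, timestamp in [("register request", "2011-01-06T10:00:00"),
                            ("examine thoroughly", "2011-01-07T11:02:00")]:
    event = ET.SubElement(trace, "event")                  # one recorded event
    add_string(event, "concept:name", activity)            # the activity
    ET.SubElement(event, "date", {"key": "time:timestamp", "value": timestamp})

print(ET.tostring(log, encoding="unicode"))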

Authors

Wil van der Aalst, Arya Adriansyah, Ana Karla Alves de Medeiros, Franco Arcieri, Thomas Baier, Tobias Blickle, Jagadeesh Chandra Bose, Peter van den Brand, Ronald Brandtjen, Joos Buijs, Andrea Burattin, Josep Carmona, Malu Castellanos, Jan Claes, Jonathan Cook, Nicola Costantini, Francisco Curbera, Ernesto Damiani, Massimiliano de Leoni, Pavlos Delias, Boudewijn van Dongen, Marlon Dumas, Schahram Dustdar, Dirk Fahland, Diogo R. Ferreira, Walid Gaaloul, Frank van Geffen, Sukriti Goel, Christian Günther, Antonella Guzzo, Paul Harmon, Arthur ter Hofstede, John Hoogland, Jon Espen Ingvaldsen, Koki Kato, Rudolf Kuhn, Akhil Kumar, Marcello La Rosa, Fabrizio Maggi, Donato Malerba, Ronny Mans, Alberto Manuel, Martin McCreesh, Paola Mello, Jan Mendling, Marco Montali, Hamid Motahari Nezhad, Michael zur Muehlen, Jorge Munoz-Gama, Luigi Pontieri, Joel Ribeiro, Anne Rozinat, Hugo Seguel Pérez, Ricardo Seguel Pérez, Marcos Sepúlveda, Jim Sinur, Pnina Soffer, Minseok Song, Alessandro Sperduti, Giovanni Stilo, Casper Stoel, Keith Swenson, Maurizio Talamo, Wei Tan, Chris Turner, Jan Vanthienen, George Varvaressos, Eric Verbeek, Marc Verdonk, Roberto Vigo, Jianmin Wang, Barbara Weber, Matthias Weidlich, Ton Weijters, Lijie Wen, Michael Westergaard, Moe Wynn

