Data Description
and Visualization
Contents
1 Introduction
5 Reporting
Information needs
- Data description and visualization for business processes:
• Structure and usage
• Production and organization perspective
• Event view
• Static and dynamic
- Data description and visualization for collections of business process
instances:
• Observation over a certain period of time
• Customer perspective
• Cross-sectional and state view
- Data description and visualization for reporting:
• High-level reports on all BI activities
• Put into the context of business goals
Further aspects:
- Visualization of process perspectives:
• Control flow: as discussed before (→ production)
Organizational models
- Typical elements
• Roles
• Organizational units
• Actors
- Typical relations
• Actor has role
• Role 1 is specialized with respect to role 2
• Role belongs to organizational unit
• Organizational unit 1 is subordinated to organizational unit 2
- Visualizations as graph, table, list
- With existing approaches, the graph can become quite large
Challenges:
- Wallpaper processes (very large process models) on limited screen sizes
• Views6 and abstraction7
- Large number of process instances:
• Selection or multimodal approaches, e.g., sonification8
- Visualizing change information:
• Change tracking9 or change trees10
6Ralph Bobrik, Manfred Reichert, Thomas Bauer: View-Based Process Visualization. BPM 2007: 88-95
7Sergey Smirnov, Hajo A. Reijers, Mathias Weske, Thijs Nugteren: Business process model abstraction: a definition, catalog,
and survey. Distributed and Parallel Databases 30(1): 63-99 (2012)
8Tobias Hildebrandt, Thomas Hermann, Stefanie Rinderle-Ma: Continuous sonification enhances adequacy of interactions in
Definition of data
- Which process instances? Selected depending on instance attributes, e.g.,
• time interval of interest
• customer
- Cross-sectional and state view
- Event view → Chapter 7
- Which attributes?
• Depends on the analysis question, e.g., cargo temperature in a logistics
process
• Also: data transformations of existing attributes, e.g., in the state
view, summary characteristics of the time series of each instance, or
first-order differences
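The transformations for the state view can be sketched in a few lines of Python (used here only for illustration). The instance names and temperature values are hypothetical; the sketch turns each instance's time series into first-order differences and a few time-independent summary characteristics, giving one row per process instance.

```python
from statistics import mean

# Hypothetical state-view data: for each process instance a time series of
# (time, value) pairs, e.g., cargo temperature during a transport.
instances = {
    "case_1": [(0, 4.0), (1, 4.5), (2, 5.5), (3, 5.0)],
    "case_2": [(0, 3.0), (1, 3.0), (2, 6.0)],
}

def first_differences(series):
    """First-order differences x_t - x_(t-1) of the values of one time series."""
    values = [v for _, v in series]
    return [b - a for a, b in zip(values, values[1:])]

def summarize(series):
    """Time-independent summary characteristics of one instance's time series."""
    values = [v for _, v in series]
    diffs = first_differences(series)
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "max_increase": max(diffs) if diffs else 0.0,
    }

# Result: a simple matrix with one row per process instance and one column per
# derived attribute, which can then be visualized, e.g., with boxplots.
summary_table = {case: summarize(series) for case, series in instances.items()}
for case, row in summary_table.items():
    print(case, row)
```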
Data structures:
- Multidimensional (pivot) tables, defined by:
• Values of qualitative variables (dimensions)
• Summary attribute for the cells (see also multidimensional data
structures for data warehouses in Chapter 3)
- For process instances:
• Simple matrix with rows representing process instances and columns
representing variable values; possibly nested
• More complex structures for the cross-sectional and state views; here, the
attributes refer to a sequence of values together with the temporal
information
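A multidimensional (pivot) table can be sketched as a mapping from combinations of dimension values to a summary of the cells. The sketch below uses plain Python and hypothetical attributes (`region`, `channel`, `duration`); in practice a library such as pandas or a data warehouse would do this, and the function name `pivot` is made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical process instances: two qualitative dimensions and one quantitative attribute.
instances = [
    {"region": "North", "channel": "online", "duration": 3.0},
    {"region": "North", "channel": "store", "duration": 5.0},
    {"region": "North", "channel": "online", "duration": 4.0},
    {"region": "South", "channel": "store", "duration": 6.0},
    {"region": "South", "channel": "online", "duration": 2.0},
]

def pivot(rows, row_dim, col_dim, value, agg=mean):
    """One cell per (row_dim, col_dim) value combination; the cell's summary
    attribute is `agg` applied to the `value` attribute of its rows."""
    cells = defaultdict(list)
    for r in rows:
        cells[(r[row_dim], r[col_dim])].append(r[value])
    return {key: agg(vals) for key, vals in cells.items()}

table = pivot(instances, "region", "channel", "duration")            # mean duration per cell
counts = pivot(instances, "region", "channel", "duration", agg=len)  # frequencies per cell
print(table)
print(counts)
```

Swapping the summary function (`mean`, `len`, `sum`, ...) changes the summary attribute without changing the dimensions.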
Mapping:
- Defines for each variable how it is represented in the graphics
- Basic aesthetic attributes:
• Axis
• Color
• Size
• Shape
- Quantitative variable → axis
- Qualitative variable → shape
- Scale: defines how the values of a variable are mapped to the values of an aesthetic attribute
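The mapping step can be illustrated with two hypothetical scales: a continuous scale that maps a quantitative variable linearly onto an axis range, and a discrete scale that assigns one shape per level of a qualitative variable. The variable names, the pixel range 0 to 100, and the shape names are assumptions chosen for the example.

```python
# Hypothetical data: a quantitative variable (duration) and a qualitative one (outcome).
rows = [
    {"duration": 2.0, "outcome": "ok"},
    {"duration": 8.0, "outcome": "late"},
    {"duration": 5.0, "outcome": "ok"},
]

def linear_scale(domain, range_):
    """Continuous scale: map a data value linearly from the domain onto an axis range
    (assumes the domain has two distinct end points)."""
    (d0, d1), (r0, r1) = domain, range_
    return lambda x: r0 + (x - d0) / (d1 - d0) * (r1 - r0)

def discrete_scale(levels, palette):
    """Discrete scale: assign one aesthetic value to each level of a qualitative variable."""
    return dict(zip(levels, palette))

durations = [r["duration"] for r in rows]
x_scale = linear_scale((min(durations), max(durations)), (0.0, 100.0))  # axis position
levels = sorted({r["outcome"] for r in rows})
shape_scale = discrete_scale(levels, ["circle", "triangle", "square"])

# Mapping: quantitative variable -> axis, qualitative variable -> shape.
marks = [(x_scale(r["duration"]), shape_scale[r["outcome"]]) for r in rows]
print(marks)  # [(0.0, 'triangle'), (100.0, 'circle'), (50.0, 'triangle')]
```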
Definition of layers
- Specification of statistical transformations, e.g.,
• Identity transformation: display variable values
• Summary transformation: calculate univariate characteristics, e.g.,
mean, median
• Transformation for histogram: define bins and count observations
• Calculation of regression line
- Transformations are represented using geometric objects, e.g.,
points, lines, intervals, polygons
- Geometric objects are mapped to aesthetic attributes
- Avoid overplotting by position specification or jittering
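Several of the statistical transformations listed above, and jittering, can be sketched as plain functions whose results would then be drawn as points, lines, or intervals. The data are hypothetical, the regression line is an ordinary least-squares fit, and the jitter width and seed are arbitrary choices.

```python
import random
from statistics import mean, median

# Hypothetical paired observations (x, y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

# Identity transformation: the values themselves become point geometries.
points = list(zip(xs, ys))

# Summary transformation: univariate characteristics, e.g., drawn as a line or interval.
summary = {"mean": mean(ys), "median": median(ys)}

def regression_line(xs, ys):
    """Least-squares line y = a + b*x, returned as (intercept a, slope b)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def jitter(values, width, seed=0):
    """Add small uniform noise so that overlapping points become distinguishable."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]

a, b = regression_line(xs, ys)
# The fitted line becomes a line geometry spanning the x-range of the data.
line = [(min(xs), a + b * min(xs)), (max(xs), a + b * max(xs))]
jittered = jitter([1, 1, 1, 2, 2], width=0.1)
print(summary, round(a, 2), round(b, 2))
```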
Coordinate system:
- Defines location of points in space
- Examples: Cartesian or polar coordinate systems
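The choice of coordinate system determines where the same data values end up. As a small sketch, hypothetical (angle, radius) pairs are either placed directly on Cartesian axes or interpreted as polar coordinates and converted to positions in the plane. In grammar-of-graphics terms, a pie chart is a stacked bar chart drawn in polar coordinates.

```python
import math

def polar_to_cartesian(r, theta):
    """Position in the plane of a point given by radius r and angle theta (radians)."""
    return r * math.cos(theta), r * math.sin(theta)

# Hypothetical (theta, r) data.
data = [(0.0, 1.0), (math.pi / 2, 2.0), (math.pi, 1.0)]

# Cartesian interpretation: theta on the x-axis, r on the y-axis.
cartesian_view = [(theta, r) for theta, r in data]
# Polar interpretation: the same values define angle and distance from the origin.
polar_view = [polar_to_cartesian(r, theta) for theta, r in data]
print([(round(x, 3), round(y, 3)) for x, y in polar_view])
```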
Facets:
- Bind together different graphical displays
- Display aspects under different conditions
- Alternative to putting everything in one graphics using different
aesthetic attributes
- See also conditioning plots or trellis plots
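Faceting amounts to splitting the data by the values of a conditioning variable and drawing one panel per value. A minimal sketch with hypothetical yearly observations also computes shared axis limits, since a common scale across panels is what keeps the displays comparable.

```python
from collections import defaultdict

# Hypothetical observations: (conditioning variable, x, y).
rows = [("2014", 1, 10), ("2014", 2, 12), ("2015", 1, 9), ("2015", 3, 15)]

def facet(rows, key=lambda r: r[0]):
    """Split the data into one panel per value of the conditioning variable."""
    panels = defaultdict(list)
    for r in rows:
        panels[key(r)].append(r[1:])
    return dict(panels)

panels = facet(rows)
# Shared y-axis limits across all panels keep the displays comparable.
all_y = [y for pts in panels.values() for _, y in pts]
y_limits = (min(all_y), max(all_y))
for name, pts in sorted(panels.items()):
    print(name, pts, "y-limits:", y_limits)
```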
Dynamic graphics
- Additional elements:
• Rotating axis
Qualitative Information
- Data structure: pivot table providing frequencies of value combinations
for different attributes (absolute, percentage)
- Bar charts and pie charts
• One variable, absolute: bar chart
• One variable, relative: bar chart, pie chart
• Multiple variables: stacked or clustered bar chart
(Figure created with R package ggplot2)
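The frequency tables behind these charts can be sketched with `collections.Counter`. The payment and region data are hypothetical; the sketch derives absolute frequencies (bar chart), relative frequencies and slice angles (pie chart), and a two-variable cross-tabulation (stacked or clustered bars).

```python
from collections import Counter

# Hypothetical qualitative observations.
payment = ["card", "cash", "card", "invoice", "card", "cash"]
region = ["N", "S", "N", "N", "S", "S"]

absolute = Counter(payment)                              # bar heights, absolute
n = sum(absolute.values())
relative = {k: v / n for k, v in absolute.items()}      # bar heights or pie shares
pie_angles = {k: 360 * p for k, p in relative.items()}  # slice angles in degrees

# Two variables: frequencies of value combinations for stacked/clustered bars.
crosstab = Counter(zip(region, payment))
print(absolute.most_common(), crosstab)
```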
© W. Grossmann, S. Rinderle-Ma, University of Vienna – Chapter 4: Data Description and Visualization
© 2015 Springer-Verlag Berlin Heidelberg
4 Basic visualization techniques
Qualitative Information
- Mosaic plots
• Two or more variables
• All data are represented by a square
• The horizontal edge is split according to the proportions of the first variable;
the resulting rectangles correspond to relative frequencies
• Then each rectangle is divided along conditional probability of the
second variable given the value of the first one
• For further variables, the rectangles are split alternately along the horizontal
and vertical axes based on conditional probabilities
• Result: each rectangle represents the frequency of occurrence of
that particular combination of variable values
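The two-variable construction described above can be sketched by computing the rectangles in the unit square: widths from the marginal proportions of the first variable, heights from the conditional proportions of the second. The area of each rectangle is then the joint relative frequency. The data and the function name `mosaic_rectangles` are hypothetical.

```python
from collections import Counter

def mosaic_rectangles(pairs):
    """Rectangles (x, y, width, height) in the unit square for a two-variable mosaic
    plot: width = P(A = a), height = P(B = b | A = a), so area = P(A = a, B = b)."""
    n = len(pairs)
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second_levels = sorted({b for _, b in pairs})
    rects, x = {}, 0.0
    for a in sorted(first):
        width = first[a] / n
        y = 0.0
        for b in second_levels:
            height = joint[(a, b)] / first[a]
            rects[(a, b)] = (x, y, width, height)
            y += height
        x += width
    return rects

# Hypothetical data: (order channel, delivery outcome).
data = [("online", "on time"), ("online", "late"), ("store", "on time"),
        ("store", "on time"), ("store", "late"), ("online", "on time")]
rects = mosaic_rectangles(data)
for key, rect in rects.items():
    print(key, [round(v, 3) for v in rect])
```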
(Figure created with R package treemap)
Qualitative Information
- Tree maps
• A nested hierarchy of groups is represented by nested rectangles; the size of a rectangle reflects its value (e.g., sales)
• Additional attributes represented by colors
Interpretation:
• 21 outlets in 5 regions
• Sales in regions and outlets represented by size of rectangles
• Example: the “dominant” Outlet1_4 in Region1
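The simplest treemap layout, slice-and-dice, can be sketched recursively: each group's rectangle is divided among its children in proportion to their values, alternating between horizontal and vertical splits per level. The region/outlet sales figures are hypothetical, and tree map tools often use other layouts (e.g., squarified) that give rectangles with better aspect ratios.

```python
def size(node):
    """Total value of a node: its own value for a leaf, the sum of its children otherwise."""
    _, content = node
    return sum(size(c) for c in content) if isinstance(content, list) else content

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice layout. `node` is (name, value) for a leaf or (name, [children])
    for a group. Returns (name, x, y, width, height) for every node."""
    if out is None:
        out = []
    name, content = node
    out.append((name, x, y, w, h))
    if isinstance(content, list):
        total = size(node)
        offset = 0.0
        for child in content:
            share = size(child) / total
            if depth % 2 == 0:  # split along the horizontal edge
                slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
            else:               # split along the vertical edge
                slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
            offset += share
    return out

# Hypothetical sales per outlet, grouped by region.
tree = ("All", [("Region1", [("Outlet1_1", 10), ("Outlet1_2", 30)]),
                ("Region2", [("Outlet2_1", 20), ("Outlet2_2", 40)])])
rects = {name: (x, y, w, h) for name, x, y, w, h in slice_and_dice(tree, 0, 0, 1, 1)}
print(rects["Outlet1_2"])
```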
Quantitative Information
- Histogram
• The value range of the variable is divided into non-overlapping classes, so-called
bins
• The number of observations per bin is counted and displayed by the height of
the bar for each bin
• Absolute
• Relative
• Density: area of the bars corresponds to relative frequency of the bin
• Density estimates, possibly after a transformation, e.g., logarithmic
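The three histogram variants differ only in how the bin counts are scaled. A minimal sketch with hypothetical, right-skewed durations and deliberately unequal bin widths shows why the density scaling matters: only then do the bar areas, not heights, equal the relative frequencies. A logarithmic transformation before binning is shown as well.

```python
import math

def histogram(values, edges):
    """Count observations per bin [edges[i], edges[i+1]) (the last bin also includes its
    right edge). Returns absolute counts, relative frequencies, and densities
    (relative frequency / bin width, so that the bar areas sum to 1)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            last = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (last and v == edges[i + 1]):
                counts[i] += 1
                break
    n = len(values)
    relative = [c / n for c in counts]
    density = [p / (edges[i + 1] - edges[i]) for i, p in enumerate(relative)]
    return counts, relative, density

# Hypothetical right-skewed data, e.g., process durations.
durations = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13]
edges = [0, 2, 4, 8, 16]  # unequal bin widths
counts, rel, dens = histogram(durations, edges)
# Logarithmic transformation before binning reduces the skewness.
log_counts, _, _ = histogram([math.log(d) for d in durations], [0, 1, 2, 3])
print(counts, rel, dens, log_counts)
```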
Quantitative Information
- Boxplots
• Represent the distribution of a quantitative variable
• Often used for displaying value distributions of different groups, e.g.,
age groups
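• The 25% and 75% quantiles (first and third quartiles) define the box, which
contains the middle 50% of the observations
• The whiskers mark the range in which (almost) all values should lie if the data
follow a normal distribution; commonly they extend to the most extreme
observations within 1.5 times the interquartile range from the box
• Values outside the whiskers are considered potential outliers and deserve special
attention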
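The boxplot statistics can be sketched with the standard library, assuming the common 1.5 × IQR whisker convention (for normally distributed data the resulting fences cover roughly 99.3% of the values). The sales figures are hypothetical; `method="inclusive"` in `statistics.quantiles` (Python 3.8+) gives the same quartiles as R's default quantile type.

```python
from statistics import median, quantiles

def boxplot_stats(values, k=1.5):
    """Box: first and third quartiles (middle 50% of the data) plus the median.
    Whiskers: most extreme observations within k * IQR of the box.
    Observations beyond the whiskers are flagged as potential outliers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1, "median": median(values), "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": [v for v in values if v < lo_fence or v > hi_fence],
    }

sales = [12, 15, 14, 10, 18, 16, 13, 45]  # hypothetical; 45 is unusually large
stats = boxplot_stats(sales)
print(stats)
```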
(Figures created with R graphics, R package ggplot, and R package corrplot)
Relationships
- Scatter plots
• Represent the relationships between k variables by $k(k-1)/2$ pairwise plots
in a scatter plot matrix
• Additional layers:
• Smoothing curves showing the relationship between the variables
(→ Chapter 5)
• If qualitative variable is used for grouping, colors can represent the
different groups
• Frequency distributions in the diagonal of the matrix
Interpretation:
• All frequency distributions are skewed to the right
• For all plots: linear trend line and a smoothed trend line
• Positive relationship between average sales and actual sales
• The relationship is rather scattered for larger sales
• Almost no correlation between the duration of the customer relationship and sales
• For larger average sales there seems to be almost no relationship
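The panel count of a scatter plot matrix and a numeric companion to each panel can be sketched as follows: `itertools.combinations` yields the $k(k-1)/2$ distinct variable pairs, and a Pearson correlation is computed per pair. The customer variables and values are hypothetical, and the correlation is implemented directly rather than relying on a particular library version.

```python
import math
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical customer data with k = 3 quantitative variables.
data = {
    "sales":     [120, 80, 150, 200, 90],
    "avg_sales": [110, 85, 140, 210, 95],
    "duration":  [5, 3, 8, 2, 6],
}

# One panel per unordered pair of variables (mirrored across the diagonal in the matrix).
pairs = list(combinations(data, 2))
correlations = {(a, b): pearson(data[a], data[b]) for a, b in pairs}
for (a, b), r in correlations.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```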
Relationships
- Projections and Principal Components:
• Representation of multivariate data in two or three dimensions
• Given variables X1, ..., Xk, for each Xi a principal component PCi is defined as
follows:
• $PC_i := \sum_{j=1,\dots,k} \alpha_{ij} X_j$, i.e., PCi is a linear combination of X1, ..., Xk
• For the first component, the coefficients $\alpha_{1j}$ (normalized such that $\sum_{j} \alpha_{1j}^2 = 1$) are determined in such a way that PC1 explains
as much as possible of the overall variance of the observations.
• Given PC1, ..., PCi, PCi+1 is defined orthogonal to PC1, ..., PCi and explains as much as
possible of the overall variance of the observations
• Often, PC1 and PC2 together already account for a large share of the variability in the data (e.g., 80%)
• Scatter plot of PC1, PC2
• Biplot: displays the observation points as well as the variables in the coordinate
system defined by the first two principal components.
Interpretation:
• The first principal component accounts for 74% of the variability, the second one for 12%
• Helpfulness and Competence are evaluated similarly
• Eco-friendliness is evaluated differently
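The definition of principal components can be sketched without external libraries: the coefficient vectors are unit-length eigenvectors of the covariance matrix, found here by power iteration, and each found component is removed from the matrix (deflation) so that the next one is orthogonal to it. This is an illustrative sketch with hypothetical ratings; in practice one would use a routine such as R's `prcomp` or an eigenvalue solver.

```python
import math

def covariance_matrix(columns):
    """Sample covariance matrix of k variables, each given as a list of n values."""
    n, k = len(columns[0]), len(columns)
    means = [sum(c) / n for c in columns]
    return [[sum((columns[i][t] - means[i]) * (columns[j][t] - means[j])
                 for t in range(n)) / (n - 1)
             for j in range(k)] for i in range(k)]

def power_iteration(m, iters=500):
    """Dominant eigenvalue and unit-length eigenvector of a symmetric matrix.
    Uses a generic (non-symmetric) start vector; the method fails if the start
    happens to be orthogonal to the wanted eigenvector."""
    k = len(m)
    v = [1.0 / (j + 1) for j in range(k)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(k)) for i in range(k))
    return lam, v

def principal_components(columns, n_components=2):
    """Coefficient vectors (alpha_i1, ..., alpha_ik) and explained-variance shares."""
    m = covariance_matrix(columns)
    k = len(m)
    total = sum(m[i][i] for i in range(k))  # overall variance = trace
    components = []
    for _ in range(n_components):
        lam, v = power_iteration(m)
        components.append((v, lam / total))
        # Deflation: remove the explained variance so the next PC is orthogonal.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(k)] for i in range(k)]
    return components

def scores(columns, alpha):
    """PC values of each observation: linear combination of the centered variables."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [sum(a * (c[t] - mu) for a, c, mu in zip(alpha, columns, means))
            for t in range(n)]

# Hypothetical ratings of two service attributes by five customers.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
(pc1, share1), (pc2, share2) = principal_components([x1, x2])
print(f"PC1 explains {share1:.0%}, PC2 explains {share2:.0%}")
points = list(zip(scores([x1, x2], pc1), scores([x1, x2], pc2)))  # scatter plot of PC1, PC2
```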
Temporal data
- Use time-independent summaries such as the mean and display them using the
visualization techniques discussed before, e.g., boxplots
- Or visualize the state variable for each process instance as a function of
time
Interpretation:
• One curve per woman
• For women going to hospital, proteinuria increased steeply between
days 48 and 100
• This differs from women not going to hospital
Interpretation:
• Dataset1 seems to be reliable, relevant, and consistent, but less
complete and even less accurate and timely.
• Dataset2 shows the opposite picture: it is seemingly complete, accurate,
and timely, but lacks relevance and consistency.
- Dashboards and business cockpits
- (Graphical) summaries for non-experts
(Figure: interactive high-level reporting dashboard on student performance, created with HighCharts)
High-level reporting
- Balanced scorecard
• Components:
• Destination statement describes the organization at present and at
a defined point in the future (mid-term planning) in four
perspectives: financial and stakeholder expectations, customer and
external relationships, processes and activities, and organization and
culture.
• Strategic linkage model contains strategic objectives with respect
to outcome and activities, together with hypothesized causal
relationships between these strategic objectives.
• Definitions of strategic objectives
• For each strategic objective, measures are defined together with
their targets.
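The components of a balanced scorecard map naturally onto a small data structure: strategic objectives assigned to perspectives, measures with targets, and the strategic linkage model as hypothesized cause-effect edges between objectives. All objective names, measures, values, and the attainment ratio (which assumes "higher is better") are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Measure:
    name: str
    actual: float
    target: float

    def attainment(self) -> float:
        """Share of the target reached (assumes that higher values are better)."""
        return self.actual / self.target

@dataclass
class StrategicObjective:
    name: str
    perspective: str  # one of the four perspectives of the destination statement
    measures: list = field(default_factory=list)

objectives = {
    "o1": StrategicObjective("Qualified staff", "organization and culture",
                             [Measure("training hours per employee", 18, 20)]),
    "o2": StrategicObjective("Fast order handling", "processes and activities",
                             [Measure("orders handled within 24h (%)", 90, 95)]),
    "o3": StrategicObjective("Satisfied customers", "customer and external relationships",
                             [Measure("satisfaction score", 4.2, 4.0)]),
}
# Strategic linkage model: hypothesized causal relationships (cause, effect).
linkage = [("o1", "o2"), ("o2", "o3")]

for cause, effect in linkage:
    print(objectives[cause].name, "->", objectives[effect].name)
for o in objectives.values():
    for m in o.measures:
        print(f"{o.perspective}: {m.name} at {m.attainment():.0%} of target")
```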
5 Reporting
Infographics
- Designed to convey possibly complex information
- Goals:
• Appeal: An infographic should engage the intended audience.
• Comprehension: The viewer of an infographic should understand the
information easily.
• Retention: The information provided by an infographic should be
remembered by the viewer.
- Examples: public transport maps, Pinterest
- Tools: Piktochart (free basic version), ManyEyes, Tableau Public,
Gapminder.