
CRYPTACUS CONFERENCE - INVITED TALK – SEPTEMBER 2018 – RENNES (FR)

An Introduction to Privacy-Preserving Process Mining:
A Study in the Healthcare Sector
Agusti Solanas
Smart Health Research Group
Dpt. Computer Engineering & Maths
Rovira i Virgili University
Tarragona, Catalonia, Spain
agusti.solanas@urv.cat
@AgustiSolanas

Cryptacus Conference - Invited Talk – Rennes, France, September 20th, 2018


Outline
1. Introduction to Process Mining
2. Process Mining in Healthcare
3. Privacy-Preserving Process Mining
4. Case studies in Healthcare
5. Conclusions




Introduction to Process Mining
A World of Data

• We live in a world in which data are generated and gathered at unprecedented speed

• Estimates suggest that the amount of digital information in 2014 exceeded 4 Zettabytes

In standard (SI) units, 1 Zettabyte is 10^21 bytes
= 1,000,000,000,000,000,000,000 bytes
= 1,000,000,000,000,000 Mbytes
= 1,000,000,000,000 Gbytes
= 1,000,000,000 Tbytes

If we assume binary prefixes, then 1 Megabyte is 2^20 = 1,048,576 bytes,
and 1 Zettabyte is 2^70 ≈ 1.18 × 10^21 bytes (strictly, these binary units are the mebibyte and the zebibyte)
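The unit arithmetic above is easy to check mechanically. A small Python sketch (the dictionary and function names are our own) that reproduces the decimal conversions, the binary values and the 2014 estimate expressed in terabytes:

```python
# Decimal (SI) and binary byte multiples used on this slide.
SI_BYTES = {"MB": 10**6, "GB": 10**9, "TB": 10**12, "ZB": 10**21}
BINARY_BYTES = {"MiB": 2**20, "ZiB": 2**70}


def zettabytes_to(unit, zettabytes=1):
    """Express an amount of SI zettabytes in a smaller SI unit (exact integer arithmetic)."""
    return zettabytes * SI_BYTES["ZB"] // SI_BYTES[unit]


for unit in ("MB", "GB", "TB"):
    print(f"1 ZB = {zettabytes_to(unit):,} {unit}")

print(f"1 MiB = {BINARY_BYTES['MiB']:,} bytes")                  # 1,048,576 bytes
print(f"1 ZiB = {BINARY_BYTES['ZiB'] / 10**21:.2f} x 10^21 bytes")  # 1.18 x 10^21 bytes
print(f"4 ZB = {zettabytes_to('TB', 4):,} TB")                     # the 2014 estimate
```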



Introduction to Process Mining
The Internet of Things (IoT)
• The Internet of Things (IoT) is the result of interconnecting all devices/objects that are able to communicate.
• We can connect anything
• … and even parts of anything

• These connected devices are able to collect, send and receive data
• to the cloud
• to private servers
• to our desktop computer or mobile phone



Introduction to Process Mining
From the IoT to the IoE

• Everyone knows about the IoT, but…
– Do you know what the Internet of Events is?

Image from Process Mining. Data Science in Action. Wil van der Aalst. Springer

• The term Internet of Events (IoE) refers to all event data available.
W.M.P. van der Aalst. Data Scientist: The Engineer of the Future. In K. Mertins, F. Benaben, R. Poler, and J. Bourrieres, editors,
Proceedings of the I-ESA Conference, volume 7 of Enterprise Interoperability, pages 13–28. Springer, Berlin, 2014.

Introduction to Process Mining
Basics on Process Mining

• Process mining is a research field aiming at discovering, monitoring and improving real business processes by extracting knowledge from the event logs available in organizational information systems (Van der Aalst)

• Advantages:
– Optimization of resources
– Identification of bottlenecks
– Detection of hidden dependencies
– Better decision-making for future improvements
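To make "extracting knowledge from event logs" concrete, here is a minimal sketch of one of the simplest discovery techniques, the directly-follows graph: for each case, count how often activity a is immediately followed by activity b. The talk does not say which discovery algorithm it relies on, so treat this only as an illustration; the function name and activity names are hypothetical.

```python
from collections import Counter


def directly_follows_graph(traces):
    """Count directly-follows relations (a, b) over a list of traces.

    Each trace is the time-ordered list of activities of one case.
    """
    dfg = Counter()
    for trace in traces:
        # zip pairs every activity with its immediate successor in the same case
        dfg.update(zip(trace, trace[1:]))
    return dfg


# Hypothetical traces of three hospital cases.
traces = [
    ["register", "triage", "x-ray"],
    ["register", "triage", "lab test"],
    ["register", "x-ray"],
]
for (a, b), freq in sorted(directly_follows_graph(traces).items()):
    print(f"{a} -> {b}: {freq}")
```

Edge frequencies like these are the raw material for spotting the main flow of a process and the deviations from it.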



Process Mining in Healthcare
The Internet of Medical Things (IoMT)
• When we talk about IoT in the medical sector we refer to the Internet of
Medical Things (IoMT).
• Similarly to the previously shown scheme for IoT,
IoMT allows the interconnection of medical devices,
such as ECG monitors, pill dispensers and thermometers,
as well as other equipment and personnel.
• It opens the door to better treatments & services,
for example:
• Authentication
• Tracking



Process Mining in Healthcare
The importance of e-Health
• The evolution of healthcare from a technological perspective:
– 1964: 1st personal desktop computer
– 1973: 1st mobile handheld phone
– 1982: TCP/IP becomes a standard
– 2001: first accepted definition of Electronic Healthcare
– 2006: first accepted definition of Mobile Healthcare
– 2009: 1st definition of Smart City
– 2014: first accepted definition of Smart Healthcare



Introduction: Smart Healthcare
• The idea of Smart Healthcare was inspired by previous electronic and mobile
healthcare paradigms and the rise of smart cities.

“Smart healthcare is the provision of health services by using the context-aware network and sensing
infrastructure of smart cities.”
[Solanas et al. 2014]

• Although the first definition of Smart Healthcare was inspired by Smart Cities, the current use of the term has been generalized to context-aware environments:
“Smart healthcare is the provision of health services founded on the use of networking capabilities and sensing infrastructures of context-aware environments.”
Generalized from [Solanas et al. 2014]

• Smart healthcare is the combination of several ingredients: mobility data, data from the context and patient data.
Process Mining in Healthcare
Motivation

• Why should we use Process Mining in Healthcare?

• There are plenty of event log files in healthcare information systems
• Most healthcare activities can be modelled as processes
• Many healthcare processes are theoretically designed, but the reality may differ
• It would be useful to identify bottlenecks, provide insights to managers, anticipate problems…
• Manage processes to achieve cost-efficient and sustainable healthcare models

Process Mining in Healthcare
What are researchers doing in PM for HC?

• Academic databases were searched with the following query (*in title, abstract or keywords):
– “process mining” AND (healthcare OR clinical OR medical)
• Exclusion criteria:
1. Duplicated articles
2. Master & PhD theses
3. Barely related articles after careful reading
• Result: 55 articles

Process Mining in Healthcare
What are researchers doing in PM for HC?

• General features:
– Objective of the analysis
– Process mining dimensions: type and perspective
– Algorithms & Tools
• Healthcare features:
– Medical facilities
– Medical field
– Medical process type
– Medical data



Process Mining in Healthcare
What are researchers doing in PM for HC?
• Privacy is paramount in the healthcare sector because it is a very sensitive area.
• Privacy is understood differently before and after the adoption of ICT and IoT.
• The adoption of ICT and, more specifically, IoT opens the door to serious attacks, to name a few:
• Theft of Electronic Healthcare Records
• Impersonation
• Data tampering
• DoS attacks

Privacy-Preserving Process Mining
Why do I need it?

• Healthcare event log files may contain personal data
– In particular, sensitive data (patient/doctor identifiers, health conditions, treatments, diseases…)
• Careful management is needed to guarantee individuals’ privacy
• A well-known solution is to distort data to prevent the disclosure of sensitive data and to avoid re-identification or personal data disclosure
– Especially when releasing data files to third parties, e.g. for
• Research
• Statistics
• Etc.



Privacy-Preserving Process Mining
Why is it different from classical SDC?

• Individual data must be protected, but they should also be useful
• Data should be modified/distorted to achieve both goals
• Utility
• Confidentiality

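[Figure: confidentiality/utility trade-off. Data encryption gives maximum confidentiality and minimum utility; releasing data with no modifications gives maximum utility and minimum confidentiality; the goal is the equilibrium point between them.]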



Privacy-Preserving Process Mining
Why is it different from classical SDC?

• Utility is usually measured using the information loss (IL) and the mean absolute error (MAE)
• Confidentiality is assessed by means of the re-identification risk
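As a rough illustration of the utility side, the sketch below computes the MAE between original and masked values and, as one common information-loss measure for microaggregated data, the ratio SSE/SST (using SSE/SST here is our assumption; the talk does not fix an IL definition). Function names and values are hypothetical.

```python
def mean_absolute_error(original, masked):
    """Average absolute change introduced by the masking."""
    if len(original) != len(masked) or not original:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(x - y) for x, y in zip(original, masked)) / len(original)


def sse_sst_loss(original, masked):
    """Information loss as SSE/SST: 0 means no loss, larger values mean more loss."""
    mean = sum(original) / len(original)
    sse = sum((x - y) ** 2 for x, y in zip(original, masked))  # error introduced by masking
    sst = sum((x - mean) ** 2 for x in original)               # total variability of the data
    return sse / sst


original = [10, 1, 12, 2, 13, 3, 11]        # hypothetical numeric attribute
masked = [11.5, 2, 11.5, 2, 11.5, 2, 11.5]  # the same attribute after masking
print(round(mean_absolute_error(original, masked), 3))  # 0.857
print(round(sse_sst_loss(original, masked), 3))         # 0.043
```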



Privacy-Preserving Process Mining
Why is it different from classical SDC?

• These metrics work on data sets with independent records
• Distortion can be introduced using a variety of techniques:
• Noise addition
• Microaggregation
• Generalisation
• The goal is to introduce enough “noise” to avoid the disclosure of personal information.



Privacy-Preserving Process Mining
Why is it different from classical SDC?

• In general, data are aggregated in groups of k elements; then the average value of each group is calculated and released
• Microaggregation works in two stages:
1. The set of records in a dataset is clustered in such a way that:
i. each cluster contains at least k records;
ii. records within a cluster are as similar as possible.
2. Records within each cluster are replaced by a representative of the cluster, typically the centroid record (i.e. the average of the cluster).
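The two stages can be illustrated on a single numeric attribute. The sketch below is a deliberately simple fixed-size heuristic (sort, cut into consecutive groups of k, replace each value by its group mean), not the MDAV algorithm used later in the talk; the function name and data are hypothetical.

```python
def microaggregate_univariate(values, k):
    """Fixed-size univariate microaggregation.

    Stage 1: sort and cut into consecutive groups of k (the last group absorbs the
    remainder, so every group has between k and 2k - 1 values).
    Stage 2: replace each value by the mean (centroid) of its group.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    cut = n - n % k
    groups = [order[i:i + k] for i in range(0, cut, k)]
    groups[-1].extend(order[cut:])  # leftovers join the last group
    masked = [0.0] * n
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
    return masked


print(microaggregate_univariate([10, 1, 12, 2, 13, 3, 11], k=3))
# [11.5, 2.0, 11.5, 2.0, 11.5, 2.0, 11.5]
```

Sorting first keeps each group of similar values together, which is the univariate version of "records within a cluster are as similar as possible".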
Privacy-Preserving Process Mining
Why is it different from classical SDC?

• By applying the microaggregation technique we create data sets that satisfy the k-anonymity property.
• Each cluster of k records generates an identical combination of quasi-identifier attribute values.
• From the adversary’s point of view, the probability of successfully linking a target record is never greater than 1/k.
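The k-anonymity property and the 1/k bound can be checked directly on a released table. A minimal sketch, assuming records are Python dicts and that the quasi-identifiers are known in advance (the function names, attribute names and values below are invented):

```python
from collections import Counter


def anonymity_level(records, quasi_identifiers):
    """Size of the smallest group of records that share the same quasi-identifier values."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return min(groups.values())


def is_k_anonymous(records, quasi_identifiers, k):
    return anonymity_level(records, quasi_identifiers) >= k


def max_linkage_probability(records, quasi_identifiers):
    # Knowing a target's quasi-identifiers, an adversary can at best single out its group;
    # picking one record of that group at random is correct with probability 1/|group| <= 1/k.
    return 1 / anonymity_level(records, quasi_identifiers)


# Hypothetical released records (age already replaced by a cluster average).
QI = ("age", "zip")
records = [
    {"age": 34.0, "zip": "43001", "diagnosis": "flu"},
    {"age": 34.0, "zip": "43001", "diagnosis": "asthma"},
    {"age": 34.0, "zip": "43001", "diagnosis": "fracture"},
    {"age": 61.5, "zip": "43005", "diagnosis": "diabetes"},
    {"age": 61.5, "zip": "43005", "diagnosis": "flu"},
    {"age": 61.5, "zip": "43005", "diagnosis": "hypertension"},
]
print(is_k_anonymous(records, QI, 3))        # True
print(max_linkage_probability(records, QI))  # 0.3333333333333333
```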



Privacy-Preserving Process Mining
Why is it different from classical SDC?

• These metrics and techniques work if applied to data sets with independent records
• However, log files (especially those used in process mining) are not independent:
• In most cases they are time-dependent
• They are grouped by case (particularly in the healthcare domain)
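The dependency between records is easy to see in code: a single event row says little until it is grouped with the other rows of the same case and ordered by time. A minimal sketch using field names from the hospital dataset described in the case study (episodeID, action, timestamp); the function name and rows are invented, and ISO-formatted timestamps are assumed so that string order equals time order.

```python
from collections import defaultdict


def build_traces(events):
    """Group event rows into one time-ordered list of actions per case (episodeID)."""
    by_case = defaultdict(list)
    for ev in events:
        by_case[ev["episodeID"]].append(ev)
    return {
        case: [ev["action"] for ev in sorted(rows, key=lambda e: e["timestamp"])]
        for case, rows in by_case.items()
    }


# Hypothetical rows, deliberately out of order: they only make sense once grouped and sorted.
events = [
    {"eventID": 3, "episodeID": "ep1", "action": "x-ray", "timestamp": "2015-06-08T10:30"},
    {"eventID": 1, "episodeID": "ep1", "action": "blood test", "timestamp": "2015-06-08T09:00"},
    {"eventID": 2, "episodeID": "ep2", "action": "blood test", "timestamp": "2015-06-08T09:15"},
]
print(build_traces(events))
# {'ep1': ['blood test', 'x-ray'], 'ep2': ['blood test']}
```

Anonymizing rows one at a time would ignore exactly this structure, which is why the unit of protection has to become the case.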



Privacy-Preserving Process Mining
Why is it different from classical SDC?

• So, we can no longer consider individual records (as in SDC)
• Instead, we must consider sets of records
• which define processes
• the processes that we want to study



Privacy-Preserving Process Mining
An initial solution based on k-anonymity

• The question is: how do we compute the distance between two sets of events?



Privacy-Preserving Process Mining
An initial solution based on k-anonymity

• Well, the answer is… we don’t.
• In privacy-preserving process mining (PPPM), what we need to know is how different the processes are, so…
• We mine a process for each case and compare those processes.

[Figure: a process model is mined from each case, and the resulting models are compared.]



Privacy-Preserving Process Mining
An initial solution based on k-anonymity

• To compare two processes and analyze their (dis)similarity, we consider metrics based on graph dissimilarity
• We also use those metrics to compare original graphs with distorted/anonymized graphs
• We have used six metrics:
– Vertex/edge overlap [Papadimitriou2010]
– Vertex ranking [Papadimitriou2010]
– DeltaCon [Koutra2013]
– Weight distance [Shoubridge2002]
– Topological-based Graph Edit Distance [Dickinson2003]
– Traffic-based Graph Edit Distance [Dickinson2003]



Privacy-Preserving Process Mining
An initial solution based on k-anonymity
• Vertex/edge overlap [Papadimitriou2010]: “two graphs are more similar if they share more vertices and/or edges”

$$\mathrm{sim}_{VEO}(G, G') = 2\,\frac{|V \cap V'| + |E \cap E'|}{|V| + |V'| + |E| + |E'|}$$
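A direct transcription of the vertex/edge overlap, assuming each graph is given as a pair (set of vertices, set of directed edges) with unique activity labels; the function name and example graphs are hypothetical.

```python
def veo_similarity(g, h):
    """Vertex/edge overlap between two graphs given as (vertices, edges) pairs of sets."""
    (vg, eg), (vh, eh) = g, h
    total = len(vg) + len(vh) + len(eg) + len(eh)
    if total == 0:
        return 1.0  # two empty graphs are treated as identical
    return 2 * (len(vg & vh) + len(eg & eh)) / total


g = ({"a", "b", "c"}, {("a", "b"), ("b", "c")})
h = ({"a", "b"}, {("a", "b")})
print(veo_similarity(g, h))  # 0.75
```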

• Vertex ranking [Papadimitriou2010]: “two graphs are more similar if the rankings/quality of their vertices are more similar”

$$\mathrm{sim}_{VR}(G, G') = 1 - \frac{2 \sum_{v \in V \cup V'} w_v \,(r_v - r'_v)^2}{D}$$
where r_v and r'_v are the ranks of v in G and G', w_v is the quality of v, and D is a normalization factor
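The vertex ranking similarity needs a few choices the slide leaves open, so this sketch makes them explicit as assumptions: vertex quality is taken to be the total degree, w_v is the quality averaged over both graphs, a vertex missing from one graph receives the worst rank, and D is the largest value the numerator could reach, so the result stays in [0, 1]. Function names and graphs are hypothetical.

```python
from collections import Counter


def degree_quality(graph):
    """Assumed vertex quality: total degree (the talk does not say which quality is used)."""
    vertices, edges = graph
    quality = Counter({v: 0 for v in vertices})
    for u, v in edges:
        quality[u] += 1
        quality[v] += 1
    return quality


def ranks(quality):
    """Rank 1 = highest quality; ties are broken by vertex name so the result is deterministic."""
    order = sorted(quality, key=lambda v: (-quality[v], v))
    return {v: i + 1 for i, v in enumerate(order)}


def vertex_ranking_similarity(g, h):
    qg, qh = degree_quality(g), degree_quality(h)
    rg, rh = ranks(qg), ranks(qh)
    vertices = set(qg) | set(qh)
    n = len(vertices)
    numerator = 0.0
    d_norm = 0.0  # D: the largest value the numerator can reach for these weights
    for v in vertices:
        w = (qg[v] + qh[v]) / 2             # assumed: quality averaged over both graphs
        diff = rg.get(v, n) - rh.get(v, n)  # a vertex missing from a graph gets the worst rank, n
        numerator += 2 * w * diff ** 2
        d_norm += 2 * w * (n - 1) ** 2      # |rank difference| never exceeds n - 1
    return 1.0 if d_norm == 0 else 1 - numerator / d_norm


g = ({"a", "b", "c"}, {("a", "b"), ("b", "c")})
h = ({"a", "b", "c"}, {("a", "b"), ("a", "c")})
print(vertex_ranking_similarity(g, h))  # 0.8125
```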



Privacy-Preserving Process Mining
An initial solution based on k-anonymity
• Topological-based Graph Edit Distance [Dickinson2003]: “two graphs are more similar if the number of node and/or edge changes is small”

$$d_{topoGED}(G, H) = |V_G| + |V_H| - 2|V_G \cap V_H| + |E_G| + |E_H| - 2|E_G \cap E_H|$$
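Because process-graph vertices carry unique labels (one vertex per activity), no substitutions are needed and the topological GED reduces to counting vertex and edge insertions and deletions. A sketch using a (vertices, edges) representation assumed for illustration; the function name and graphs are hypothetical.

```python
def topo_ged(g, h):
    """Topological graph edit distance between two (vertices, edges) graphs with unique labels."""
    (vg, eg), (vh, eh) = g, h
    vertex_changes = len(vg) + len(vh) - 2 * len(vg & vh)
    edge_changes = len(eg) + len(eh) - 2 * len(eg & eh)
    return vertex_changes + edge_changes


g = ({"a", "b", "c"}, {("a", "b"), ("b", "c")})
h = ({"a", "b"}, {("a", "b")})
print(topo_ged(g, h))  # 2
```

Note that |V_G| + |V_H| − 2|V_G ∩ V_H| is simply the size of the symmetric difference of the two vertex sets (and likewise for edges), so `len(vg ^ vh) + len(eg ^ eh)` gives the same value.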

• Traffic-based Graph Edit Distance [Dickinson2003]: “two graphs are similar if there are few node and/or edge weight changes”

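$$d_{trafficGED}(G, H) = c\,\big(|V_G| + |V_H| - 2|V_G \cap V_H|\big) + \sum_{e \in E_G \cap E_H} |\beta_G(e) - \beta_H(e)| + \sum_{e \in E_G \setminus (E_G \cap E_H)} \beta_G(e) + \sum_{e \in E_H \setminus (E_G \cap E_H)} \beta_H(e)$$
where c is the cost of inserting or deleting a node, and β_G(e), β_H(e) are the weights (traffic) of edge e in G and H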
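The traffic-based variant adds edge weights. A sketch, assuming each graph is (vertices, {edge: weight}) where the weight is, for example, how often one activity directly follows another, and using a unit node insertion/deletion cost c = 1 (the value of c is not given in the talk); the function name and graphs are hypothetical.

```python
def traffic_ged(g, h, c=1.0):
    """Traffic-based GED; each graph is (vertices, {edge: weight}), c is the node insertion/deletion cost."""
    (vg, wg), (vh, wh) = g, h
    node_cost = c * (len(vg) + len(vh) - 2 * len(vg & vh))
    common = wg.keys() & wh.keys()
    shared = sum(abs(wg[e] - wh[e]) for e in common)           # weight changes on shared edges
    only_g = sum(w for e, w in wg.items() if e not in common)  # edges deleted from G
    only_h = sum(w for e, w in wh.items() if e not in common)  # edges inserted into H
    return node_cost + shared + only_g + only_h


g = ({"a", "b", "c"}, {("a", "b"): 3, ("b", "c"): 1})
h = ({"a", "b", "c", "d"}, {("a", "b"): 1, ("c", "d"): 2})
print(traffic_ged(g, h))  # 6.0
```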



Privacy-Preserving Process Mining
An initial solution based on k-anonymity
• Weight distance [Shoubridge2002]: “two graphs are similar if their edge weights are similar”
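$$d_{WD}(G, H) = |E_G \cup E_H|^{-1} \sum_{u, v \in V} \frac{|w_G(u,v) - w_H(u,v)|}{\max\{w_G(u,v), w_H(u,v)\}}$$
where w_G(u, v) and w_H(u, v) are the weights of edge (u, v) in G and H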
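A sketch of the weight distance, assuming strictly positive edge weights stored as {edge: weight} dictionaries. The sum on the slide runs over vertex pairs; pairs with no edge in either graph would give 0/0, so this sketch only sums over E_G ∪ E_H, which is our reading of the formula. The function name and weights are hypothetical.

```python
def weight_distance(wg, wh):
    """Weight distance between two {edge: weight} maps with strictly positive weights."""
    edges = wg.keys() | wh.keys()
    if not edges:
        return 0.0
    total = 0.0
    for e in edges:
        a, b = wg.get(e, 0), wh.get(e, 0)  # a missing edge has weight 0
        total += abs(a - b) / max(a, b)    # relative change, between 0 and 1
    return total / len(edges)


wg = {("a", "b"): 3, ("b", "c"): 1}
wh = {("a", "b"): 1, ("c", "d"): 2}
print(round(weight_distance(wg, wh), 3))  # 0.889
```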

• DeltaCon [Koutra2013]: “two graphs are similar if the influence of each edge is
similar”
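$$\mathrm{sim}_{DC}(G, H) = \frac{1}{1 + \sum_{i=1}^{n} \sum_{j=1}^{n} \left(s^{G}_{ij} - s^{H}_{ij}\right)^2}$$
where s^G_ij and s^H_ij are the relative importance (affinity) of the pair (i, j) in G and in H, respectively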



Privacy-Preserving Process Mining
An initial solution based on k-anonymity

• With those metrics we can create a dissimilarity matrix considering all cases.
• Cases can then be clustered using, for example, the Maximum Distance to Average Vector (MDAV) algorithm.

Dissimilarity matrix:
$$\begin{pmatrix} d_{11} & & & \\ d_{21} & d_{22} & & \\ \vdots & \vdots & \ddots & \\ d_{n1} & \cdots & \cdots & d_{nn} \end{pmatrix}$$

Clustering algorithm:
1. Compute the average distance vector V
2. Find the most distant vector D to V
3. Find the most distant vector T to D
4. Build a cluster C(D) around D
5. Build a cluster C(T) around T
6. Aggregate elements in C(D)
7. Aggregate elements in C(T)
8. Repeat until there are fewer than k vectors
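A compact sketch of MDAV as we read the slide: each row of the dissimilarity matrix is treated as the vector describing one case, Euclidean distance is used between rows (an assumption), and the loop follows the usual MDAV stopping rule, which also spells out what happens to the last fewer-than-3k vectors (they form one or two final clusters, so every cluster has at least k members). Function names and the 4 × 4 example matrix are hypothetical.

```python
import math


def _centroid(points):
    return [sum(coords) / len(points) for coords in zip(*points)]


def mdav(vectors, k):
    """Partition vector indices into clusters of at least k members (MDAV heuristic)."""
    if len(vectors) < k:
        raise ValueError("need at least k vectors")
    remaining = list(range(len(vectors)))
    clusters = []

    def farthest(ref, idx):
        return max(idx, key=lambda i: math.dist(vectors[i], ref))

    def cluster_around(seed, idx):
        # the k vectors closest to the seed (the seed itself is at distance 0)
        return sorted(idx, key=lambda i: math.dist(vectors[i], vectors[seed]))[:k]

    def take(group):
        clusters.append(group)
        return [i for i in remaining if i not in group]

    while len(remaining) >= 3 * k:
        r = farthest(_centroid([vectors[i] for i in remaining]), remaining)
        remaining = take(cluster_around(r, remaining))
        s = farthest(vectors[r], remaining)  # most distant from r among what is left
        remaining = take(cluster_around(s, remaining))
    if len(remaining) >= 2 * k:
        r = farthest(_centroid([vectors[i] for i in remaining]), remaining)
        remaining = take(cluster_around(r, remaining))
    if remaining:
        clusters.append(remaining)  # the last k..2k-1 vectors form one cluster
    return clusters


def aggregate(vectors, clusters):
    """Replace every vector by the centroid of its cluster."""
    out = [None] * len(vectors)
    for group in clusters:
        c = _centroid([vectors[i] for i in group])
        for i in group:
            out[i] = c
    return out


rows = [[0, 1, 5, 6], [1, 0, 4, 5], [5, 4, 0, 1], [6, 5, 1, 0]]  # hypothetical dissimilarities
clusters = mdav(rows, k=2)
print(clusters)  # [[0, 1], [2, 3]]
print(aggregate(rows, clusters))
```

Releasing the aggregated rows instead of the originals means each released vector is identical to those of at least k − 1 other cases.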



Case Studies in Healthcare
Our Dataset

• Real event logs from a hospital
• Requests for documents/tests made by doctors during patients’ treatments
– eventID | episodeID/case | doctorID | patientID | action | timestamp
• Summary
– 122,179 events
– 53,836 episodes (treatments)
– 280 doctors
– 34,075 patients
– 36 actions
– From 8 June 2015 to 22 January 2016



Privacy-Preserving Process Mining
Case study

• GOAL: Cluster doctors with similar graphs
• First, we create a distance matrix D for all doctors using the Topological-based Graph Edit Distance metric
• Second, we cluster the distance matrix using MDAV
– Define the cluster size k
– We obtain clusters of k (or more) similar doctors
• For each cluster, we create a representative (average).
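The first step of the case study can be sketched end to end. Assumptions not stated in the talk: a doctor's graph is the directly-follows graph of the actions that doctor requested, episode by episode in timestamp order, and graphs are compared with the topological GED. The field names follow the dataset description; function names and rows are invented. The resulting matrix is what the clustering step takes as input.

```python
from collections import defaultdict


def doctor_graphs(events):
    """Build one (vertices, edges) process graph per doctor from raw event rows."""
    steps = defaultdict(list)  # (doctorID, episodeID) -> [(timestamp, action), ...]
    for ev in events:
        steps[(ev["doctorID"], ev["episodeID"])].append((ev["timestamp"], ev["action"]))
    graphs = defaultdict(lambda: (set(), set()))
    for (doctor, _episode), rows in steps.items():
        actions = [action for _, action in sorted(rows)]  # time order within the episode
        vertices, edges = graphs[doctor]
        vertices.update(actions)
        edges.update(zip(actions, actions[1:]))  # directly-follows edges
    return dict(graphs)


def topo_ged(g, h):
    (vg, eg), (vh, eh) = g, h
    return len(vg) + len(vh) - 2 * len(vg & vh) + len(eg) + len(eh) - 2 * len(eg & eh)


def distance_matrix(graphs):
    """Pairwise topological GED between all doctors (rows and columns follow `ids`)."""
    ids = sorted(graphs)
    return ids, [[topo_ged(graphs[a], graphs[b]) for b in ids] for a in ids]


events = [  # hypothetical rows
    {"eventID": 1, "episodeID": "e1", "doctorID": "d1", "patientID": "p1", "action": "blood test", "timestamp": 1},
    {"eventID": 2, "episodeID": "e1", "doctorID": "d1", "patientID": "p1", "action": "x-ray", "timestamp": 2},
    {"eventID": 3, "episodeID": "e2", "doctorID": "d2", "patientID": "p2", "action": "blood test", "timestamp": 1},
    {"eventID": 4, "episodeID": "e2", "doctorID": "d2", "patientID": "p2", "action": "report", "timestamp": 3},
]
ids, d = distance_matrix(doctor_graphs(events))
print(ids, d)  # ['d1', 'd2'] [[0, 4], [4, 0]]
```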



Privacy-Preserving Process Mining
Initial results based on k-anonymity
• All metrics behave similarly as the cluster size k varies
• Small cluster sizes cause less graph distortion than higher values of k
– In fact, for k = 10 the quality of the graphs is already poor



Privacy-Preserving Process Mining
Initial results based on k-anonymity
[Figure: process graph of the original data and anonymized graphs for k = 2, 3, 4 and 5.]

Discussion and Conclusions
• We have stated that:

– Process mining is an emerging field with high potential

– Preserving the privacy of individuals in log files is paramount
• Careful management of sensitive information (third parties)
• Compliance with the EU GDPR
– This is especially important in the healthcare field
• Very sensitive data



Discussion and Conclusions
Future research lines

• Data reduction with tensor structures
• Learning strategies for better clustering
– Learning with unbalanced datasets
– Deep Learning
– Transfer Learning
• New aggregation techniques
– Random sampling
– Categorical averaging with nominal variance
• Apply other privacy-preserving techniques (e.g. generalization of doctors according to a hierarchy)
• Exhaustive analysis of many more graph measures (centrality, connectivity, distance…)



Bibliography
• [Becker et al. 2015] S. Becker, C. Brandl, S. Meister, E. Nagel, T. Miron-Shatz, A. Mitchell, et al., “Demographic and Health Related Data of Users of a Mobile Application to Support Drug Adherence is Associated with Usage Duration and Intensity,” PLoS ONE 10(1): e0116980, 2015. doi:10.1371/journal.pone.0116980

• [Casino et al. 2015] F. Casino et al., “A k-anonymous approach to privacy preserving collaborative filtering,” Journal of Computer and System Sciences 81(6): 1000–1011, 2015.

• [Eysenbach 2001] G. Eysenbach, “What is e-health?,” J Med Internet Res 3(2): e20, 2001. doi:10.2196/jmir.3.2.e20

• [Istepanian et al. 2006] R. Istepanian, S. Laxminarayan, and C. S. Pattichis, “Preface,” in M-Health: Emerging Mobile Health Systems, Topics in Biomedical Engineering, Int’l Book Series, Springer, 2006.

• [Pérez et al. 2013] P. Pérez, A. Martínez, and A. Solanas, “Privacy in Smart Cities: A Case Study of Smart Public Parking,” Proc. 3rd Int’l Conf. PECCS, 2013, pp. 55–59.

• [Solanas et al. 2014] A. Solanas et al., “Smart Health: A Context-Aware Health Paradigm within Smart Cities,” IEEE Communications Magazine, August 2014.
