data
Background
What is Big Data?
PARC | 2
The Dream:
Data that leads to Actionable Insights
[Figure: a table of coded raw records labeled "Data" feeds into a step labeled "Analysis"]
Bill James: baseball player data
Small-market teams effectively compete with large-market teams.
The Reality:
Disparate Data, Vague Goals and Organizational Inertia
Disparate Data: various databases, paper, restricted access, logs, mainframes, sensors
Vague Goals: "What do we want to do again?" "Get results, of course!"
Organizational Inertia: "The results speak for themselves." "Everyone will love them."
The Solution:
Understand your data, goals and organization
Understand your data:
Using insights from data often requires people to change the way they work:
Consider this when you set your goals.
Change agents are often needed at many levels of the organization.
Organizational structure and culture are important, yet they are often completely overlooked or treated as an afterthought when data analysis projects are conceived.
Ohio DJFS:
Who pays their child support
Goals:
Understand the data:
Team:
Organization:
[Diagram labels: IMS Database; Census Data; 1.33 TB; 6+ Months; Custom Code; Raw Data; SQL; Transformed Data, 2 GB; SQL]
Ohio DJFS:
Who pays their child support
[Chart: Percent of Cases by payment category: Pay 80% or more, Pay less than 80%, No order, Zero orders]
30% pay 80% of established obligation
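The payment categories in the chart can be read as a simple rule applied to each case. The sketch below is one plausible reading, not the method used for the slides: it assumes "No order" means no obligation was established, "Zero orders" means an obligation of zero, and the 80% cut is applied to the amount paid divided by the established obligation. The function names and the `(obligation, paid)` tuple layout are hypothetical.

```python
from collections import Counter

PAY_THRESHOLD = 0.8  # the 80% cut named on the slide


def payment_category(obligation, paid):
    """Assign one case to a payment category (assumed definitions)."""
    if obligation is None:       # assumption: no order was established
        return "No order"
    if obligation == 0:          # assumption: an order exists but is for zero
        return "Zero order"
    share_paid = paid / obligation
    if share_paid >= PAY_THRESHOLD:
        return "Pay 80% or more"
    return "Pay less than 80%"


def percent_of_cases(cases):
    """Return the percent of cases in each category.

    `cases` is an iterable of (obligation, paid) tuples.
    """
    counts = Counter(payment_category(o, p) for o, p in cases)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}
```

Calling `percent_of_cases` on a list of `(obligation, paid)` pairs yields the kind of "Percent of Cases" breakdown the chart summarizes.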
Ohio DJFS:
Scoring Model for Cuyahoga County
Taking the suggestion from the last presentation to consider county-level analysis, we built our models only at the county level.
Largest county in Ohio by population in 2010; surrounds Cleveland
62,002 cases initiated between FY 05 and FY 13
57 features (data known at case initiation): 27 categorical, 30 numerical
Link to Dashboard
if k k
then = 1
Ohio DJFS:
Cuyahoga County Model Performance
Pool of All Cases (N)
High-Value Cases (Top 20% of N)
Prediction Model: 63% recall, 47% precision (205% increase in predictive power over random guessing)
Key Message: The top fifth of the caseload, according to the model, contains almost 2/3 of the good-paying cases.
15% green
85% red
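The recall, precision, and "increase over random guessing" figures can be reproduced from any ranked list of cases. The sketch below is a minimal illustration, not the model from the slides: it assumes each case has a model score and a 0/1 label for "good paying", flags the top 20% by score, and takes the increase over random guessing to be precision divided by the overall base rate, minus one. The function name and its arguments are hypothetical.

```python
def top_fraction_metrics(scores, labels, fraction=0.2):
    """Precision, recall and lift when the top `fraction` of cases is flagged.

    scores: model scores, higher means more likely to be a good-paying case
    labels: 1 for a good-paying case, 0 otherwise (same order as scores)
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")

    n = len(scores)
    k = max(1, round(n * fraction))           # number of cases flagged

    # Rank case indices from highest to lowest score and keep the top k.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = ranked[:k]

    true_positives = sum(labels[i] for i in flagged)
    total_positives = sum(labels)

    precision = true_positives / k
    recall = true_positives / total_positives if total_positives else 0.0
    base_rate = total_positives / n           # precision of random guessing
    lift = precision / base_rate if base_rate else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "lift": lift,
        "increase_over_random": lift - 1.0,   # e.g. 2.05 means a 205% increase
    }
```

Under this definition, 47% precision with a 205% increase implies a base rate of roughly 0.47 / 3.05, or about 15%, which would match the 15% figure on the slide if that number is the overall share of good-paying cases.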
Thank you