
Unlocking the potential of your data

Background
What is Big Data?

Characterized by: Volume, Velocity, Variety
Cutting-edge techniques and technologies are often required.
Buzzword or real phenomenon? Both!
Big Data requirements become apparent once you understand the problem you're solving.

Just Plain Old Data:


Most uses of data do not currently require big data solutions.
Instead, focus on what you want to use your data for. What are your business objectives? What value are you trying to create?
There is a lot of value locked away inside your plain old data.

PARC | 2

The Dream:
Data that leads to Actionable Insights
Data
[Figure: sample table of coded records]
Analysis

Insights that allow us to take the actions required to accomplish something extraordinary!

Bill James: Baseball Player Data
Small-market teams effectively compete with large-market teams.

The Reality:
Disparate Data, Vague Goals and Organizational Inertia

Disparate Data:
Various databases, paper, logs, mainframes, sensors, restricted access.

Vague Goals:
"What do we want to do again?" "Get results, of course!"

Organizational Inertia:
"The results speak for themselves." "Everyone will love them."

The Solution:
Understand your data, goals and organization
Understand your data:
Explore the data to gain a rich understanding of what's there.
Transform the data.
Never underestimate the effort required here.

Perform the right analysis:
What problem do you want to solve?
Focus on the value to be created.
Assemble the right team (yes, you need a team).
Use the simplest model that gets the job done.

Change the organization if necessary:
Using insights from data often requires that people change the way they work. Consider this when you set your goals.
Change agents are often needed at many levels of the organization.
Organizational structure and culture are important, yet they are often completely overlooked or treated as an afterthought when data analysis projects are conceived.


Ohio DJFS:
Who pays their child support
Goals:
Describe who is and isn't paying their child support.
Develop predictive scoring models to rank cases at case initiation.
Build models that are interpretable by case workers.

Understand the data:
IMS database, custom code, SQL: raw data (1.33 TB) to transformed data (2 GB) over 6+ months.
Augment with public data (census data).
Missing values? Etc.

Team:
Data scientists and researchers
Software engineers
Subject matter experts, including former IV-D directors

Organization:
State run, county administered.
Case workers are knowledge workers.
Case workers touch real lives.
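The "Missing values?" check in the data-understanding step can be sketched as a simple per-column profile. This is only an illustration, not the project's actual code: the column names (`case_id`, `ap_age`, `county`), the sample rows, and the list of tokens treated as missing are all assumptions made up for the example.

```python
import csv
import io


def missing_value_report(rows, missing_tokens=("", "NA", "NULL")):
    """Return {column: fraction of rows whose value is missing}."""
    counts = {}
    total = 0
    for row in rows:
        total += 1
        for column, value in row.items():
            # csv.DictReader yields None for fields absent from short rows.
            is_missing = value is None or value.strip() in missing_tokens
            counts[column] = counts.get(column, 0) + (1 if is_missing else 0)
    if total == 0:
        return {}
    return {column: count / total for column, count in counts.items()}


# Hypothetical extract: four cases, three columns (invented for illustration).
sample = io.StringIO(
    "case_id,ap_age,county\n"
    "1,34,Cuyahoga\n"
    "2,,Cuyahoga\n"
    "3,41,\n"
    "4,NA,Franklin\n"
)
report = missing_value_report(csv.DictReader(sample))
for column, fraction in report.items():
    print(f"{column}: {fraction:.0%} missing")
```

A profile like this is usually a first pass; columns with a high missing share are candidates for the subject matter experts to explain before any modeling starts.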

Ohio DJFS:
Who pays their child support
Percent of Cases:
30% pay ≥ 80% of established obligation
52% pay < 80% of established obligation
Remaining cases (no order; zero orders): 8% and 10%

                                                              Pay 80% or more   Pay less than 80%
Average Percent of Total Obligation Paid over life of case    95%               38%
Average Age Oldest Child at Case Initiation                   4.7               3.9
Average Age Youngest Child at Case Initiation                 3.8               3.3
Average Number Children CP Born Out of Wedlock                1.0               1.6
Average Number Children AP Born Out of Wedlock                1.0               2.0
Percent where CP and AP Formerly Married                      46%               19%


Ohio DJFS:
Scoring Model for Cuyahoga County
Following the suggestion from the last presentation to consider county-level analysis, we built our models only at the county level.

Cuyahoga County:
Largest county in Ohio by population in 2010.
Surrounds Cleveland.
62,002 cases initiated between FY 05 and FY 13.
57 features (data known at case initiation): 27 categorical, 30 numerical.

[Equation not recoverable: the slide labels the right-hand side of an equation "53" and states a rule of the form "if ... k ... k then ... = 1".]

Link to Dashboard


Ohio DJFS:
Cuyahoga County Model Performance
Pool of All Cases (N): 15% green, 85% red
Prediction Model
High-Value Cases (Top 20% of N)

63% Recall
47% Precision (205% increase in predictive power over random guessing)

Key Message:
The top fifth of the caseload, according to the model, contains almost 2/3 of the good paying cases.
You could theoretically touch almost 2/3 of the good paying cases by working only 20% of the caseload.

Green = case with an AP that paid ≥ 80% of child support obligation over life of case
Red = case with an AP that paid < 80% of child support obligation over life of case
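The recall, precision and "top 20%" figures on this slide are linked by simple arithmetic, which the sketch below works through. It is an illustration under stated assumptions, not the model behind the slide: `top_fraction_metrics` and `recall_from_precision` are hypothetical helper names, the toy scores and labels are invented, and the 15% base rate is read from the slide's green share of the caseload.

```python
def top_fraction_metrics(scores, labels, fraction=0.20):
    """Flag the highest-scoring `fraction` of cases and evaluate that selection.

    scores: model scores, higher means more likely to pay (hypothetical input)
    labels: 1 if the AP paid >= 80% of the obligation, else 0
    """
    n = len(scores)
    k = max(1, round(n * fraction))
    # Rank case indices from highest to lowest score and keep the top k.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:k])
    positives = sum(labels)
    precision = hits / k
    recall = hits / positives if positives else 0.0
    base_rate = positives / n
    lift = precision / base_rate if base_rate else 0.0
    return precision, recall, lift


def recall_from_precision(precision, fraction, base_rate):
    """Recall implied by the precision achieved within the top `fraction`."""
    return precision * fraction / base_rate


# Toy data (invented for illustration): 10 cases, 2 good payers.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
toy_precision, toy_recall, toy_lift = top_fraction_metrics(toy_scores, toy_labels)

# Slide figures: 47% precision in the top 20%, about 15% good payers overall.
implied_recall = recall_from_precision(0.47, 0.20, 0.15)
implied_lift = 0.47 / 0.15
print(f"implied recall = {implied_recall:.2f}, lift = {implied_lift:.2f}x")
```

With the rounded 15% base rate, the implied recall is about 63%, matching the slide. The same rounded figure gives a lift of about 3.13x, slightly above the slide's 205% increase (3.05x); this is consistent with the true base rate being a little over 15%, though the slide does not state it.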


County of Los Angeles, CA:
Who's doing all the printing?
The largest county in the U.S.: County of Los Angeles, CA.
Employees in 33 departments were spending millions of dollars on printing.
No oversight. No data. No acquisition strategy.
Through assessments, the CIO's office discovered 43,000 printers and copiers.

County of Los Angeles, CA:
Managing Printing Across 33 Departments
Department inputs

"The fact that we have numbers has been very effective. It's rare there are quantitative results for these kinds of projects."
Rich Sanchez, CIO, County of LA, CA

$9M annual savings
$50M+ total savings
56% printer decrease: 44,000 to 18,500
Less administrative effort and IT support
Electrical consumption cut 58%, equivalent to 700 homes annually

Thank you

