
Implementing Data Governance at Grifols: Best Practices and Lessons Learned
Praneeth Padmanabhuni, Grifols Inc.
Richard Hauser, Decision First Technologies
SESSION CODE: 0204

LEARNING POINTS
Discuss how SAP Information Steward can assist in
establishing a Data Governance program
Enable power users in the business to own data
processing and be responsible for data quality
Remove manual steps to automate data processing as
much as possible
Extend the out-of-the-box visualizations available in
Information Steward scorecards by utilizing repository
metadata
Involve data stewards directly in de-duplication efforts
via Match Review Tasks in Information Steward

Who Is Grifols?
International healthcare company based in
Barcelona, Spain, with offices in Raleigh, NC as
well as Los Angeles
Develop and distribute life-saving protein therapies
derived from human plasma
Have experienced rapid growth over the past few
years as a result of mergers and acquisitions

Challenges as a Growing Company


70+ files to collect data from on a monthly basis
70+ varying degrees of data quality!!!
Only want to count sales at the closest point to an actual
consumer

Data warehouse had previously been outsourced,
but volumes had reached a point where insourcing
became a more attractive option
Data cleansing was being performed manually via
Excel files, but using a tool to process large
volumes became a necessity

Decision First Technologies


Who we are
Atlanta-based SAP Business Objects specialists
Partnered with SAP
7x Business Objects Partner of the Year
SAP Business Objects, SAP EIM, and SAP HANA
experts

What we do
Strategize and implement Data Governance solutions
BI Nirvana 90 day Business Intelligence on HANA
Full lifecycle data warehouse implementations
Data visualizations and standard reporting

Data Governance Defined


Core business process that ensures data is treated
as a corporate asset and is formally managed
throughout the enterprise
Marriage of the following programs:
Data Quality
Information Management policies
Security
Business process management
Risk management

Information Steward
Information Steward was chosen as the tool to help
implement initial DG policies
Integrates nicely with DS, which was already in use

Gives visibility to data quality issues


Easy for business users to pick up and run with
Not a full-blown master data solution, more of an
MDM-lite

Challenges at time of enlisting DFT


Cluttered ETL environment
Many manual steps needed for weekly processes
Data issues popping up weeks after loading of flat
files
Users did not trust account master data

Solutions Put Forward


Implement best practices in ETL environment
Multiple developer repositories, central repositories, and
best practices naming conventions

Combine and automate common ETL jobs to the
fullest extent possible
Give visibility to data quality by developing an
Information Steward scorecard
Improve the customer account matching process
and utilize DS cleansing transforms to build user
trust in the data warehouse

ETL Coding Best Practices


Multiple repos and landscapes
Previously just PRD
One repository per developer
Fully fleshed-out DEV, QA, and PRD to properly test

Central repo for each environment


Allows for versioning and rollback in event of unintended
consequences
Moving objects to central forces developers to fully
understand the impact they are having on all objects

Naming standards
Names identify the object, the data that is being sourced
from or written to, the load type (initial/delta), and a
number in sequence if applicable
E.g. DF_ACCOUNT_MASTER_INT_D
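
A naming standard like this can be checked automatically. The sketch below is one possible reading of the example name, not the team's actual rule: it assumes the prefix codes the object type (only DF appears in the slide; WF and JOB are hypothetical additions), the last letter codes the load type (D for delta appears in the slide; I for initial is an assumption), and everything in between names the data. The function `parse_object_name` is hypothetical.

```python
from typing import NamedTuple, Optional

# Assumed code tables. Only "DF" and "D" appear in the slide's example;
# the other entries are hypothetical and shown for illustration only.
OBJECT_TYPES = {"DF": "dataflow", "WF": "workflow", "JOB": "job"}
LOAD_TYPES = {"I": "initial", "D": "delta"}


class ParsedName(NamedTuple):
    object_type: str
    data_name: str
    load_type: str
    sequence: Optional[int]


def parse_object_name(name: str) -> Optional[ParsedName]:
    """Split an object name into its parts, or return None if it does not conform."""
    parts = name.split("_")
    sequence = None
    # An optional trailing number is treated as the sequence number.
    if parts[-1].isdigit():
        sequence = int(parts.pop())
    # Need at least a prefix, one data-name part, and a load-type code.
    if len(parts) < 3:
        return None
    prefix, *data_parts, load_code = parts
    if prefix not in OBJECT_TYPES or load_code not in LOAD_TYPES:
        return None
    # Data-name parts must be upper-case alphanumeric tokens.
    if not all(part.isalnum() and part == part.upper() for part in data_parts):
        return None
    return ParsedName(
        OBJECT_TYPES[prefix], "_".join(data_parts), LOAD_TYPES[load_code], sequence
    )
```

Under these assumptions, `parse_object_name("DF_ACCOUNT_MASTER_INT_D")` yields a dataflow over `ACCOUNT_MASTER_INT` with a delta load and no sequence number, while a name that does not follow the pattern returns `None`.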

ETL Automation
Combine objects into jobs, workflows, etc.
Went from 15 steps down to 3-4 depending on data

Code objects for reusability, not one-off executions


Standardize variables across all jobs and conform
to a template job format
Job Execution Table, Job Start Script
Give power users authority to process data when
ready by allowing them to run certain DS jobs that
they are responsible for

DQ Visualizations
Needed a way to assess DQ before it became an
issue
IS Data Insight was the best solution for our purposes

Same data validation rules could be applied to all
distributors
Limit the data being analyzed to only the most recent month

Built an event-based process chain in the CMC to
seamlessly integrate this step into the normal
weekly ETL jobs
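
The idea of applying one set of validation rules to every distributor, restricted to the most recent month, can be illustrated outside Information Steward. The sketch below is a plain-Python stand-in, not how IS Data Insight is implemented: the two rules, the field names (`distributor`, `month`, `quantity`, `zip`), and the function `score_latest_month` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical validation rules, applied identically to every distributor.
RULES = {
    "quantity_positive": lambda row: row["quantity"] > 0,
    "zip_present": lambda row: bool(row["zip"].strip()),
}


def score_latest_month(rows):
    """Return {(distributor, rule): percent of rows passing} for the latest month only."""
    if not rows:
        return {}
    # "YYYY-MM" strings sort chronologically, so max() gives the latest month.
    latest = max(row["month"] for row in rows)
    passed = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        if row["month"] != latest:
            continue  # older months are excluded from the analysis
        for rule_name, rule in RULES.items():
            key = (row["distributor"], rule_name)
            total[key] += 1
            passed[key] += int(rule(row))
    return {key: 100.0 * passed[key] / total[key] for key in total}
```

Scoring per distributor and per rule makes it visible which file is degrading which rule, which is the kind of visibility the scorecard is meant to provide.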

Original Sales Staging Process

New Sales Staging Process with DQ

DQ Reporting Enhancements
Extract data from appropriate tables/views in the IS
repository database every time new DQ data is
available
Historical scores are readily available from the
following database views:
MMB_DATA_GROUP
Contains project names, among many other things

MMB_KEY_DATA_DOMAIN
Key Data Domain descriptions

MMB_KEY_DATA_DOMAIN_SCORE
Historical scores for every active quality object

MMB_DOMAIN_VALUE
Quality dimension descriptions

MMB_KEY_DATA_DOMAIN_SCORE
Contains scores for key data domains (KDDs), quality
dimensions (QDs), rules, and rule bindings, by key data
domain, which is attached to a scorecard
Column to select score type is
KEY_DATA_DOMAIN_SCORE_TYPE_CD
TOTL = Key Data Domain Score
KDDQ = Quality Dimension Score
KDDR = Rule Score
KDDB = Rule Binding Score

Information Steward Repo Joins


MMB_DATA_GROUP.DATA_GROUP_ID =
MMB_KEY_DATA_DOMAIN.PROJECT_ID (project description)
MMB_KEY_DATA_DOMAIN.KEY_DATA_DOMAIN_ID =
MMB_KEY_DATA_DOMAIN_SCORE.KEY_DATA_DOMAIN_ID
where KEY_DATA_DOMAIN_SCORE_TYPE_CD = TOTL (for KDD scores)
MMB_KEY_DATA_DOMAIN_SCORE.SCORE_ID =
MMB_DOMAIN_VALUE.DOMAIN_VALUE_ID
where KEY_DATA_DOMAIN_SCORE_TYPE_CD = KDDQ (for Quality
Dimension scores)
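
To show how these joins fit together, the sketch below rebuilds a tiny stand-in for the repository in an in-memory SQLite database and runs both queries. The table names, join columns, and score type codes are taken from the slides; the `NAME`, `SCORE`, and `SCORE_DATE` columns, the sample rows, and the function names are hypothetical, and the real repository's column list should be checked before reusing the SQL.

```python
import sqlite3

# In-memory stand-in for the IS repository. NAME, SCORE, and SCORE_DATE are
# hypothetical columns added for the demo; the join columns come from the slides.
SCHEMA = """
CREATE TABLE MMB_DATA_GROUP (DATA_GROUP_ID INTEGER, NAME TEXT);
CREATE TABLE MMB_KEY_DATA_DOMAIN (KEY_DATA_DOMAIN_ID INTEGER, PROJECT_ID INTEGER, NAME TEXT);
CREATE TABLE MMB_DOMAIN_VALUE (DOMAIN_VALUE_ID INTEGER, NAME TEXT);
CREATE TABLE MMB_KEY_DATA_DOMAIN_SCORE (
    KEY_DATA_DOMAIN_ID INTEGER,
    SCORE_ID INTEGER,
    KEY_DATA_DOMAIN_SCORE_TYPE_CD TEXT,
    SCORE REAL,
    SCORE_DATE TEXT
);
"""


def build_demo_repo():
    """Create the demo tables and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO MMB_DATA_GROUP VALUES (1, 'Sales DQ')")
    conn.execute("INSERT INTO MMB_KEY_DATA_DOMAIN VALUES (10, 1, 'Distributor Sales')")
    conn.execute("INSERT INTO MMB_DOMAIN_VALUE VALUES (100, 'Completeness')")
    conn.executemany(
        "INSERT INTO MMB_KEY_DATA_DOMAIN_SCORE VALUES (?, ?, ?, ?, ?)",
        [
            (10, None, "TOTL", 8.2, "2014-01"),
            (10, None, "TOTL", 9.1, "2014-02"),
            (10, 100, "KDDQ", 9.5, "2014-02"),
        ],
    )
    return conn


def kdd_history(conn):
    """Project, key data domain, date, and overall KDD score (type TOTL)."""
    return conn.execute(
        """
        SELECT g.NAME, d.NAME, s.SCORE_DATE, s.SCORE
        FROM MMB_DATA_GROUP g
        JOIN MMB_KEY_DATA_DOMAIN d ON g.DATA_GROUP_ID = d.PROJECT_ID
        JOIN MMB_KEY_DATA_DOMAIN_SCORE s ON d.KEY_DATA_DOMAIN_ID = s.KEY_DATA_DOMAIN_ID
        WHERE s.KEY_DATA_DOMAIN_SCORE_TYPE_CD = 'TOTL'
        ORDER BY s.SCORE_DATE
        """
    ).fetchall()


def quality_dimension_scores(conn):
    """Key data domain, quality dimension, date, and score (type KDDQ)."""
    return conn.execute(
        """
        SELECT d.NAME, v.NAME, s.SCORE_DATE, s.SCORE
        FROM MMB_KEY_DATA_DOMAIN d
        JOIN MMB_KEY_DATA_DOMAIN_SCORE s ON d.KEY_DATA_DOMAIN_ID = s.KEY_DATA_DOMAIN_ID
        JOIN MMB_DOMAIN_VALUE v ON s.SCORE_ID = v.DOMAIN_VALUE_ID
        WHERE s.KEY_DATA_DOMAIN_SCORE_TYPE_CD = 'KDDQ'
        ORDER BY s.SCORE_DATE
        """
    ).fetchall()
```

Filtering on the score type code in every query matters because the same score table holds KDD, quality dimension, rule, and rule binding rows side by side.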

Automated DQ Chain

Scorecard

Scorecard Drilldown

DQ Webi Report

DQ Webi Report Drilldown

Account Master Cleanup Requirements


Needed to prove to the business that account
master data was trustworthy
Too many overmatch and undermatch scenarios existed
in the old account master

Could not start from scratch because internal data
had been matched to an external data source by a
third party
Needed the cleanup effort to have data steward
input for uncertain matches
As little impact as possible on all current processes

Account Master Cleanup, Step 1


Identify overmatch scenarios, i.e. accounts that
had been incorrectly matched together
Run all current accounts with their children
through a data quality match transform
Break key is on Data Warehouse ID
A child can only match to its parent, not to other parent
accounts

Pass all potential overmatches to a review task in
Information Steward for data steward input
Use data stewards' input to determine how to
handle the record
Leave alone or create a new account master
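
The overmatch check above can be sketched in a few lines. This is a simplified stand-in for the Data Services match transform, not its actual logic: it uses Python's `difflib.SequenceMatcher` on account names only, and the threshold value, the input shapes, and the function `find_potential_overmatches` are assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off: children scoring below this against their parent
# are treated as potential overmatches and sent for review.
REVIEW_THRESHOLD = 0.6


def similarity(a, b):
    """Case-insensitive similarity ratio between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_potential_overmatches(parents, children, threshold=REVIEW_THRESHOLD):
    """Flag children that look unlike the parent they are attached to.

    parents:  dict of Data Warehouse ID -> parent account name
    children: list of (Data Warehouse ID, child account name)
    """
    flagged = []
    for dw_id, child_name in children:
        # Break key: a child is only ever compared with its own parent.
        parent_name = parents[dw_id]
        score = similarity(child_name, parent_name)
        if score < threshold:
            flagged.append((dw_id, child_name, round(score, 2)))
    return flagged
```

Each flagged record would then go to a review task, where the data steward decides whether to leave it alone or create a new account master.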

Account Master Overmatch Cleanup

Account Master Cleanup, Step 2


Improve the current delta matching logic that was
part of the sales weekly data warehouse load
Should see a gradual decrease in number of new
accounts created over time
3K per week initially

New child accounts must be matched first
against existing account masters; only after that
can they be considered a match with each other
Account master data was frozen for one month
to accomplish this task
Short enough timeline to not have a critical impact on
business decisions
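
The two-pass ordering in the delta logic (existing masters first, then the remaining new accounts against each other) can be sketched as follows. Name similarity via `difflib.SequenceMatcher` stands in for the real match transform, and the threshold, the integer master IDs, and the function `assign_new_children` are hypothetical.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.85  # hypothetical cut-off for calling two names a match


def is_match(a, b, threshold=MATCH_THRESHOLD):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def assign_new_children(masters, new_children):
    """Assign each new child name to a master ID.

    masters:      dict of master ID (int) -> master account name
    new_children: list of distinct new child account names
    Returns (assignments, new_masters).
    """
    assignments = {}
    unmatched = []
    # Pass 1: every new child is compared with existing account masters only.
    for child in new_children:
        master_id = next(
            (mid for mid, name in masters.items() if is_match(child, name)), None
        )
        if master_id is None:
            unmatched.append(child)
        else:
            assignments[child] = master_id
    # Pass 2: only the leftovers are compared with each other; the first
    # member of each group becomes a newly created account master.
    new_masters = {}
    next_id = max(masters, default=0) + 1
    for child in unmatched:
        master_id = next(
            (mid for mid, name in new_masters.items() if is_match(child, name)), None
        )
        if master_id is None:
            master_id = next_id
            new_masters[master_id] = child
            next_id += 1
        assignments[child] = master_id
    return assignments, new_masters
```

Running pass 1 before pass 2 is what keeps the number of newly created account masters falling over time: a new child only creates a master when nothing existing fits.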

Account Delta Process

Account Master Cleanup, Step 3


Identify undermatched accounts
Accounts that should be merged together but haven't
been for whatever reason

Run all existing account master records through a
DS match dataflow to determine if they should be
merged into one
If a potential match is found between 2 or more
accounts, pass this match group along to an IS
Match Review task for data steward review
Utilize data stewardship results to determine a
winning account master and deprecate the
others in the group
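
The undermatch step can be sketched as two small functions: one that builds candidate match groups among existing account masters, and one that applies a steward's decision. Name similarity via `difflib.SequenceMatcher` is a stand-in for the DS match dataflow, and the threshold and both function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def build_match_groups(masters, threshold=0.85):
    """Group account masters whose names look alike.

    masters: dict of master ID -> account name
    Returns a list of sorted ID lists, one per group of two or more masters.
    """
    parent = {mid: mid for mid in masters}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(masters, 2):
        ratio = SequenceMatcher(None, masters[a].lower(), masters[b].lower()).ratio()
        if ratio >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for mid in masters:
        groups.setdefault(find(mid), []).append(mid)
    # Only groups with two or more members need a steward's review.
    return [sorted(group) for group in groups.values() if len(group) >= 2]


def apply_review(group, winner):
    """Map each deprecated master in the group to the steward's chosen winner."""
    if winner not in group:
        raise ValueError("winner must be a member of the match group")
    return {loser: winner for loser in group if loser != winner}
```

The winner is passed in rather than computed, since in the process described above that choice comes from the data steward's Match Review result.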

Account Master Undermatch Cleanup

VISION FOR THE FUTURE


Ultimately would like to associate Salesforce.com
CRM data with actual sales data coming from
distributors
Provides backward-looking analysis of sales rep
performance
Capability to start performing some predictive analysis
Find more ideal customers
Identify prototypical customers
Focus on these accounts to grow business

Foundation is now in place to be in compliance
with the Sunshine Act when it goes into effect

RETURN ON INVESTMENT THUS FAR


Yearly savings resulting from initial DW project:
$441.5K
Savings resulting from reduced time to process weekly
records: $13,000/month or $156,000/year
Customer targeting and predictive analytics are next
No upper bound on revenue potential

BEST PRACTICES
Involve the business often to showcase improvements
and ask for further suggestions
Necessary for all DG/DQ projects

Keep history of IS Match Review results


OK to leave in the same table in 4.1; issues have been found in
early versions of 4.2
Just fine to move to another table if too confusing

Have separate Reviewer and Approver roles for
Match Review tasks
Easy to get fatigued when going through hundreds or
thousands of records
Also a good idea to allow a few days to pass between review
and approval

KEY LEARNING
SAP Information Steward can assist in establishing a
Data Governance program and gaining momentum
within your organization
Empower your power users to own data processing
and be responsible for data quality. Actively involve
business users in all steps of the process
Eliminate manual intervention to automate data
processing as much as possible. This is where a large
portion of ROI can be found

Questions?

Praneeth Padmanabhuni
praneeth.padmanabhuni@grifols.com
Rich Hauser
richard.hauser@decisionfirst.com


THANK YOU FOR PARTICIPATING


Please provide feedback on this session by completing a
short survey via the event mobile application.
SESSION CODE: 0204
For ongoing education on this area of focus,
visit www.ASUG.com