Asset
Tony Fisher
Tony.Fisher@SAS.com
Intelligent Architecture
Data Quality is a fundamental component in the Intelligence Value Chain.
Poor data yields poor decisions.
Technology Stack
IDI Methodology
Data Quality
Linking and Consolidation
Enhancement
Data Quality
Profiling
Table/Column Analysis
Cleansing
Field/Element Analysis
Data Quality
Profiling
Identify Data Defects
Identify Non-standard Data
Table/Column Analysis
Frequency Distribution
Min/Max/Outlier Detection
Datatype Analysis
Unique/Null Analysis
Metadata Analysis
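The table/column checks listed above (frequency distribution, min/max/outlier detection, datatype analysis, unique/null analysis) can be sketched in a few lines of Python. This is a minimal illustration, not a description of any SAS product: it assumes raw column values arrive as strings (for example from a CSV extract), treats None and blank strings as nulls, and uses Tukey's 1.5 x IQR fences for outliers, which is one common choice among several. The function names infer_type and profile_column are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def infer_type(value):
    """Classify a raw value as 'int', 'float', or 'str' (datatype analysis)."""
    text = str(value).strip()
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(text)
            return name
        except ValueError:
            pass
    return "str"


def profile_column(values):
    """Return a simple profile of one column of raw (string) values."""
    # Unique/null analysis: None and blank strings count as missing.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    types = Counter(infer_type(v) for v in present)
    numbers = [float(v) for v in present if infer_type(v) != "str"]
    report = {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "unique": len(set(present)) == len(present),
        "types": dict(types),                            # datatype analysis
        "most_common": Counter(present).most_common(3),  # frequency distribution
    }
    if numbers:
        report["min"], report["max"] = min(numbers), max(numbers)
    if len(numbers) >= 4:
        # Outlier detection with Tukey's fences (1.5 x interquartile range).
        q1, _, q3 = quantiles(numbers, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        report["outliers"] = [x for x in numbers if x < low or x > high]
    return report
```

A column that mixes numbers with text such as "n/a" shows up immediately in the types count, which is often the first sign that a field does not contain what its column type suggests.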
Data Profiling
Examples
Invalid input:
Account Number should have the format Annnnnnnn and be unique
Validate against industry or corporate standards:
A business analyst knows which Loan Amounts are valid
Metadata:
Analysis may reveal faulty or inefficient table relationships, or table columns whose contents do not match the declared column type
Numerical analysis:
Frequency distribution, maximum, and minimum of Loan Amount
Data type analysis:
Date or numeric fields may not be represented accurately
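The account-number rule above can be checked mechanically. The sketch assumes a mask notation in which "A" stands for one uppercase letter and "n" for one digit; the slides do not define the notation, so treat that reading, and the helper names mask_to_regex and invalid_accounts, as hypothetical.

```python
import re
from collections import Counter

# Assumed mask notation: "A" = one uppercase letter, "n" = one digit.
MASK_TOKENS = {"A": "[A-Z]", "n": "[0-9]"}
ACCOUNT_MASK = "Annnnnnnn"


def mask_to_regex(mask):
    """Turn a format mask such as 'Annnnnnnn' into a compiled regex."""
    # Any character not in MASK_TOKENS is matched literally.
    return re.compile("".join(MASK_TOKENS.get(ch, re.escape(ch)) for ch in mask))


def invalid_accounts(account_numbers, mask=ACCOUNT_MASK):
    """Return (values that break the format, values that occur more than once)."""
    pattern = mask_to_regex(mask)
    bad_format = [a for a in account_numbers if not pattern.fullmatch(a)]
    counts = Counter(account_numbers)
    duplicates = sorted(a for a, n in counts.items() if n > 1)
    return bad_format, duplicates
```

Using fullmatch rather than match ensures that values with extra trailing characters are also rejected.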
Data Profiling
Data Quality
Cleansing
Standardization
Defect Correction
Value in Range
Unique/Missing Compliance
Transformation
Verification
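Three of the cleansing steps above (standardization, value-in-range checks, and unique/missing compliance) can be sketched on a single record. The range bounds stand in for business rules of the kind a business analyst would supply; the function cleanse_record and its parameters are illustrative assumptions, and transformation and verification are not covered here.

```python
def cleanse_record(record, required, ranges):
    """Return (standardized copy of record, list of issues found).

    required: field names that must be present and non-empty.
    ranges:   {field: (low, high)} inclusive numeric bounds.
    """
    cleaned = {}
    issues = []
    for field, value in record.items():
        # Standardization: trim and collapse runs of whitespace in text fields.
        if isinstance(value, str):
            value = " ".join(value.split())
        cleaned[field] = value
    # Missing-value compliance: required fields must be present and non-empty.
    for field in required:
        if cleaned.get(field) in (None, ""):
            issues.append(f"{field}: missing")
    # Value in range: compare numeric fields against the business-rule bounds.
    for field, (low, high) in ranges.items():
        value = cleaned.get(field)
        if value in (None, ""):
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            issues.append(f"{field}: not numeric ({value!r})")
            continue
        if not low <= number <= high:
            issues.append(f"{field}: {number} outside [{low}, {high}]")
    return cleaned, issues
```

Returning a list of issues instead of raising on the first problem lets one pass report every defect in a record, which suits batch cleansing.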
Data Cleansing
Examples
Data Cleansing
Segment for Country-Specific Analysis
Data Cleansing
Non-standard Data
Variations
First Merit Bank
First Merit
1st Merit
First Merit Corp
>Fist Merit Corp.
first merit
1st Merit Bank
1st Merit Bank Corp
The 5th 3rd Bank
Fifth Third Bank
5th Third Bank
5th Thrid Bank
Fifth 3rd Bank
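One way to reconcile variations like these is to standardize each name and then fuzzy-match the standardized forms. This sketch assumes a small hand-written rule set (spelled-out ordinals and noise words such as "the", "bank", and "corp") and uses difflib.SequenceMatcher from the Python standard library with an arbitrarily chosen 0.85 similarity threshold; the rules, threshold, and function names are illustrative, and dedicated data quality tools use much richer matching logic.

```python
import re
from difflib import SequenceMatcher

# Assumed rules for illustration: ordinal spellings and noise words to drop.
ORDINALS = {"1st": "first", "2nd": "second", "3rd": "third", "4th": "fourth", "5th": "fifth"}
NOISE_WORDS = {"the", "bank", "corp", "corporation", "inc"}


def standardize(name):
    """Lowercase, strip punctuation, spell out ordinals, and drop noise words."""
    tokens = re.sub(r"[^0-9a-z]+", " ", name.lower()).split()
    tokens = [ORDINALS.get(t, t) for t in tokens]
    return " ".join(t for t in tokens if t not in NOISE_WORDS)


def group_variants(names, threshold=0.85):
    """Greedily cluster names whose standardized forms are similar enough."""
    groups = {}  # standardized key -> list of original spellings
    for name in names:
        key = standardize(name)
        # Join the first existing group whose key is close enough (catches typos).
        match = next(
            (k for k in groups if SequenceMatcher(None, key, k).ratio() >= threshold),
            None,
        )
        groups.setdefault(match or key, []).append(name)
    return groups
```

With these rules, all eight First Merit spellings listed above, including ">Fist Merit Corp.", fall into one group and the five Fifth Third spellings into another. Because each name is compared against existing group keys in order, results can depend on input order.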
Identifying Duplicates
Linking Disparate Sources
Consolidation
Householding
Siting
Duplicate Elimination
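Householding and duplicate elimination can be sketched with exact match keys built from standardized fields: records sharing a surname and standardized address form a household, and a repeated first name within that household is treated as a duplicate. The sketch assumes records with hypothetical id, first_name, last_name, and address fields and a toy street-abbreviation table; production matching would normally add fuzzy comparison and address verification.

```python
import re
from collections import defaultdict

# Assumed abbreviation rules for illustration only.
STREET_ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd"}


def address_key(address):
    """Lowercase, strip punctuation, and abbreviate common street words."""
    tokens = re.sub(r"[^0-9a-z]+", " ", address.lower()).split()
    return " ".join(STREET_ABBREVIATIONS.get(t, t) for t in tokens)


def build_households(records):
    """Return ({household key: [kept ids]}, [(duplicate id, original id)])."""
    households = defaultdict(list)
    duplicates = []
    first_seen = {}  # person key -> id of the first record for that person
    for rec in records:
        house = (rec["last_name"].strip().lower(), address_key(rec["address"]))
        person = (rec["first_name"].strip().lower(),) + house
        if person in first_seen:
            # Same person at the same address: eliminate as a duplicate.
            duplicates.append((rec["id"], first_seen[person]))
        else:
            first_seen[person] = rec["id"]
            households[house].append(rec["id"])
    return dict(households), duplicates
```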
Data Enhancement
Demographics
Geographic Data
Spending Habits
Financial Reports
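Enhancement usually means appending attributes from an external reference source, such as demographic or geographic data, to existing records. A minimal sketch, assuming the reference data has already been loaded into a dictionary keyed by a field both sides share (a ZIP code in the check below, which is an illustrative assumption, as are the attribute names):

```python
def enhance(records, reference, key="zip"):
    """Append attributes from an external reference table to each record."""
    enhanced = []
    for rec in records:
        extra = reference.get(rec.get(key), {})
        # Record fields are unpacked last, so existing values win over reference data.
        enhanced.append({**extra, **rec})
    return enhanced
```

Records with no matching reference entry pass through unchanged, so enhancement never removes information already held.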