Matthew Lawler
Independent ETL
- Data is aggregated from sources by each division separately, using different approaches and tools.
- Data quality is uneven because different testing is performed. There are also missed opportunities to share experiences and costs.
- There is some efficiency already, as the divisions have discovered ways to solve this problem independently, but it could be improved. Reporting staff spend more time cleansing data than reporting on it.

Integration is fundamentally difficult
- There is little common ground between Environmental, Economic and Sociological data. The best data type for integration is geospatial data, which is complex. Integration is done per study.
- Relationships between Environmental, Economic and Sociological goals are missed, so the Agency has difficulty achieving the triple bottom line.
- Cannot be scaled, as rules are manual and defined in multiple places.

No common platform
- Each developer uses MS Access by default.
- Rapid obsolescence means data becomes inaccessible.
- Unable to scale due to the 2 GB limit.
[Diagram: the Data Farmer's PDMA cycle — Plan, Deploy, Monitor, Act]
Purpose
  Plan: Assess issues, evaluate alternatives and determine the cost and impact of change; plan to take on new data collections; initiate and review upgrades of current data collections.
  Deploy: Implement the Data Quality Plan to establish a business-as-usual capability to achieve acceptable data quality.
  Monitor: Actively monitor against defined thresholds, so that data quality meets business requirements.
  Act: Use tools and approaches to fix data quality issues once thresholds have been exceeded.

Outcomes
  Plan: Defined data quality requirements, infrastructure, standards and thresholds; data quality assessment; Data Quality Improvement Plan; reiterate the PDMA cycle.
  Deploy: Agreed DQ improvement plan; rollout and training to divisional staff; data quality embedded into operations.
  Monitor: Known data quality levels measured against thresholds; escalation process to Data Stewards.
  Act: More data quality measures remain within agreed business rules, leading to increased data quality.

Key Steps
  Plan: Define data quality requirements, infrastructure, standards and thresholds; assess current data quality; create the data quality improvement plan; set up the data quality PDMA process.
  Deploy: Execute the plan; publish standards; implement infrastructure; train staff; incorporate data quality into normal operations.
  Monitor: Measure data quality levels against defined business rules; escalate to Data Stewards when needed; perform trend analysis on key metrics.
  Act: Use agreed methods and tools to fix data quality issues; perform analysis and cleansing; negotiate with stakeholders regarding quality levels; enforce the use of approved tools; review agreements and IT as needed.
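The Monitor and Act steps above can be sketched in a few lines of Python. This is a minimal illustration, not the framework's prescribed tooling: the metric names, sample records, and thresholds are hypothetical assumptions.

```python
# Sketch of Monitor/Act: measure data quality against agreed thresholds
# and flag breaches for escalation to Data Stewards.
# Records, metric names, and threshold values are illustrative only.

records = [
    {"id": 1, "postcode": "2600"},
    {"id": 2, "postcode": None},    # missing value
    {"id": 3, "postcode": "26A0"},  # malformed value
]

def completeness(rows, field):
    """Share of rows where the field is present."""
    return sum(r[field] is not None for r in rows) / len(rows)

def validity(rows, field, predicate):
    """Share of present values that satisfy the business rule."""
    present = [r[field] for r in rows if r[field] is not None]
    return sum(predicate(v) for v in present) / len(present) if present else 1.0

# Thresholds agreed with stakeholders during the Plan step (hypothetical).
THRESHOLDS = {"completeness": 0.95, "validity": 0.99}

measured = {
    "completeness": completeness(records, "postcode"),
    "validity": validity(records, "postcode", str.isdigit),
}

breaches = {m: v for m, v in measured.items() if v < THRESHOLDS[m]}
for metric, value in breaches.items():
    print(f"ESCALATE to Data Steward: {metric}={value:.2f} "
          f"below threshold {THRESHOLDS[metric]:.2f}")
```

In practice the escalation would feed the trend analysis and stakeholder negotiation named in the Act column, rather than just printing.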
Data Farmer: A data farmer is an information harvester. They are like a mechanic.
Data Quality: Quality is the degree to which a set of inherent characteristics fulfils requirements (ISO 9000).
Data Rot: The tendency of data to undergo decomposition or decay, or to fall into chaos or disorder.
Data Tourist: A data tourist is an information browser. They are like a vehicle driver.
Enhancement: Any transformation to the data to make the data fit defined requirements, separate from any intrinsic or extrinsic checking.
Data Steward: A Data Steward manages data assets on behalf of others and in the best interests of the organization.
Extrinsic checking: Any filter applied to the data using an extrinsic predicate.
Extrinsic data quality characteristic: A data characteristic that can be evaluated by looking at the data alongside some external standard; that is, it is the combination of an external definition and the data itself.
Intrinsic checking: Any filter applied to the data using an intrinsic predicate.
Intrinsic data quality characteristic: A data characteristic that can be evaluated by looking at the data itself; that is, it is internal and implied by the data (type) definition.
[Diagram: Tourist view of data maturity levels — Lead, Alluvium, Bronze, Silver, Gold; clarity — Blurry, Clear]
11 Nov 2019 Data Quality Framework - Matthew Lawler
DQ Flow: Maturity Conditional
[Flowchart: begins with the decision "Are the requirements for the data documented?" — Yes branch shown]
Dimension | Extra Argument | Description - General | Example
Build new Keys | | Use available record information and reference data to create new identifiers. | Use address data to derive the Lat/Long.
Data Vault Hubs and Links | | Hub and Link data structures are needed to enable joins across different source schemas. | See Data Vault standard.
Data Vault Satellites | | Satellite data structures are needed to easily compare dependent data in the data vault model. | See Data Vault standard.
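The "Build new Keys" dimension and the hub idea can be sketched together: derive a deterministic business key from available record information, then use it to join rows from different source schemas. The field names, the hashing choice (SHA-256 over normalized parts), and the hub dictionary are illustrative assumptions, not the document's prescribed implementation.

```python
import hashlib

def business_key(*parts):
    """Derive a deterministic identifier from available record
    information (e.g. address fields). Normalizing (trim, upper-case)
    makes equivalent inputs from different sources produce the same key."""
    normalized = "|".join(str(p).strip().upper() for p in parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]

# Two divisions store the same site under different schemas (hypothetical):
env_row = {"addr": "12 Wattle St", "suburb": "Lyneham", "pc": "2602"}
eco_row = {"street": " 12 wattle st ", "locality": "LYNEHAM", "postcode": "2602"}

# A hub-like structure keyed by the derived business key lets the two
# sources be joined despite their different column names.
hub_site = {}
hub_site[business_key(env_row["addr"], env_row["suburb"], env_row["pc"])] = "env"
key = business_key(eco_row["street"], eco_row["locality"], eco_row["postcode"])
assert key in hub_site  # both schemas resolve to the same hub key
```

Deriving Lat/Long from an address, as in the table's example, would additionally require a geocoding reference dataset; the hash key above only shows the identifier-building half of the technique.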