
Oracle Enterprise Data Quality

Enterprise Data Quality Overview


Sreelal Subhash & Bapirajuswamy Vuda
Copyright © 2014, Oracle and/or its affiliates. All rights reserved.
The Data Quality Problem

Garbage in, garbage out



Quality of Data Impacts Everything
“Poor data quality is the primary reason for 40% of all business initiatives failing to achieve their targeted benefits”
Gartner

“Only 30% of BI/DW implementations fully succeed. The top two reasons for failure? Budget constraints and data quality.”
Gartner

“More than 50 percent of data warehouse projects will have limited acceptance, or will be outright failures, as a result of a lack of attention to data quality issues”
Gartner

Quality = Fitness for purpose


Complete – Valid – Consistent – Standardized – Accurate

“Through 2016, 25% of organizations using consumer data will face reputation damage due to inadequate understanding of information trust issues”
Gartner

“It does not really matter how good your management sponsorship or your business-driven motivation is. If you do not have the data, or the data does not have sufficient quality, any BI implementation will fail.”
Kimball



Examples of Common Data Quality Problems
Variation or Error                 Example
Sequence errors                    Mark Douglas or Douglas Mark
Transcription / phonetic errors    Hannah - Hamah; Graeme - Graham
Involuntary corrections            Browne – Brown
Missing or extra tokens            George W Smith, George Smith, Smith
Concatenated names                 Mary Anne, Maryanne
Foreign sourced data               Khader AL Ghamdi, Khadir A. AlGamdey
Nicknames and aliases              Chris – Christine, Christopher, Tina
Unpredictable use of initials      John Alan Smith, J A Smith
Noise                              Full stops, dashes, slashes, titles, apostrophes
Transposed characters              Johnson, Jhonson
Abbreviations                      Wlm/William, Mfg/Manufacturing
Localization                       Stanislav Milosovich – Stan Milo
Truncations                        Credit Suisse First Bost
Inaccurate dates                   12/10/1915, 21/10/1951, 10121951, 00001951
Prefix/suffix errors               MacDonald/McDonald/Donald
Transliteration differences        Gang, Kang, Kwang
Spelling & typing errors           P0rter, Beht
Exact or fuzzy duplication         Tracy Jones, 125 Mass Ave; Tracie Jones, 125 Massachusetts Avenue
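To make a few of these variations concrete, the sketch below shows one generic way such records could be compared. It strips noise characters, expands abbreviations, and scores similarity using Python's standard library. It does not represent how EDQ performs matching; the abbreviation map and the function names (`normalize`, `similarity`, `same_tokens_any_order`) are assumptions introduced purely for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation map, for illustration only.
ABBREVIATIONS = {
    "mass": "massachusetts",
    "ave": "avenue",
    "wlm": "william",
    "mfg": "manufacturing",
}


def normalize(text: str) -> str:
    """Lower-case, replace noise characters with spaces, expand abbreviations."""
    cleaned = re.sub(r"[.\-/',]", " ", text.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in cleaned.split()]
    return " ".join(tokens)


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def same_tokens_any_order(a: str, b: str) -> bool:
    """Detect sequence errors: identical tokens in a different order."""
    return sorted(normalize(a).split()) == sorted(normalize(b).split())


if __name__ == "__main__":
    print(normalize("125 Mass Ave"))
    print(similarity("Tracy Jones, 125 Mass Ave",
                     "Tracie Jones, 125 Massachusetts Avenue"))
    print(same_tokens_any_order("Mark Douglas", "Douglas Mark"))
```

A character-based score like this handles noise, abbreviations, and small typing errors, but not nicknames (Chris/Christine) or transliteration differences (Gang/Kang/Kwang), which generally need reference data or phonetic rules.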



Data Quality Methodology: 4 Cornerstones
Understand
• Data structure and content
• Accuracy, Consistency, Completeness, Timeliness, Uniqueness, Validity
• Fitness for purpose

Improve
• Reformat
• Standardise
• Clean
• Enhance
• Match and merge

Protect
• Ensure good data quality at source.
• Prevent endemic problems developing

Govern
• Monitor, measure and continuously improve data quality
• Manage issue remediation
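As a rough illustration of the "Understand" cornerstone, the sketch below computes three of the listed dimensions (completeness, uniqueness, validity) over a handful of records. It is a minimal, generic profiling example and not EDQ functionality. The sample rows, field names, and the assumed `DD/MM/YYYY` date format are hypothetical.

```python
from datetime import datetime


def completeness(rows, field):
    """Share of rows where the field is present and non-blank."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if (row.get(field) or "").strip())
    return filled / len(rows)


def uniqueness(rows, field):
    """Share of distinct values among the non-blank values of the field."""
    values = [row[field].strip() for row in rows if (row.get(field) or "").strip()]
    return len(set(values)) / len(values) if values else 0.0


def is_valid_date(value, fmt="%d/%m/%Y"):
    """True if the value parses as a real calendar date in the assumed format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (ValueError, TypeError):
        return False


# Hypothetical sample records, for illustration only.
rows = [
    {"name": "John Alan Smith", "dob": "21/10/1951"},
    {"name": "J A Smith", "dob": "10121951"},
    {"name": "", "dob": "00001951"},
    {"name": "John Alan Smith", "dob": "12/10/1915"},
]

if __name__ == "__main__":
    print("name completeness:", completeness(rows, "name"))
    print("name uniqueness:", uniqueness(rows, "name"))
    print("dob validity:", sum(is_valid_date(r["dob"]) for r in rows) / len(rows))
```

Note that a format check only establishes validity, not accuracy: both 21/10/1951 and 12/10/1915 parse as dates, yet at most one of them can be the person's real date of birth.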



Enterprise Data Quality Features (1)
• Full range of data quality functionality in a single configuration
user interface:
– Profiling, auditing, transforming, parsing, matching, dashboard,
issue management.
– Read, write and report on data.
– Batch and real-time.



Enterprise Data Quality Features (2)

• Collaborative environment:
– Multi-user.
– Multi-project and multi-server.
• Simple, drag and drop user interface.
– Designed for data owners as well as data administrators.



Collaboration Across User Communities

[Diagram: user communities that collaborate in EDQ: Executives & Stakeholders, Business Analysts, Data Analysts, Director Users, Data Stewards and Reviewers.]

Involve representatives of these groups to get business context and consensus.



Why Enterprise Data Quality?

Main advantages:
• Ease of use:
– Powerful, but simple and intuitive.
– Designed with business analysts in mind.
• Quick installation and time to productivity.
• Enjoyable to use.
• Extensible and open.
• Friendly to Service Oriented Architecture.
• All Data Quality functionality available:
– Via a single configuration user interface.
– To any process.



EDQ has Two Main User Interfaces

• Director for configuration:
– Configure and test processes and jobs.
• The Server Console for operations:
– Run and monitor jobs, view the Event Log.

Both user interfaces connect to the EDQ server(s).



User Interface Languages

The Enterprise Data Quality user interfaces are available in the
following languages:
• English
• Spanish
• Brazilian Portuguese
• French
• German
• Italian
• Chinese
• Japanese
• Korean



Enterprise Data Quality Architecture

Client
• Rich Java Web Start user interface:
– Configure, initiate, monitor processes.
– Browse results.

Business Layer
• Deployed on a Java application server:
– Captures and processes data.
– Updates UI with results.

Data Storage
• Repository consists of two database schemas:
– Director: persistent configuration data.
– Results: temporary data refreshed when processes are run.



Enterprise Data Quality Deployment Options

Client
• Requires Java running on Windows.

Business Layer
• Must be deployed in a web server: WebLogic or Tomcat.

Data Storage
• Requires a PostgreSQL or Oracle database.

See the Oracle Fusion Middleware Certification Matrix.



Enterprise Data Quality Deployment Options

[Diagram: three example deployments, each made up of Client, Business Layer and Data Storage layers; two of them show multiple clients.]

All three layers can be installed on a single machine (e.g. a laptop), or
each layer can be installed on a separate machine.
Many clients can connect to the same 'server'.



High Availability and Scalability: Architecture
[Diagram: components shown are the EDQ user interfaces, an HTTP server, the WebLogic Console, a WebLogic domain containing the WebLogic Administration Server and a WebLogic cluster of three managed servers (Managed Server 1, 2 and 3), the EDQ repository on an Oracle RAC database, and the EDQ configuration folder (oedq.home and oedq.local.home).]



High Availability (HA) and Scalability

• Requires the Oracle Database and Oracle WebLogic.
• Easy to configure during EDQ installation.
– Deploy many managed servers in a WebLogic cluster.
• All managed servers use the same database repository and
the same pair of configuration folders (oedq.home and
oedq.local.home).
• Real-time jobs run on all managed servers automatically.
• Batch jobs are load-balanced within the cluster.
• Automatic failover if managed servers go down.
• Tolerates database failure.
• Enables high availability and scalability of Case Management.
• For more information, see the Oracle Fusion Middleware
Understanding Oracle Enterprise Data Quality guide.
