Data profiling is divided into three steps: Single Column Profiling to get column domain
information, Table Structural Profiling to get dependency information, and Cross Table
Profiling to get relationship information.
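As an illustration of the first step, the sketch below summarizes a single column. It is a minimal example, not the method of any particular profiling tool: the function names `value_pattern` and `profile_column`, the pattern notation (A for a letter, 9 for a digit), and the sample values are all assumptions made for this example.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Map letters to 'A' and digits to '9'; keep other characters as-is."""
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch
        for ch in value
    )


def profile_column(values):
    """Summarize one column: row counts, distinct values, lengths, patterns."""
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min_length": min((len(v) for v in non_null), default=0),
        "max_length": max((len(v) for v in non_null), default=0),
        "patterns": Counter(value_pattern(v) for v in non_null),
    }


# Hypothetical sample column of state codes, including a null and a bad value.
profile = profile_column(["CA", "NY", None, "TX", "ca", "N1"])
print(profile)
```

A summary like this is the raw material for proposing a Domain: the length range suggests a data type, and the dominant pattern points at values that do not fit.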
What is a Domain?
A simple example of a Domain is the list of United States state abbreviations. The
Domain could be implemented as a CHAR(2) and would contain the following valid
value set: AL, AK, AZ, AR, CA, CO, CT, DE, DC, FL, GA, HI, ID, IL, IN, IA, KS, KY, LA,
ME, MD, MA, MI, MN, MS, MO, MT, NE, NV, NH, NJ, NM, NY, NC, ND, OH, OK,
OR, PA, RI, SC, SD, TN, TX, UT, VT, VA, WA, WV, WI, WY. [2]
Many Columns can share the same Domain. Columns which share the same Domain may
be Synonym candidates.
A Domain is defined as the set of all valid values for a Column or set of Columns.
Domains contain target data type information, a user-defined list of valid values, and a
list of valid patterns. Each Schema has its own set of Domains.
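A Domain as described here (a target data type, a list of valid values, and a list of valid patterns) could be modeled roughly as follows. This is a sketch under assumptions: the `Domain` class, its field names, and the rule that an empty list means "no restriction" are invented for illustration and are not taken from Informatica Data Explorer. The value list is shortened for brevity.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Hypothetical Domain: data type plus valid values and valid patterns."""
    name: str
    data_type: str
    valid_values: frozenset = frozenset()
    valid_patterns: tuple = ()

    def is_valid(self, value: str) -> bool:
        # An empty value list or pattern list is treated as "no restriction".
        if self.valid_values and value not in self.valid_values:
            return False
        if self.valid_patterns and not any(
            re.fullmatch(p, value) for p in self.valid_patterns
        ):
            return False
        return True


# Shortened value set; a real state Domain would list every abbreviation.
STATE = Domain(
    name="US_STATE",
    data_type="CHAR(2)",
    valid_values=frozenset({"AL", "AK", "CA", "NY", "TX"}),
    valid_patterns=(r"[A-Z]{2}",),
)

column = ["CA", "ny", "TX", "ZZ"]
invalid = [v for v in column if not STATE.is_valid(v)]
print(invalid)
```

Because the Domain is defined once and separately from any Column, the same `STATE` object can be used to check every Column that shares it.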
2. Informatica Data Explorer 8.6.2 User Guide
EmployeeID  PAN  FirstName  MI  LastName
...         ...  Sunil      A   Singh
...         ...  Sunil      J   Sharma
...         ...  Rohit      B   Kumar
...         ...  Rohit      J   Kumar
If you ran a dependency profile for this Table, you would find the following dependencies, among others:
EmployeeID → PAN
EmployeeID → FirstName
EmployeeID → LastName
EmployeeID → MI
PAN → EmployeeID
PAN → FirstName
PAN → LastName
PAN → MI
LastName → FirstName
FirstName + MI → LastName
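A dependency holds in the data when every value of the determining Column (or combination of Columns) is paired with exactly one value of the dependent Column. The sketch below checks this for the FirstName, MI, and LastName values of the sample Table. The function name `holds` and the row representation are assumptions for this example; a profiling tool would search for such dependencies rather than test them one at a time.

```python
def holds(rows, determinant, dependent):
    """Return True if each determinant value maps to one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        val = tuple(row[c] for c in dependent)
        # setdefault stores the first pairing; a later mismatch breaks the rule.
        if seen.setdefault(key, val) != val:
            return False
    return True


rows = [
    {"FirstName": "Sunil", "MI": "A", "LastName": "Singh"},
    {"FirstName": "Sunil", "MI": "J", "LastName": "Sharma"},
    {"FirstName": "Rohit", "MI": "B", "LastName": "Kumar"},
    {"FirstName": "Rohit", "MI": "J", "LastName": "Kumar"},
]

print(holds(rows, ["LastName"], ["FirstName"]))        # True
print(holds(rows, ["FirstName", "MI"], ["LastName"]))  # True
print(holds(rows, ["FirstName"], ["LastName"]))        # False
```

A dependency found this way is only true of the sample. LastName → FirstName holds for these four rows, but it is unlikely to be a real business rule, which is why discovered dependencies need review.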
Synonyms
Two or more Columns that have the same business meaning are called Synonyms. Suppose a Schema
contains the following two Tables:
EmployeeID  Name           MgrID  Hiredate
001         Bob Smith      002    01-15-00
002         Jane Goodall   005    06-12-99
003         Royal Robbins  002    09-08-99

EmpID  Salary  Stock Options
001    25,000  100
002    50,000  500
003    25,000  100
Both relations contain employee data, but they are defined separately to segregate public and private
information. The EmpID and EmployeeID Columns have the same business meaning and can be
meaningfully combined into a single Column. In contrast, look at how the MgrID column is used in
the Employee Table. Even though MgrID uses similar values to EmployeeID, it represents a different
role in the database. Therefore you would not define MgrID and EmployeeID as Synonyms.
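Value overlap is one way to shortlist Synonym candidates, and the MgrID case shows why it cannot be the final word. The sketch below is an assumption-laden illustration: the `overlap` function and its definition (the share of a Column's distinct values that also occur in another Column) are chosen for this example and are not the measure used by any specific tool.

```python
def overlap(candidate, reference):
    """Fraction of distinct candidate values that also occur in reference."""
    cand, ref = set(candidate), set(reference)
    return len(cand & ref) / len(cand) if cand else 0.0


# Column values taken from the two sample Tables.
employee_id = ["001", "002", "003"]
emp_id = ["001", "002", "003"]
mgr_id = ["002", "005", "002"]

print(overlap(emp_id, employee_id))  # 1.0
print(overlap(mgr_id, employee_id))  # 0.5
```

EmpID overlaps EmployeeID completely, which supports treating the pair as a candidate. MgrID also overlaps in part, yet it plays a different role, so the overlap score only nominates pairs and the decision rests on business meaning.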
Normalization has the following impacts on Synonyms:
If two or more Columns defined as Synonyms represent the same construct, they collapse into
one Column in the normalized model.
If two Columns defined as Synonyms represent a parent-child relationship, they result in two
Columns in two Tables, with one Column participating in a Primary Key and the other in the
corresponding Foreign Key. [5]
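The first case, where Synonyms collapse into one Column, can be pictured with the EmployeeID and EmpID Columns from the sample Tables. This is only a sketch: the `merge_on_synonym` helper, the dictionary row format, and the `StockOptions` key spelling are assumptions for this example, and a real normalized model is designed rather than produced by merging rows.

```python
def merge_on_synonym(left, left_key, right, right_key):
    """Combine rows whose Synonym Columns match, keeping only the left key."""
    right_by_key = {row[right_key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[left_key], {})
        combined = dict(row)
        # Drop the right-hand Synonym Column: it duplicates the left key.
        combined.update({k: v for k, v in match.items() if k != right_key})
        merged.append(combined)
    return merged


employees = [
    {"EmployeeID": "001", "Name": "Bob Smith"},
    {"EmployeeID": "002", "Name": "Jane Goodall"},
    {"EmployeeID": "003", "Name": "Royal Robbins"},
]
compensation = [
    {"EmpID": "001", "Salary": 25000, "StockOptions": 100},
    {"EmpID": "002", "Salary": 50000, "StockOptions": 500},
    {"EmpID": "003", "Salary": 25000, "StockOptions": 100},
]

merged = merge_on_synonym(employees, "EmployeeID", compensation, "EmpID")
for row in merged:
    print(row)
```

In the second case, such as MgrID referring to EmployeeID, the two Columns would stay separate and be linked by a Foreign Key instead of being merged.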
5. Informatica Data Explorer 8.6.2 User Guide
Oracle Warehouse Builder 10g (Data Profiling node in the Project Explorer)