PRESENTED BY:
1. DHANASHRI CHINNAPPA
2. SAGAR PATEL
3. JASON GONSALVES
4. ANIKET PARAB
5. KETKI RAJE
6. GANESH PATIL
INDEX
Sr.No: CONTENTS
1. Introduction
2. Data Warehouse Quality Model
3. Tools for Data Warehouse Quality
4. Data Warehouse Querying & Loading
5. Commonly found Data Quality Issues
6. Conclusion
7. Bibliography
INTRODUCTION
DATA
Because data warehouses (DWs) play a principal role in strategic
decision making, data warehouse quality is crucial for organizations.
We should therefore use methods, models, techniques and tools that
help us design and maintain high-quality DWs.
In recent years, several approaches to designing DWs from the
conceptual, logical and physical perspectives have been proposed.
However, from our point of view, none of them provides a set of
empirically validated metrics (objective indicators) to help the
designer produce a model that guarantees the quality of the DW.
DATA WAREHOUSE QUERYING AND LOADING:
QUERYING IN DATA WAREHOUSE:
Query Model:
The blueprint design aims to anticipate the typical star query shape
and builds indexes over the fact table. The clustered index of the fact
table uses several dimension surrogate key columns (the foreign key
columns) as index keys. The most frequently used columns should occur
in the list of index keys. You may want to take the time to verify that
this indeed provides a good access path for the most frequently
executed queries in your workload.
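One way to check the access path is to ask the database for its query plan. The following is a minimal sketch only: it uses SQLite through Python's `sqlite3` module as a stand-in for the warehouse engine, and a plain composite index rather than a clustered index. The table names (`fact_sales`, `dim_date`, `dim_product`), the index name and the helper `query_plan` are hypothetical and not taken from the slides.

```python
import sqlite3

# Hypothetical star schema: one fact table with two dimension foreign keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
-- Composite index over the dimension surrogate keys, most used column first.
CREATE INDEX ix_fact_sales_keys ON fact_sales (date_key, product_key);
""")

# A typical star query: filter on a dimension key, join out to the dimensions.
STAR_QUERY = """
SELECT p.category, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_date AS d ON d.date_key = f.date_key
JOIN dim_product AS p ON p.product_key = f.product_key
WHERE f.date_key = ?
GROUP BY p.category
"""

def query_plan(connection, sql, params=()):
    """Return the human-readable steps of SQLite's plan for the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # column 3 holds the detail text

plan = query_plan(conn, STAR_QUERY, (20240101,))
uses_index = any("ix_fact_sales_keys" in step for step in plan)
for step in plan:
    print(step)
```

If `uses_index` is false for a frequently executed query, the order of the index key columns may be worth revisiting. Other engines expose their plans through different commands, so the exact check depends on the platform.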
Dimension Tables:
LOADING IN DATA WAREHOUSE:
After the data has been cleansed and transformed into a structure
consistent with the data warehouse requirements, it is ready for
loading into the data warehouse. You may make some final
transformations during the loading operation, but you should complete
any transformations that could reveal inconsistencies before the final
loading operation.
The initial load of the data warehouse consists of populating the tables
in the data warehouse schema and then verifying that the data is
ready for use. You can use various methods to load the data warehouse
tables, such as:
• Transact-SQL
• DTS
• BCP utility
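The load-then-verify pattern described above can be sketched independently of the specific tool. The example below is an illustration only: it uses SQLite through Python's `sqlite3` module instead of Transact-SQL, DTS or BCP, and the function `load_rows`, the table `dim_product` and the sample rows are hypothetical.

```python
import sqlite3

def load_rows(connection, table, columns, rows):
    """Insert prepared rows in one transaction and return the table's row count.

    `table` and `columns` are assumed to be trusted identifiers, since they
    are interpolated into the SQL text; the row values are bound parameters.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    with connection:  # commits on success, rolls back if any insert fails
        connection.executemany(sql, rows)
    (count,) = connection.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return count

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT)"
)

# Rows as they might arrive from the data preparation area.
prepared = [(1, "Widget"), (2, "Gadget"), (3, "Gizmo")]
loaded = load_rows(conn, "dim_product", ["product_key", "name"], prepared)
print(f"{loaded} rows in dim_product after load")
```

Comparing the returned count with the number of prepared rows is the simplest form of post-load verification. Running the whole batch in one transaction means a failed load leaves the table unchanged.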
When you load data into the data warehouse, you are populating the
tables that will be used by the presentation applications that make the
data available to users. Loading data often involves the transfer of
large amounts of data from source operational systems, a data
preparation area database, or preparation area tables in the data
warehouse database. Such operations can impose significant processing
loads on the databases involved and should be accomplished during a
period of relatively low system use.
After the data has been loaded into the data warehouse database,
verify the referential integrity between dimension and fact tables to
ensure that all records relate to appropriate records in other tables.
You should verify that every record in a fact table relates to a record
in each dimension table that will be used with that fact table.
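The referential integrity check can be expressed as a search for "orphan" fact rows, that is, fact rows whose dimension key has no match in the dimension table. This is a minimal sketch using SQLite through Python's `sqlite3` module; the helper `find_orphans`, the tables and the sample data are hypothetical.

```python
import sqlite3

def find_orphans(connection, fact, dim, key):
    """Return (key value, row count) for fact rows with no matching dimension row.

    The table and column names are assumed to be trusted identifiers.
    """
    sql = f"""
        SELECT f.{key}, COUNT(*)
        FROM {fact} AS f
        LEFT JOIN {dim} AS d ON d.{key} = f.{key}
        WHERE d.{key} IS NULL
        GROUP BY f.{key}
        ORDER BY f.{key}
    """
    return connection.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (product_key INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
-- Product key 7 does not exist in dim_product.
INSERT INTO fact_sales VALUES (1, 9.5), (2, 4.0), (7, 1.25), (7, 3.0);
""")

orphans = find_orphans(conn, "fact_sales", "dim_product", "product_key")
print(orphans)  # [(7, 2)]
```

An empty result means every fact row relates to a dimension row for that key. The check would be repeated for each dimension table used with the fact table.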
Extract:
Data auditing tools enhance the accuracy and correctness of the data
at the source. These tools generally compare the data in the source
database to a set of business rules.
may be used to discover the business sense of words within the data.
The data that does not adhere to the business rules could then be
modified as necessary.
Transfer:
hotel’s casino/gaming department as a VIP client – under a similar name
John Smith.
The hotel did not have a data quality process in place to standardize,
clean and merge duplicate records to provide a complete view of the
customer. As a result, the hotel was not able to leverage the true value
of its data in delivering relevant marketing to a high value customer.
1. Profiling:
As the first line of defense for your data integration solution,
profiling data helps you examine whether your existing data sources meet
the quality standards of your solution. Properly profiling your data
saves execution time because you identify issues that require immediate
attention from the start and avoid the unnecessary processing of
unacceptable data sources. Data profiling becomes even more critical
when working with raw data sources that do not have referential
integrity or quality controls.
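As an illustration of what a basic profiling pass might report, the sketch below counts missing values in required columns and duplicated key values in a small in-memory sample. The function `profile`, the column names and the sample records are hypothetical; real profiling tools report many more measures.

```python
from collections import Counter

def profile(rows, required, unique_key):
    """Summarize missing required values and duplicated keys in a list of dicts."""
    # A value counts as missing when it is absent, None, or an empty string.
    missing = {
        col: sum(1 for r in rows if r.get(col) in (None, ""))
        for col in required
    }
    key_counts = Counter(r.get(unique_key) for r in rows)
    duplicates = {k: n for k, n in key_counts.items() if n > 1}
    return {"rows": len(rows), "missing": missing, "duplicate_keys": duplicates}

sample = [
    {"customer_id": 1, "name": "John Smith", "email": "js@example.com"},
    {"customer_id": 2, "name": "", "email": None},
    {"customer_id": 2, "name": "J. Smith", "email": "js@example.com"},
]

report = profile(sample, required=["name", "email"], unique_key="customer_id")
print(report)
# {'rows': 3, 'missing': {'name': 1, 'email': 1}, 'duplicate_keys': {2: 2}}
```

A source whose report shows many missing required values or duplicated keys can be flagged before any further processing is spent on it.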
2. Cleansing:
4. Matching:
5. Enrichment:
6. Monitoring:
CONCLUSION
BIBLIOGRAPHY
www.google.com
www.altavista.com
www.scribd.com
www.dogpile.com
www.flickr.com
www.excite.com