A data warehouse is a database that is managed separately from the organization's operational database. It is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.
Strategic Information
Who needs Strategic information?
Executives and managers who are responsible for keeping the enterprise competitive need information to make proper decisions. They need it to establish goals and set objectives.
Operational Vs Informational
Operational systems
Data content: current values
Data structure: optimized for transactions
Access frequency: high
Access type: read, update, delete
Usage: predictable, repetitive
Users: large number

Informational systems
Data content: archived, derived, and summarized
Data structure: optimized for complex queries
Access frequency: low
Access type: read
Usage: ad hoc, random
Users: relatively small number
Integrated data
For decision making we need to pull together all the relevant data from the various applications. The sources differ: different databases, different data segments, different formats. During integration we need to remove inconsistencies and standardize the various data elements.
For example, savings accounts, loan accounts, and checking accounts together make up bank accounts. Standardization covers naming conventions, codes, and data attributes.
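The integration step can be sketched in Python as below. This is only an illustration under stated assumptions: the three source applications, their field names, and the sample records are all hypothetical, chosen to show how differing names and formats are mapped onto one standard layout.

```python
# Hypothetical records from three source applications; every field name
# and value here is invented for illustration.
SOURCES = {
    "savings":  [{"acct_no": "S-001", "cust": "A. Rao", "bal": "1,200.50"}],
    "loan":     [{"loan_id": "L-77", "customer_name": "A. Rao", "outstanding": 5000}],
    "checking": [{"account": "C-310", "name": "B. Lee", "balance": 89.0}],
}

# One mapping per source: source field name -> standard warehouse name.
FIELD_MAPS = {
    "savings":  {"acct_no": "account_id", "cust": "customer_name", "bal": "balance"},
    "loan":     {"loan_id": "account_id", "customer_name": "customer_name",
                 "outstanding": "balance"},
    "checking": {"account": "account_id", "name": "customer_name",
                 "balance": "balance"},
}


def to_number(value):
    """Accept 89.0, 5000 or '1,200.50' and return a float."""
    return float(str(value).replace(",", ""))


def integrate(sources, field_maps):
    """Merge per-application records into one list of bank accounts."""
    accounts = []
    for source_name, records in sources.items():
        mapping = field_maps[source_name]
        for record in records:
            # Rename every field to the standard naming convention.
            row = {mapping[key]: value for key, value in record.items()}
            # Standardize the data attribute: balances become floats.
            row["balance"] = to_number(row["balance"])
            # Standardize the code that identifies the account type.
            row["account_type"] = source_name.upper()
            accounts.append(row)
    return accounts


bank_accounts = integrate(SOURCES, FIELD_MAPS)
```

After this step every record has the same four fields regardless of which application it came from.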
NonVolatile data
Data obtained from the various operational systems, together with pertinent data from outside sources, is transformed, integrated, and stored in the data warehouse. The OLTP system is used for the current state (for example, current stock), while the data warehouse is used for snapshots. Data from the operational systems is moved to the data warehouse at different frequencies. Business transactions do not update the data in the data warehouse; updates are done in the operational database. Once data is moved to the data warehouse it remains unchanged.
Data Granularity
In an operational system, data is usually kept at the lowest level of detail.
For example, orders are recorded with quantity and price at unit level, and at the end of the month they are summed to get the total sales and purchases. A user analysing data in the data warehouse usually looks at summary data first and, if needed, drills down for a further breakdown. In data warehousing it is therefore efficient to keep summary data at different levels.
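As a rough sketch of keeping summaries at more than one level, the Python below rolls the same order lines up to daily and monthly totals. The order lines, product names, and the date-prefix trick are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical order lines at the lowest level of detail.
ORDER_LINES = [
    {"date": "2024-01-05", "product": "pen", "quantity": 10, "unit_price": 1.5},
    {"date": "2024-01-05", "product": "ink", "quantity": 2, "unit_price": 4.0},
    {"date": "2024-01-20", "product": "pen", "quantity": 4, "unit_price": 1.5},
    {"date": "2024-02-02", "product": "ink", "quantity": 1, "unit_price": 4.0},
]


def summarize(lines, key_length):
    """Total sales keyed by a date prefix: 10 chars = day, 7 chars = month."""
    totals = defaultdict(float)
    for line in lines:
        period = line["date"][:key_length]
        totals[period] += line["quantity"] * line["unit_price"]
    return dict(totals)


daily_sales = summarize(ORDER_LINES, 10)    # finer grain
monthly_sales = summarize(ORDER_LINES, 7)   # coarser grain
```

A user would normally read `monthly_sales` first and fall back to `daily_sales` or the order lines only when a further breakdown is wanted.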
TOP-DOWN Approach
ADVANTAGE
An enterprise view of data
Single, central storage of data about the content
Centralized rules and control
May see quick results if implemented with iterations
DISADVANTAGE
Takes longer to build, even with iterations
High risk of failure
High outlay without proof of concept
Bottom Up approach
ADVANTAGE
Faster and easier implementation of manageable pieces
Less risk of failure
Proof of concept
Allows the project team to learn and grow
DISADVANTAGE
Each data mart has its own narrow view of data
Permeates redundant data in every data mart
Perpetuates inconsistent and irreconcilable data
Proliferates unmanageable interfaces
Production Data
This category of data comes from the various operational systems of the enterprise, selected according to the information requirements of the data warehouse. Operational systems do not support broad queries; their queries are narrow and predictable. The data may sit on different platforms, so extraction has to run across them. The challenge is to standardize and transform this data.
Internal Data
Every organisation has its own internal data which could be useful for the data warehouse. Internal data adds additional complexity to the process of transformation and integration. A strategic evaluation is needed after taking data from the various sources.
Archived Data
In operational systems we periodically take the old data and store it in an archive file. Some operational systems archive on a daily basis, some monthly, and some yearly. Since the data warehouse keeps historical snapshots of data, the archive files are a necessary source.
External Data
External data is equally important for data warehouses. Since the sources within your organisation are not sufficient by themselves, external sources are also necessary. We need to transform and standardise this data, since data from external sources does not conform to our formats.
Data extraction
Data extraction is quite complex because the source data is so diverse. Data extraction tools are available to extract the data to a separate environment, from where moving the data into the database is easier. Start extracting from a data source when it represents the same snapshot in time as the other data sources. Do not execute consistency checks until all the data sources have been stored in the temporary data store.
Generate common application code for data extraction.
Resolve inconsistencies for common data elements from multiple sources.
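A minimal Python sketch of these extraction rules follows. It assumes two hypothetical sources (`customers` and `orders`) whose rows carry an `as_of` date; one common function stages every source for the same snapshot, and the consistency check runs only on the staged copy.

```python
from datetime import date

# Hypothetical source extracts; every row carries the date it represents.
SOURCES = {
    "customers": [
        {"as_of": date(2024, 1, 31), "customer_id": 1},
        {"as_of": date(2024, 1, 31), "customer_id": 2},
    ],
    "orders": [
        {"as_of": date(2024, 1, 31), "order_id": 10, "customer_id": 1},
        {"as_of": date(2024, 1, 31), "order_id": 11, "customer_id": 3},
        {"as_of": date(2024, 2, 1), "order_id": 12, "customer_id": 2},
    ],
}


def extract_to_staging(sources, snapshot_date):
    """Common extraction code: copy rows for one snapshot into a temp store."""
    return {
        name: [row for row in rows if row["as_of"] == snapshot_date]
        for name, rows in sources.items()
    }


def find_orphan_orders(staging):
    """Consistency check, run only after every source has been staged."""
    known = {row["customer_id"] for row in staging["customers"]}
    return [
        row["order_id"]
        for row in staging["orders"]
        if row["customer_id"] not in known
    ]


staging = extract_to_staging(SOURCES, date(2024, 1, 31))
orphans = find_orphan_orders(staging)
```

Here order 12 is left out because it belongs to a later snapshot, and order 11 is reported because its customer is missing from the staged customer data.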
Data Transformation
Data conversion is an important function, since we may be moving from file-based systems to a database. The tasks to perform for data conversion are:
Data cleansing
Data standardization
Purging of unnecessary data
Sorting and merging
Data Transformation
Transform the extracted data into the appropriate formats and data structures.
Provide default values as specified.
The major features are splitting, consolidation, standardization, and deduplication.
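The sketch below shows splitting, default values, standardization, and deduplication on a few made-up customer records in Python. The field names, the sample rows, and the default country value are assumptions for illustration only.

```python
def transform(records, defaults):
    """Split names, apply defaults, standardize case, and drop duplicates."""
    seen = set()
    result = []
    for record in records:
        # Splitting: one source field becomes two warehouse fields.
        first, _, last = record["name"].strip().partition(" ")
        row = {
            # Standardization: consistent capitalisation and country codes.
            "first_name": first.title(),
            "last_name": last.strip().title(),
            # Default value supplied when the source field is empty.
            "country": (record.get("country") or defaults["country"]).upper(),
        }
        # Deduplication: keep only the first copy of each standardized row.
        key = (row["first_name"], row["last_name"], row["country"])
        if key in seen:
            continue
        seen.add(key)
        result.append(row)
    return result


# Hypothetical extracted rows; the first two describe the same person.
RAW = [
    {"name": "maria gomez", "country": "es"},
    {"name": "Maria  Gomez", "country": "ES"},
    {"name": "li wei", "country": None},
]

clean = transform(RAW, {"country": "unknown"})
```

Deduplication is done after standardization on purpose: the two spellings of the same customer only match once case and spacing have been normalised.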
Data Loading
Data loading takes place for:
Initial data
Incremental data revisions on an ongoing basis
Data Loading
Load the transformed and consolidated data, in the form of load images, into the data warehouse repository. Some loaders generate primary keys for the tables being loaded. For load images available on the same RDBMS engine as the data warehouse, precoded stored procedures may be used for loading.
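An initial load followed by an incremental load can be sketched with Python's built-in `sqlite3` module. The table `customer_dim`, its columns, and the sample load images are hypothetical; SQLite stands in for whatever RDBMS hosts the warehouse, and the generated primary key comes from SQLite's `INTEGER PRIMARY KEY` behaviour.

```python
import sqlite3


def create_warehouse():
    """In-memory stand-in for the warehouse repository."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_dim ("
        " customer_key INTEGER PRIMARY KEY,"  # key generated at load time
        " source_id TEXT UNIQUE NOT NULL,"    # key used by the source system
        " name TEXT NOT NULL)"
    )
    return conn


def load(conn, load_image):
    """Insert rows not yet loaded; return the row count after the load."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO customer_dim (source_id, name) VALUES (?, ?)",
            load_image,
        )
    return conn.execute("SELECT COUNT(*) FROM customer_dim").fetchone()[0]


warehouse = create_warehouse()
# Initial load.
rows_after_initial = load(warehouse, [("C1", "Rao"), ("C2", "Lee")])
# Incremental load: C2 is already present, only C3 is new.
rows_after_incremental = load(warehouse, [("C2", "Lee"), ("C3", "Kim")])
```

Rows that are already in the warehouse are skipped rather than overwritten, which keeps the loaded data unchanged between loads.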
Informational Component
Information should be delivered to all types of data warehouse users: novice users, casual users, and business analysts. Delivery can take the form of ad hoc reports, complex queries, multidimensional (MD) analysis, statistical analysis, EIS feeds, and data mining.
Metadata Component
Metadata in a data warehouse is like the data dictionary in a DBMS. The data dictionary keeps information about the logical data structures, the files, and their addresses.
Types of Metadata
Operational metadata: Data for the data warehouse comes from several operational systems of the enterprise. The selected data elements come from different source fields. When delivering information we must be able to tie it back to the original source.
Extraction metadata: Extraction and transformation metadata contains information about extraction frequencies, extraction methods, and business rules.
End-user metadata: Helps end users find information and allows them to use their own business terminology.
Why Metadata
Metadata opens the door to end users and makes the content recognizable to them.
It provides content and structure to the user.
It connects all parts of the data warehouse.
Query Manager
The query manager is the system component that performs all the operations necessary to support the query management process. Its functions include the following operations:
Direct queries to the appropriate tables
Schedule the execution of user queries
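The two operations can be illustrated with a toy Python query manager. Everything named here is an assumption for the sketch: the table names, the idea of routing by requested level of detail, and the use of a numeric priority for scheduling.

```python
import heapq

# Hypothetical tables, one per level of detail held in the warehouse.
TABLE_FOR_GRAIN = {
    "month": "sales_by_month",
    "day": "sales_by_day",
    "order_line": "sales_detail",
}


class QueryManager:
    """Routes each query to a table and runs queries in priority order."""

    def __init__(self):
        self._queue = []
        self._counter = 0  # tie-breaker so equal priorities stay in order

    def submit(self, user, grain, priority):
        """Direct the query to the table that matches the requested grain."""
        table = TABLE_FOR_GRAIN[grain]
        heapq.heappush(self._queue, (priority, self._counter, user, table))
        self._counter += 1

    def run_next(self):
        """Return the next scheduled query as (user, table)."""
        _, _, user, table = heapq.heappop(self._queue)
        return user, table


manager = QueryManager()
manager.submit("analyst", "month", priority=2)
manager.submit("executive", "day", priority=1)  # lower number runs first
first = manager.run_next()
second = manager.run_next()
```

A real query manager would execute SQL against the chosen table; here `run_next` only reports which table the query was directed to.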
Upflow
The processes associated with adding value to the data in the warehouse through summarizing, packaging, and distributing the data.
Summarizing the data works by selecting, projecting, joining, and grouping relational data into views that are more convenient and useful to end users. Summarizing goes beyond simple relational operations to involve sophisticated statistical analysis, including identifying trends, clustering, and sampling the data.
Packaging the data involves converting the detailed or summarized information into more useful formats, such as spreadsheets, text documents, charts, other graphical presentations, private databases, and animation.
Distributing the data to appropriate groups increases its availability and accessibility.
Downflow
The processes associated with archiving and backing up data in the warehouse.
Archiving maintains the effectiveness and performance of the warehouse by transferring older data of limited value to archival storage such as magnetic tape, optical disk, or other digital storage devices.
If the databases in a warehouse are very big, partitioning is a useful design option: it fragments a table storing an enormous number of records into smaller tables, thus preserving data warehouse performance.
The downflow of data includes the processes that ensure the current state of the data warehouse can be rebuilt following data loss or software/hardware failures. Archived data should be stored in a way that allows the data to be re-established in the warehouse when required.
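Partitioning, archiving, and restoring in the downflow can be sketched in Python as follows. The sales rows, the choice of the year as partition key, and the cutoff year are assumptions for the example; plain dictionaries stand in for warehouse tables and archive media.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows held in one large table.
SALES = [
    {"sale_date": date(2021, 3, 1), "amount": 10.0},
    {"sale_date": date(2022, 7, 9), "amount": 20.0},
    {"sale_date": date(2024, 1, 5), "amount": 30.0},
]


def partition_by_year(rows):
    """Fragment one big table into smaller per-year tables."""
    parts = defaultdict(list)
    for row in rows:
        parts[row["sale_date"].year].append(row)
    return dict(parts)


def archive_older_than(partitions, cutoff_year):
    """Move partitions older than cutoff_year to archive storage."""
    current = {y: rows for y, rows in partitions.items() if y >= cutoff_year}
    archive = {y: rows for y, rows in partitions.items() if y < cutoff_year}
    return current, archive


def restore(current, archive, year):
    """Re-establish one archived partition in the warehouse."""
    restored = dict(current)
    restored[year] = archive[year]
    return restored


partitions = partition_by_year(SALES)
current, archive = archive_older_than(partitions, 2023)
rebuilt = restore(current, archive, 2022)
```

Because the archive keeps whole partitions, a single year can be brought back without touching the rest of the warehouse.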
Outflow
The processes associated with making the data available to end users. This involves two activities: data accessing and delivering.
Data accessing is concerned with satisfying end users' requests for the data they need. The main problem here is creating an environment in which users can effectively use the query tools to access the most appropriate data source.
Delivering makes it possible to deliver information to the users' systems/workstations. This activity is referred to as a type of publish-and-subscribe process. The data warehouse publishes several business objects that are revised periodically by monitoring usage patterns. Users subscribe to the set of business objects that best meets their needs.
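The publish-and-subscribe style of delivery can be illustrated with a small Python sketch. The class name, the business object names, and the use of plain lists as user inboxes are all assumptions made for the example.

```python
from collections import defaultdict


class WarehousePublisher:
    """Delivers each published business object to its subscribers only."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, business_object, inbox):
        """Register a user's inbox for one business object."""
        self._subscribers[business_object].append(inbox)

    def publish(self, business_object, content):
        """Deliver content to every subscriber; return how many received it."""
        inboxes = self._subscribers[business_object]
        for inbox in inboxes:
            inbox.append((business_object, content))
        return len(inboxes)


publisher = WarehousePublisher()
sales_inbox, finance_inbox = [], []
publisher.subscribe("monthly_sales", sales_inbox)
publisher.subscribe("monthly_sales", finance_inbox)
publisher.subscribe("budget_variance", finance_inbox)

delivered = publisher.publish("monthly_sales", {"2024-01": 29.0})
publisher.publish("budget_variance", {"2024-01": -3.5})
```

Each user receives only the business objects they subscribed to, so the finance inbox ends up with two items and the sales inbox with one.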
Metaflow
Metaflow is the process of managing the metadata (data about the data). It describes the data contents of the data warehouse: what is in it, where it came from originally, and what has been done to it by way of cleansing, integrating, and summarizing.
Inflow
The processes associated with the extraction, cleansing, and loading of data from the source systems into the data warehouse.
The relevant data is extracted from multiple, heterogeneous, and external sources (commercial tools are used).
Cleaning includes removing inconsistencies, adding missing fields, and cross-checking for data integrity.
Transformation includes adding date/time stamp fields, summarizing detailed data, and deriving new fields to store calculated data.
The data is then mapped and loaded into the warehouse.
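The cleaning and transformation parts of the inflow can be sketched in Python as below. The row layout, the `UNKNOWN` filler for a missing region, and the rule that rows without quantity or price fail the integrity check are assumptions for illustration.

```python
from datetime import datetime, timezone


def clean_and_transform(rows, loaded_at):
    """Drop rows that fail the integrity check, fill gaps, derive new fields."""
    result = []
    for row in rows:
        # Integrity check: quantity and unit price must both be present.
        if row.get("quantity") is None or row.get("unit_price") is None:
            continue
        new_row = dict(row)
        # Add a missing field with a placeholder value.
        new_row["region"] = row.get("region") or "UNKNOWN"
        # Derive a new field that stores calculated data.
        new_row["amount"] = row["quantity"] * row["unit_price"]
        # Add a date/time stamp field.
        new_row["loaded_at"] = loaded_at.isoformat()
        result.append(new_row)
    return result


# Hypothetical extracted rows.
EXTRACTED = [
    {"order_id": 1, "quantity": 3, "unit_price": 2.0, "region": "north"},
    {"order_id": 2, "quantity": 5, "unit_price": 1.0},
    {"order_id": 3, "quantity": None, "unit_price": 9.0},
]

stamp = datetime(2024, 1, 31, tzinfo=timezone.utc)
inflow_rows = clean_and_transform(EXTRACTED, stamp)
```

The time stamp is passed in rather than read from the clock so that every row in one load carries the same value.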