
Data Warehousing

Unit 2

Planning and Requirements

Structure:
2.1 Introduction
    Objectives
2.2 Key Issues in Planning a Data Warehouse
2.3 Planning and Project Management in Data Warehouse Construction
2.4 Data Warehouse Project
    Data Warehouse Development Life Cycle
    Kimball Lifecycle Diagram
    Requirement Gathering Approaches
2.5 Summary
2.6 Terminal Questions
2.7 Answers

2.1 Introduction
A Data Warehouse is a technology that abstracts and analyzes useful data to help companies make better business decisions. It is becoming the core technology of the Business Intelligence field. Requirements are essential ingredients for developing Data Warehouse systems. Usually, it is the Project Managers or Leads who pay the most attention to requirements. This unit, however, is designed for all IT professionals, irrespective of their roles in Data Warehousing projects. It will show you how best you can fit into your specific role in a project. If you want to be part of a team that is passionate about building a successful Data Warehouse, you need the details presented in this unit.

Note: Developers have to gather requirements with analysis in mind.

Objectives
After studying this unit, you should be able to:
o describe the importance of Project Planning and Requirements Gathering
o discuss Data Warehouse development strategies and development life cycle approaches
Sikkim Manipal University Page No. 11


o highlight the importance of both the generalized life cycle and the Kimball Lifecycle, and the sequence of steps in each

2.2 Planning Data Warehouses and Key Issues


More than any other factor, improper planning and inadequate project management tend to result in project failures. First and foremost, determine whether your company really needs a Data Warehouse. Is it really ready for one? You need to develop criteria for assessing the value expected from your Data Warehouse. Your company has to decide on the type of Data Warehouse to be built and where to keep it. You have to ascertain where the data is going to come from and even whether you have all the needed data. You have to establish who will be using the Data Warehouse, how they will use it, and at what times.

We will discuss the various issues related to the proper planning of a Data Warehouse. You will learn how a Data Warehouse project differs from the types of projects you have been involved in before. We will study the guidelines for making your Data Warehouse projects a success.

Key Issues during Data Warehouse Construction
Planning for your Data Warehouse begins with a thorough consideration of the key issues. Answers to the key questions are vital for the proper planning and the successful completion of the project. Therefore, let us consider the pertinent issues, one by one.

Values and Expectations. Some companies jump into Data Warehousing without assessing the value to be derived from their proposed Data Warehouse. Of course, first you have to be sure that, given the culture and the current requirements of your company, a Data Warehouse is the most viable solution. Only after you have established the suitability of this solution can you begin to enumerate the benefits and value propositions.

Risk Assessment. Planners generally associate project risks with the cost of the project. If the project fails, how much money will go down the drain? But the assessment of risks is more than calculating the loss from the project costs. What are the risks faced by the company without the
benefits derivable from a Data Warehouse? What losses are likely to be incurred? What opportunities are likely to be missed?

Differences between OLTP and Data Warehouse Projects
The Data Warehouse and the OLTP database are typically both relational databases. However, their objectives are different. The OLTP database records transactions in real time and aims to automate the clerical data entry processes of a business entity. Addition, modification and deletion of data are essential in the OLTP database, and the semantics of the front-end application influence how the data is organized in the database.

The Data Warehouse, on the other hand, does not cater to the real-time operational requirements of the enterprise. It is more a storehouse of current and historical data, and it may also contain data extracted from external data sources. However, the Data Warehouse supports OLTP systems by giving them a place to offload data as it accumulates and by providing analysis services that would otherwise degrade the performance of the OLTP database. The primary differences between the Data Warehouse database and the OLTP database are given in Table 2.1.
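Table 2.1: Data Warehouse vs. OLTP Databases

Data Warehouse Database                  | OLTP Database
-----------------------------------------+-------------------------------------
Designed for analysis of business        | Designed for real-time business
measures by categories and attributes    | operations
-----------------------------------------+-------------------------------------
Optimized for bulk loads and large,      | Optimized for a common set of
complex, unpredictable queries that      | transactions, usually adding or
access many rows per table               | retrieving a small set of rows at a
                                         | time per table
-----------------------------------------+-------------------------------------
Loaded with consistent, valid data;      | Optimized for validation of incoming
requires no real-time validation         | data during transactions; uses
                                         | validation data tables
-----------------------------------------+-------------------------------------
Supports a limited number of users,      | Supports thousands of concurrent
particularly data analysts (decision     | users
makers)                                  |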
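
To make the contrast in Table 2.1 concrete, the following minimal Python sketch uses the standard sqlite3 module with a small, made-up sales table. The table name, column names and sample figures are illustrative assumptions, and running both workloads against one in-memory database is a simplification: in practice the warehouse is a separate store loaded from the OLTP system. The first function mimics an OLTP-style operation (validate and write one row, then read it back by key); the second mimics a warehouse-style query (scan many rows and summarize a measure by a category).

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a tiny, hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        " sale_id INTEGER PRIMARY KEY,"
        " region  TEXT NOT NULL,"
        " product TEXT NOT NULL,"
        " amount  REAL NOT NULL CHECK (amount >= 0))"
    )
    rows = [
        (1, "North", "Widget", 100.0),
        (2, "North", "Gadget", 250.0),
        (3, "South", "Widget", 75.0),
        (4, "South", "Widget", 125.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def oltp_record_sale(conn, sale_id, region, product, amount):
    """OLTP style: write one validated row, then fetch it by its key."""
    with conn:  # commits on success, rolls back if the CHECK constraint fails
        conn.execute(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            (sale_id, region, product, amount),
        )
    return conn.execute(
        "SELECT region, product, amount FROM sales WHERE sale_id = ?",
        (sale_id,),
    ).fetchone()


def warehouse_sales_by_region(conn):
    """Warehouse style: aggregate a business measure over many rows."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()
```

Note how validation (the CHECK constraint) matters at write time in the OLTP-style function, while the warehouse-style query assumes the data is already valid and simply reads it in bulk.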


Data Warehouse Implementation Strategy: Top-Down and Bottom-Up
In Unit 1, we discussed the top-down and bottom-up approaches for building a data warehouse. The top-down approach is to start with the enterprise-wide data warehouse, although possibly building it iteratively. Data from the overall, large enterprise-wide data warehouse then flows into departmental and subject data marts. The bottom-up approach, on the other hand, is to start by building individual data marts, one by one. The integration of these data marts makes up the Enterprise Data Warehouse. We looked at the pros and cons of the two methods. We also discussed a practical approach of going bottom-up, but making sure that the individual data marts are conformed to one another so that they can be viewed as a whole. For this practical approach to be successful, you have to first plan and define requirements at the overall corporate level.

Build or Buy. This is a major issue for all organizations. No one builds a Data Warehouse totally from scratch by in-house programming. There is no need to reinvent the wheel every time. A wide and rich range of third-party tools and solutions is available. If you want to build the Data Warehouse using in-house development, a lot of coding and maintenance is required. Metadata maintenance (the DWH schema) in particular becomes difficult. In addition, you have to write in-house programs for data extraction, for data transformation, and for loading the Data Warehouse storage.

Single Vendor or Best-of-Breed. Vendors come in a variety of categories. There are multiple vendors and products catering to the many functions of the Data Warehouse. So what are the options? How should you decide? The two major options are:
1) Use the products of a single vendor
2) Use products from more than one vendor, selecting appropriate tools

Planning your Data Warehouse using the single-vendor approach provides:
o High level of integration among the tools
o Consistent look and feel
o Seamless cooperation among components
o Centrally managed information exchange
o Overall price negotiable (non-technical)

This approach will naturally enable your Data Warehouse to be well integrated and function coherently. However, only a few vendors, such as IBM, SAS and NCR, offer fully integrated solutions.

The alternative is the best-of-breed solution, which combines products from multiple vendors and selects the most appropriate tool for each function. With the best-of-breed approach, compatibility among the tools from different vendors could become a serious problem. If you are taking this route, make sure the selected tools are proven to be compatible. In this case, the staying power of individual vendors is crucial. Also, you will have less bargaining power with regard to individual products and may incur higher overall expense. The multi-vendor approach is therefore not advisable if your environment is not heavily technical.

Business Requirements, Not Technology
Let business requirements drive your Data Warehouse, not technology. Although this seems obvious, you would not believe how many Data Warehouse projects grossly violate this maxim. Many Data Warehouse developers are interested in putting pretty pictures on the users' screens and pay little attention to the real requirements. They like to build snappy systems that exploit the depths of technology and demonstrate their prowess in harnessing its power.

Note: Data warehousing is not about technology; it is about solving users' need for strategic information.

Do not plan to build the Data Warehouse before understanding the requirements. Start by focusing on what information is needed and not on how to provide the information. Do not emphasize the tools. The basic structure and the architecture to support the user requirements are more important. So before making the overall plan, conduct a preliminary survey of requirements.

What types of information must you gather in the preliminary survey? At a minimum, obtain general information on the following from each group of users:
o Mission and functions of each user group
o Computer systems used by the group
o Key performance indicators
o Factors affecting success of the user group
o Who the customers are and how they are classified
o Types of data tracked for the customers, individually and in groups
o Products manufactured or sold
o Categorization of products and services
o Locations where business is conducted
o Levels at which profits are measured: per customer, per product, per district
o Levels of cost details and revenue
o Current queries and reports for strategic information

As part of the preliminary survey, include a source system audit. Even at this stage, you must have a fairly good idea of where the data for the Data Warehouse is going to be extracted from. Review the architecture of the source systems. Find out about the relationships among the data structures. What is the quality of the data? What documentation is available? What are the possible mechanisms for extracting the data from the source systems? Your overall plan must contain information about the source systems.

Self Assessment Questions
1. A Data Warehouse contains data for ______________ purposes.
2. A Data Warehouse is a storehouse of _______________ data.
3. In most organizations, two groups of people are key to the success of the project, ______________________ and _________________.
4. OLTP systems are designed for __________________.
5. Data Warehouses do not require real-time validation. (True / False)

2.3 Planning and Project Management


The Overall Plan
The seed for a data warehousing initiative gets sown in many ways. The initiative may get ignited simply because the competition has a Data Warehouse. Different stakeholders may hold different opinions about Data Warehouse construction, so reaching a clear, agreed decision is crucial here. The Data Warehouse plan discusses the type of Data Warehouse and enumerates the expectations. This is not a detailed project plan. It is an overall plan to lay the foundation, to recognize the need, and to authorize a formal project.

2.4 The Data Warehouse Project


As an IT professional, you have worked on application projects before. You know what goes on in these projects and are aware of the methods needed to build the applications from planning through implementation. You have been part of the analysis, the design, the programming, or the testing phases. If you have functioned as a project manager or a team leader, you know how projects are monitored and controlled. A project is a project. If you have seen one IT project, have you not seen them all? The answer is not a simple yes or no; Data Warehouse projects are different from projects that build transaction processing systems. If you are new to Data Warehousing, your first Data Warehouse project will reveal the major differences. We will discuss these differences and also consider ways to react to them. We will also ask a basic question about the readiness of the IT and user departments to launch a Data Warehouse project. How about the traditional system development life cycle (SDLC) approach? Can we use this approach for Data Warehouse projects as well? If so, what are the development phases in the life cycle?

Data Warehouse Development Life Cycle
The Data Warehouse development life cycle covers two vital areas. One is warehouse management and the other is data management. The former deals with defining the project activities and gathering requirements, whereas the latter deals with modeling and designing the Warehouse (see Fig. 2.1).


Life Cycle of Data Warehouse Development
1. Define the Project
2. Gather Requirements
3. Model the Warehouse
4. Validate the Model
5. Design the Warehouse
6. Validate the Design
7. Implementation

Figure 2.1: Life Cycle steps of a DWH (SDLC)
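
The sequence in Figure 2.1 can be sketched as a simple walk through an ordered list of phases. This is only an illustration: the rule that a failed validation sends the project back to the immediately preceding phase is an assumption made for the example, and the names `PHASES`, `run_lifecycle` and `failures` are hypothetical.

```python
# Phase names as listed in Figure 2.1, in order.
PHASES = [
    "Define the Project",
    "Gather Requirements",
    "Model the Warehouse",
    "Validate the Model",
    "Design the Warehouse",
    "Validate the Design",
    "Implementation",
]


def run_lifecycle(failures=None):
    """Return the phases visited, in order, for one pass through the life cycle.

    `failures` maps a validation phase name to the number of times that
    validation fails before it finally passes. Assumption: each failure
    sends the project back one phase for rework.
    """
    remaining = dict(failures or {})
    visited = []
    index = 0
    while index < len(PHASES):
        phase = PHASES[index]
        visited.append(phase)
        if phase.startswith("Validate") and remaining.get(phase, 0) > 0:
            remaining[phase] -= 1
            index -= 1  # rework the phase that was just validated
        else:
            index += 1
    return visited
```

For example, `run_lifecycle({"Validate the Model": 1})` visits "Model the Warehouse" and "Validate the Model" twice before moving on to the design phases, which reflects why the validation steps sit between modeling, design and implementation.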

Managing the Project
Managing the Data Warehouse project is an ongoing activity. It is not like a traditional systems project. The Data Warehouse is concerned with the execution of the warehousing process and the data.

Defining the Project
The process of defining the project typically involves the following questions:
o What do I want to analyze?
o Why do I want it?
o What if I do not do this?
o How do I get this?
Once software personnel have answers to these questions, the requirements that must be addressed can be understood.

Requirements Gathering
Transaction Processing Systems focus on automating the process, making it faster and more efficient. This, in turn, means that the requirements for transactional systems are specific and more directed towards business process automation. In contrast, the Data Warehousing environment focuses on facilitating the analysis that will change the process to make it more effective.

Common questions asked during requirements gathering:
o Who is of interest to the user?
o What is the user trying to analyze?
o Why does the user need data?
o When does the data need to be recovered?
o Where do relevant processes occur?
o How do we measure the performance?

Kimball Lifecycle Diagram

Figure 2.2: Kimball Lifecycle Diagram

Ralph Kimball is known worldwide as an innovator, writer, educator, speaker and consultant in the field of Data Warehousing. He proposed a life cycle approach for the development of a Data Warehouse, and his lifecycle strategy has since become an industry standard. The Kimball Lifecycle describes the general flow of a DWH implementation, identifies task sequencing and highlights activities that should happen concurrently. The Dimensional Modeling and ETL tasks shown in the diagram (Fig. 2.2) will be discussed in subsequent chapters.


Project Planning
o Scope, definition and understanding the business requirements
o Task Identification
o Scheduling
o Resource Planning
o Workload Assignment
o The end document represents a blueprint of the project.

Program/Project Management
o Enforces the project plan
o Status monitoring
o Issue tracking
o Development of a comprehensive communication plan that addresses both the business and IT units

Business Requirements Definition
o Success of the project depends on a solid understanding of the business requirements.
o Understanding the key factors driving the business is crucial for successful translation of the business requirements into design considerations.

What follows the business requirements definition? Three concurrent tracks, focusing on:
o Technology (Technical Architecture)
o Data (Dimensional Modeling, Physical Design and ETL)
o Business Intelligence Applications
Arrows in the diagram indicate the activity workflow along each of the parallel tracks, and dependencies between the tasks are illustrated by the vertical alignment of the task boxes.

Deployment
Adequate planning is crucial to make sure that the results of the technology, data, and BI application tracks are tested and fit together properly. Deployment should be deferred if all the pieces, such as training, documentation, and validated data, are not ready for production release.
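
The task sequencing described above, with three parallel tracks that converge on deployment, can be sketched as a dependency graph. The following is a minimal illustration using Python's standard graphlib module (Python 3.9 or later). The task names are taken from this unit; the exact dependencies, in particular the ordering inside the data track, are a simplifying assumption and not a full rendering of the Kimball diagram. `DEPENDENCIES` and `concurrent_stages` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
# Assumption: the data track runs Dimensional Modeling -> Physical Design -> ETL.
DEPENDENCIES = {
    "Project Planning": set(),
    "Business Requirements Definition": {"Project Planning"},
    "Technical Architecture": {"Business Requirements Definition"},
    "Dimensional Modeling": {"Business Requirements Definition"},
    "Physical Design": {"Dimensional Modeling"},
    "ETL": {"Physical Design"},
    "Business Intelligence Applications": {"Business Requirements Definition"},
    "Deployment": {
        "Technical Architecture",
        "ETL",
        "Business Intelligence Applications",
    },
}


def concurrent_stages(dependencies):
    """Group tasks into stages; tasks within one stage can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

Calling `concurrent_stages(DEPENDENCIES)` yields one stage in which the three tracks start together, and places Deployment last, since it waits for all three tracks to finish.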


Maintenance
This occurs when the system is in production. It includes technical operational tasks that are necessary to keep the system performing optimally. Some of the tasks are listed below:
o Usage Monitoring
o Performance Tuning
o Index Maintenance
o System Backup
o Ongoing support, education, and communication with business users

Requirement Gathering Approaches
There are two widely used methods for deriving business requirements:
o Source Driven Requirements Gathering
o User Driven Requirements Gathering

Source Driven Requirements Gathering
This process is based on defining the requirements by using the source data in production transactional systems. This is done by analyzing the E-R model of the source data, or the actual physical record layout, and selecting the data elements deemed to be of interest.

User Driven Requirements Gathering
This process is based on defining the requirements by conducting interviews and discussions with users about business needs and by investigating the functions they perform.

It is recommended to follow the user-driven approach and to break the project down into manageable pieces. Here, each piece is a subject area. The requirements are gathered for each subject area.

Note: Details about subject areas will be given in subsequent chapters.

Self Assessment Questions
6. In most organizations, two groups of people are key to the success of the project, ______________________ and _________________.
7. In a Data Warehouse, the requirements are gathered subject area wise. (True / False)

2.5 Summary
o Requirements gathering for a Data Warehouse follows a different strategy from that for an OLTP system. An OLTP system collects data for transaction recording purposes, whereas a Data Warehouse collects data for analysis. The analysis can be sales analysis, mortality analysis, trend analysis, etc.
o OLTP systems support predefined reports, whereas a Data Warehouse supports ad hoc reports.
o There are two widely used methods for deriving business requirements: source-driven requirements gathering and user-driven requirements gathering.
o A Data Warehouse can be implemented using either top-down or bottom-up development methodologies. This decision always depends upon the business requirements.
o Like conventional (OLTP) projects, Data Warehouse projects also follow the SDLC approach.
o Like conventional projects, Data Warehouse development has certain roles and responsibilities. Roles include Executive Sponsor, Business Analyst, Testing, and Infrastructure Specialist Coordinator, among others.

2.6 Terminal Questions


1. What are Data Warehouse requirements? How do you gather the requirements?
2. Explain the Kimball Lifecycle for Data Warehouse development.
3. Differentiate between the Data Warehouse requirements approach and the OLTP systems approach.
4. Explain any five roles and responsibilities in the development of a Data Warehouse.
5. What are the maintenance issues in a Data Warehouse?


2.7 Answers
Self Assessment Questions
1. Analysis
2. Historical
3. Senior Management and Working Management
4. Real-time business operations
5. True
6. Senior Management and Working Management
7. True

Terminal Questions
1. Refer to section 2.4
2. Refer to section 2.4
3. Refer to Table 2.1
4. Refer to section 2.5 (Roles)
5. Refer to section 2.4 (Maintenance)
