
Data Warehouse Architecture: The Great Debate

Inmon vs. Kimball: Differences Between Industry Titans

Almost everyone who has a connection to the concept of a data warehouse has an opinion concerning the best way to construct one for optimum results. The two prominent proponents of data warehouse architecture are Bill Inmon, considered to be the father of the data warehouse, and Ralph Kimball, the creator of the data mart. Since the first time these two industry leaders published their conception of data warehouse architecture, there has been anticipation of a live debate of the similarities and differences between their concepts. It had been hoped that having Mr. Inmon and Mr. Kimball on the same stage, whether live or virtual, would provide a dialogue and clearly delineate their personal perspectives on data warehouse design.

Unfortunately for everyone who has expressed an interest in such a thought-provoking milestone event, it is still not possible to have these two industry leaders square off in the same forum. Mr. Kimball in the past has declined on several occasions to accept the invitation to such a debate and has also declined to be associated with this series of articles. The latest response from Mr. Kimball's office to our request to participate will appear at the end of this series.

However, because both Mr. Inmon and Mr. Kimball are very prolific in publishing their ideas and offering their opinions, I have undertaken the task of "staging" a debate between them based on their writings. It will appear as a series over the next five weeks. In preparation for this series, Bill Inmon has provided me with a source article, "The Great Inmon/Kimball Debate that Never Took Place." In presenting Mr. Inmon's position, I have also generally relied on the numerous articles he has published on Business Intelligence Network and, for the most part, will not be providing specific citations to them. I recommend them to you for reading at your leisure. When directly quoting Mr. Kimball, I will provide specific citations to his materials.

Even though by training and professional background I am a lawyer, this is not intended to be an "empty chair" examination. This will be a best effort to present the two viewpoints as impartially and accurately as possible, given the constraints of the circumstances. That being said, allow me to introduce our participants:

Bill Inmon is recognized as the "father of the data warehouse" and co-creator of the "Corporate Information Factory." He has more than 35 years of experience in database technology management and data warehouse design. He has spoken at seminars worldwide on developing data warehouses. He has published more than 650 articles and 45 books on the subject.

Ralph Kimball is known worldwide as an innovator, writer, educator, speaker and consultant in the field of data warehousing. He maintains a strong conviction that data warehouses must be designed to be understandable and fast. He has written more than 100 articles, and his books on dimensional design techniques have been the all-time best sellers in data warehousing.

This establishes our players' eminent credentials for this forum. Our agenda will be:

- Session I: Ralph Kimball's Concept;
- Session II: Bill Inmon's Concept;
- Session III: Similarities and Differences;
- Session IV: Relational vs. Multidimensional; and
- Session V: Summary, Reader Feedback and Reader Questions

At any time during the series, we will readily accept feedback from either Mr. Inmon or Mr. Kimball, and these will be published as their comments, clarifications or rebuttals. Readers' constructive comments and questions are also welcomed and will be published and addressed in a special blog.

Data Warehouse: Ralph Kimball's Vision


This is a compilation of what the author has derived from Ralph Kimball's published articles and books, as well as from his website, which formulate the author's impression of the Kimball approach to the data warehouse.

Mr. Kimball's concept of the data warehouse has evolved over the last 20 or so years because of the ever-changing information technology environment. The complex business communities, which need to access and analyze their data for accurate and profitable decision making, have refined their requirements and reconfigured their queries as they themselves have evolved. What started out being termed a "data mart," or a collection of "data marts," used the star schema(s) data models to provide direct access to the stored data by a specified class of user. The star schema approach has been viewed as a "Bottom Up" approach from those outside the Kimball group, as contrasted with the Bill Inmon approach, which has been termed "Top Down."

The most accurate description regarding the Kimball approach, in the author's opinion, comes directly from material from the Kimball website, "Design Tip #49 'Off the Bench'":

"When we wrote 'The Data Warehouse Lifecycle Toolkit,' we referred to our approach as the Business Dimensional Lifecycle. In retrospect, we should have probably just called it the Kimball Approach, as suggested by our publisher. We chose the Business Dimensional Lifecycle label instead because it reinforced our core tenets about successful data warehousing, based on our collective experiences since the mid-1980s.

"First and foremost, you need to focus on the business ... You must have one eye on the business' requirements while the other is focused on broader enterprise integration and consistency issues. The analytic data should be delivered in dimensional models for ease-of-use and query performance. We recommend that the most atomic data be made available dimensionally so that it can be sliced and diced 'any which way.' While the data warehouse will constantly evolve, each iteration should be considered a project life cycle consisting of predictable activities with a finite start and end ..."

As the above-referenced Design Tip was written expressly to refute the "Bottom Up" label for the Kimball approach, it went on to explain that the Kimball approach recommends developing an "enterprise data warehouse bus matrix." Design Tip 49 continues:

"Finally, we believe conformed dimensions (which are logically defined in the bus matrix and then physically enforced through the staging process) are absolutely critical to data consistency and integration. They provide consistent labels, business rules/definitions and domains that are re-used as we construct more fact tables to integrate and capture the results from additional business processes/events."

The above excerpts from the design tip describe the more current Kimball approach, which is called the "data warehouse bus architecture." This architecture comprises:

- A staging area (which can have an E/R or relationally designed 3NF design, or flat file format), which cannot be accessed by an end user of the data warehouse bus.
- The Data Warehouse Bus itself, which includes several atomic data marts, several aggregated data marts and a personal data mart, but no single or centralized data warehouse component.

The Data Warehouse Bus:

- Is dimensional;
- Contains transaction and summary data;
- Includes data marts which have single subject or fact tables; and
- Can consist of multiple data marts in a single data base.

According to the article by Ralph Kimball and Margy Ross, "Differences of Opinion," in Intelligent Enterprise, March 2004, in the Data Warehouse Bus Architecture:

"... a dimensional model contains the same information as a normalized [3NF] model but packages it for ease-of-use and query performance ... It includes both atomic detail and summarized information ... Queries descend to progressively lower levels of detail without reprogramming ... Dimensional models are built by business processes ... not business departments. Once foundation business processes are available in the warehouse, consolidated dimensional models deliver cross-process metrics. The enterprise data warehouse identifies and enforces the relationship between business process metrics (facts) and descriptive attributes (dimensions)."

A fundamental concept of the Kimball Data Warehouse Bus design is that, in this approach, the data warehouse is not a physical repository of the data as in the Inmon approach. It is "virtual." It is a collection of data marts, each having a star schema design at its base.

Our next session will describe Mr. Inmon's concept of the data warehouse and set the stage for the Corporate Information Factory.
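To make the conformed-dimension idea described in this session more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not taken from Mr. Kimball's materials: the table names (dim_product, fact_sales, fact_shipments), the columns and the sample rows are all hypothetical. It assumes two business processes whose fact tables share one product dimension, and shows a "drill-across" report that aggregates each fact table separately by a shared attribute and then lines the results up.

```python
import sqlite3


def build_bus_example() -> sqlite3.Connection:
    """Create two hypothetical star schemas that share one conformed dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Conformed dimension: one definition reused by every fact table.
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT
        );
        -- One fact table per business process (hypothetical: sales, shipments).
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key    TEXT,
            units_sold  INTEGER
        );
        CREATE TABLE fact_shipments (
            product_key   INTEGER REFERENCES dim_product(product_key),
            date_key      TEXT,
            units_shipped INTEGER
        );
        INSERT INTO dim_product VALUES
            (1, 'Widget', 'Hardware'),
            (2, 'Gadget', 'Hardware'),
            (3, 'Manual', 'Books');
        INSERT INTO fact_sales VALUES
            (1, '2004-01-01', 10), (2, '2004-01-01', 5), (3, '2004-01-02', 7);
        INSERT INTO fact_shipments VALUES
            (1, '2004-01-02', 8), (2, '2004-01-03', 5), (3, '2004-01-03', 4);
        """
    )
    return conn


def totals_by_category(conn: sqlite3.Connection, fact_table: str, measure: str) -> dict:
    """Sum one measure of one fact table, grouped by the shared category attribute."""
    # fact_table and measure are fixed names chosen inside this sketch,
    # never user input, so formatting them into the SQL text is acceptable here.
    sql = (
        f"SELECT d.category, SUM(f.{measure}) "
        f"FROM {fact_table} AS f "
        "JOIN dim_product AS d ON d.product_key = f.product_key "
        "GROUP BY d.category"
    )
    return dict(conn.execute(sql).fetchall())


def drill_across(conn: sqlite3.Connection) -> dict:
    """Combine per-process totals on the conformed attribute (category)."""
    sales = totals_by_category(conn, "fact_sales", "units_sold")
    shipped = totals_by_category(conn, "fact_shipments", "units_shipped")
    categories = sorted(set(sales) | set(shipped))
    return {c: (sales.get(c, 0), shipped.get(c, 0)) for c in categories}


if __name__ == "__main__":
    for category, (sold, shipped) in drill_across(build_bus_example()).items():
        print(category, sold, shipped)
```

The two fact tables are never joined to each other directly. Each is summarized on its own and the results meet only on the shared category label, which is one way to read the claim that conformed dimensions provide "consistent labels" across business processes.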

Data Warehouse: Bill Inmon's Vision


In articles written for the Business Intelligence Network, Bill Inmon defines his concept of the data warehouse as follows:

"A data warehouse needs to service the needs of all of its users, not just one class of user. In an enterprise environment there are many classes of users: Accounting; Finance; Marketing; Production; Service; etc. Each of these user classes is a separate community with its own way of looking at the data in the data warehouse. This requires that the data warehouse have as its basis relationally designed tables for the data. The nice thing about relationally designed tables as a basis for a data warehouse is that, in a relational format, the relational data can be reshaped and reformed into any configuration that is needed. Stated differently, when relational design is done properly and the data exists at a low level of granularity in the data warehouse, any other configuration of data can be supported - multidimensional cubes, star schemas, flat files, etc. The paradigm for relational data in the data warehouse is that data should be at a low level of granularity and in third normal form (3NF). After the data is so shaped, then it is possible to 'lightly denormalize' the data if it is commonly used in that manner by all classes of users."

The Inmon position further holds that once this relational foundation is in place, it has the flexibility to support multidimensional data marts and other data structures such as exploration warehouses, data mining data bases, etc.

Bill Inmon espouses an iterative or spiral approach to the development of a large data warehouse.

"The relational foundation for the data warehouse needs to be built iteratively, one table at a time. Under no circumstances is it optimal to build a data warehouse all at once using the 'big bang' approach. Accordingly, the methodology that is appropriate to the building of a data warehouse is known as the 'spiral approach' ... In the [iterative] spiral approach, one small part of the system is ... complet[ed] ... Small parts of the relational data warehouse are added with each new iteration."

In the Inmon model, by using the iterative method, errors can be corrected and adjustments applied to a small amount of data or code without the need to re-program or code large amounts of data in the data warehouse... This relationally designed, or 3NF, approach permits a granularity of integrated data which provides maximum flexibility to the enterprise. If the enterprise has new requirements for the data that is warehoused, the data in the data warehouse is in a form that is ready to be shaped or formatted to meet the new requirements.

Bill Inmon has provided an excellent description of his concept of data warehouse design: "Data warehouses are arranged [by] the corporate subject areas ... in the corporate data model. Usually the data warehouse is built and owned by centrally coordinated organizations. ... [It] is a truly corporate [-wide] effort."

He also advises that the data warehouse contains the corporation's most granular level of data. The structure and content of the data warehouse is not dictated by the requirements of any one department but instead is intended to serve the entire corporation's data requirements. The data warehouse therefore requires scalable technology to properly house it because of the tremendous volume of data needed for the entire enterprise. The data warehouse also contains historical data from many legacy sources. A critical design tenet of a data warehouse is that it is NOT a collection of data marts but is, in fact, a physically distinct component altogether.

The next session will focus on specific similarities and differences between the Inmon Corporate Information Factory and the Kimball Bus Architecture.
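The claim in this session that granular, normalized data "can be reshaped and reformed into any configuration" can be illustrated with a small sketch, again in Python with the standard-library sqlite3 module. Everything named here is hypothetical and chosen only for illustration: the region, customer and customer_order tables stand in for a 3NF foundation, and dim_customer and fact_order stand in for a dependent, star-schema data mart derived from it. It is a sketch under those assumptions, not Mr. Inmon's own design.

```python
import sqlite3


def build_relational_foundation() -> sqlite3.Connection:
    """Hypothetical 3NF foundation: each fact is stored once, at low granularity."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE region (
            region_id   INTEGER PRIMARY KEY,
            region_name TEXT NOT NULL
        );
        CREATE TABLE customer (
            customer_id   INTEGER PRIMARY KEY,
            customer_name TEXT NOT NULL,
            region_id     INTEGER NOT NULL REFERENCES region(region_id)
        );
        CREATE TABLE customer_order (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            amount      INTEGER NOT NULL
        );
        INSERT INTO region VALUES (1, 'East'), (2, 'West');
        INSERT INTO customer VALUES (10, 'Acme', 1), (11, 'Globex', 2);
        INSERT INTO customer_order VALUES (100, 10, 250), (101, 11, 100), (102, 10, 50);
        """
    )
    return conn


def build_dependent_mart(conn: sqlite3.Connection) -> None:
    """Reshape the normalized tables into a small star schema (one possible shape)."""
    conn.executescript(
        """
        -- Denormalized dimension: region is folded into the customer row.
        CREATE TABLE dim_customer AS
            SELECT c.customer_id AS customer_key, c.customer_name, r.region_name
            FROM customer AS c
            JOIN region AS r ON r.region_id = c.region_id;
        CREATE TABLE fact_order AS
            SELECT order_id, customer_id AS customer_key, amount
            FROM customer_order;
        """
    )


def sales_by_region(conn: sqlite3.Connection) -> dict:
    """Query the star-schema shape."""
    rows = conn.execute(
        "SELECT d.region_name, SUM(f.amount) "
        "FROM fact_order AS f "
        "JOIN dim_customer AS d ON d.customer_key = f.customer_key "
        "GROUP BY d.region_name"
    )
    return dict(rows.fetchall())


def flat_extract(conn: sqlite3.Connection) -> list:
    """A second shape from the same foundation: a flat, file-style extract."""
    rows = conn.execute(
        "SELECT o.order_id, c.customer_name, r.region_name, o.amount "
        "FROM customer_order AS o "
        "JOIN customer AS c ON c.customer_id = o.customer_id "
        "JOIN region AS r ON r.region_id = c.region_id "
        "ORDER BY o.order_id"
    )
    return rows.fetchall()


if __name__ == "__main__":
    warehouse = build_relational_foundation()
    build_dependent_mart(warehouse)
    print(sales_by_region(warehouse))
    print(flat_extract(warehouse))
```

Both output shapes are derived from the same three normalized tables, and neither shape is stored in the foundation itself. That separation is the point the Inmon position makes when it treats data marts as dependents of the warehouse.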

Data Warehouse: Similarities and Differences of Inmon and Kimball


This article attempts to draw out the similarities and differences between the Inmon and Kimball approaches to the data warehouse. On the subject of what the data warehouse is and what the data marts are, both Kimball and Inmon have spoken:

"... The data warehouse is nothing more than the union of all the data marts ..." Ralph Kimball, Dec. 29, 1997.

"You can catch all the minnows in the ocean and stack them together and they still do not make a whale." Bill Inmon, Jan. 8, 1998.

The Corporate Information Factory (CIF) and the Kimball Data Warehouse Bus (BUS) are considered the two main types of data warehousing architecture. Accordingly, the two architectures have some elements in common. All enterprises require a means to store, analyze and interpret the data they generate and accumulate in order to implement critical decisions that range from "continuing to exist" to maximizing prosperity. Corporations must develop operating and feedback systems to use the underlying data means (the data warehouse) to achieve their goals. Both the CIF and BUS architectures satisfy these criteria.

Another requirement of any data warehouse architecture is that the user can depend on the accuracy and timeliness of the data. The user must also be able to access the data according to his or her particular needs through an easily understandable and straightforward manner of making queries. The data that is extracted in this manner by one user should be compatible with and translatable to other operations and users within the same group or enterprise that rely on the same data.

Both Inmon and Kimball share the opinion that stand-alone or independent data marts or data warehouses do not satisfy the needs for accurate and timely data and ease of access for users on an enterprise or corporate scale. In an article for the Business Intelligence Network, Mr. Inmon writes:

"... Independent data marts may work well when there are only a few data marts. But over time there are never only a few data marts ... Once there are ... a lot of data marts, the independent data mart approach starts to fall apart. There are many reasons why ... independent data marts built directly from a legacy/source environment fall apart:

- There is no single source of data for analytical processing ...;
- There is no easy reconcilability of data values ...;
- There is no foundation to build on for new data marts ... An independent data mart is rarely reusable for other purposes;
- There are too many interface programs to be built and maintained;
- There is a massive redundancy of detailed data in each data mart ... because there is no common place where that detailed data is collected and integrated;
- There is no convenient place for historical data;
- There is no low level of granularity guaranteed for all data marts to use;
- Each data mart integrates data from the source systems in a unique way, which does not permit reconcilability or integrity of the data across the enterprise; and
- The window for extracting data from the legacy environment is stretched with each independent data mart requiring its own window of time for extraction ..."

In "Differences of Opinion" (previously cited), Mr. Kimball gives his opinion of independent data marts:

"Finally, stand-alone data marts or warehouses ... are problematic. These independent silos are built to satisfy specific needs, without regard to other existing or planned analytic data. They tend to be departmental in nature, often loosely dimensionally structured. Although often perceived as the path of least resistance because no coordination is required, the independent approach is unsustainable in the long run. Multiple, uncoordinated extracts from the same operational sources are inefficient and wasteful. They generate similar, but different variations with inconsistent naming conventions and business rules. The conflicting results cause confusion, rework and reconciliation. In the end, decision-making based on independent data is often clouded by fear, uncertainty and doubt."

It appears from the above that both Inmon and Kimball are of the opinion that independent or stand-alone data marts are of marginal use. However, for the most part, this is where the perception of similarity stops. You may discern later, as I have, that there are more similarities, but each of our data warehouse architects expresses them in a very different way.

Inmon believes that Kimball's star schema-only approach causes inflexibility and therefore leads to a "brittle" structure. He writes, "... this basic lack of flexibility is at the heart of the weakness of the star schema model as the basis of the data warehouse ... When there is an enterprise need for data, the star schema is not at all optimal. Taken together, a series of star schemas and multi-dimensional tables are brittle ... [They] cannot change gracefully over time ..."

Mr. Inmon believes his approach, which uses the dependent data mart as the source for star schema usage, solves the problem of enterprise-wide access to the same data, which can change over time. "The relational data warehouse is best served by a relational [3NF] database design running on relational technology ... This should be no surprise since the dbms technology the data warehouse runs on works the best with a relational database design."

The Kimball BUS architecture expresses that "raw data is transformed into presentable information in the staging area, ever mindful of throughput and quality. Staging begins with coordinated extracts from the operational source systems. Some staging 'kitchen' activities are centralized, such as maintenance and storage of common reference data, while others may be distributed." ("Data Warehouse Dining Experience," Intelligent Enterprise, Jan. 1, 2004.)

The above indicates to this author that Kimball has gone beyond the individual star schema approach criticized by Inmon and, in fact, has described his multi-dimensional data warehouse. In this approach, the model contains atomic data and the summarized data, but its construction is based on business measurements, which enable disparate business departments to query the data from a higher level of detail to the lowest level without reprogramming.

Although this description appears to indicate that the Kimball "staging area" is VERY similar to the Inmon data warehouse, the Kimball approach does not recommend a real, physically implemented data warehouse. His "data warehouse" is still the collection of data marts with their conformed dimensions.

In Mastering Data Warehouse Design: Relational and Dimensional Techniques by Claudia Imhoff, Nicholas Galemmo and Jonathan Geiger (Wiley, 2003), these authors analyze the Kimball approach as relying on star schemas for both atomic and aggregated storage. Summarizing this point of their research, the Data Warehouse Bus Architecture is said to consist of two types of data marts:

- The Atomic Data Marts, which hold multi-dimensional data at the lowest level. These can also include aggregated data for improved query performance.
- Aggregated Data Marts. These can store data according to a core business process.

In both the Atomic and Aggregated Data Marts, the data is stored in a star schema design.

Their description of the Kimball Bus Architecture seems to indicate that the Kimball Approach still does not recognize a need for, nor require, a central data warehouse repository.

The next article will highlight the differences in the two models regarding relational vs. multidimensional data.

Data Warehousing: Relational vs. Multi-Dimensional Data


Continuing with our view of similarities and differences between the Inmon and Kimball designs, we turn your attention to a "mixed" view or controversy concerning whether data in the data warehouse should be relationally designed data (Inmon) or whether the data should be of a multi-dimensional design and used in a logical collection of data marts (Kimball). Each architect addresses the level of granularity of the data required for his design, namely the Inmon CIF or the Kimball BUS.

According to Bill Inmon: "The paradigm for relational data in the data warehouse should be at a low level of granularity and should be in third normal form... then it is possible to 'lightly denormalize' the data if [it] is commonly used in the denormalized form... Relationally designed data warehouse data stored at a low level of granularity can be used in a wide variety of ways... [It]... can be used to support a wide variety of structures of data: Exploration warehouses; Data mining warehouses; and OLTP data bases, etc."

The Inmon data warehouse is a physical repository of data which can be used to build data marts in which the data can be in multi-dimensional or other forms.

Mr. Kimball also considers granularity and atomic level data to be key to his multi-dimensional design data warehouse, or BUS. In Kimball University Design Tip #21, "Declaring the Grain," it states:

"The most important step in a dimensional design is declaring the grain of the fact table. Declaring the grain means saying EXACTLY what a fact table represents... When you make a grain declaration, you can have a very precise discussion of which dimensions are possible and which ones are not... Atomic data has the most dimensionality and so it can be constrained and rolled up in every way that is possible for that data source. Atomic data is a perfect match for the dimensional approach... higher levels of aggregation will almost always have smaller dimensions... Since useful aggregations necessarily shrink dimensions and remove dimensions, it leads to the realization that aggregated data must always be used in conjunction with base atomic data because aggregated data has less dimensional detail."

At this point in the comparison between Inmon and Kimball, it seems they agree on the need for atomic data and the need for it to be available when aggregated data is being used. Mr. Kimball is emphatic on this point in defending data marts: "Some authors get confused on this point, and after declaring that data marts necessarily consist of aggregated data, they criticize the data marts for 'anticipating business questions.'" He then points out that the misunderstanding can be clarified by providing the atomic data along with the derivative aggregated data.

The next and last article will be a summary and will provide reader feedback and answers to readers' questions received during this series.
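The relationship between a declared grain, atomic data and aggregates that this session describes can be shown with a short Python sketch. The grain ("one row per product, per store, per day"), the sample rows and the roll_up helper are all hypothetical and exist only for illustration. The sketch assumes a single additive measure and shows that atomic rows can be rolled up along any of their dimensions, while an aggregate that has dropped a dimension can no longer answer questions about it.

```python
from collections import defaultdict

# Hypothetical declared grain: one row per product, per store, per day.
ATOMIC_SALES = [
    {"day": "2004-01-01", "store": "S1", "product": "Widget", "units": 3},
    {"day": "2004-01-01", "store": "S2", "product": "Widget", "units": 2},
    {"day": "2004-01-02", "store": "S1", "product": "Gadget", "units": 4},
    {"day": "2004-01-02", "store": "S2", "product": "Widget", "units": 1},
]


def roll_up(rows, dims):
    """Sum the 'units' measure, keeping only the dimensions named in dims."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)  # KeyError if a dimension is absent
        totals[key] += row["units"]
    return dict(totals)


# Atomic data can be summarized along any of its dimensions.
by_product = roll_up(ATOMIC_SALES, ["product"])
by_store = roll_up(ATOMIC_SALES, ["store"])

# An aggregate table that kept only 'product' has lost 'store' and 'day'.
BY_PRODUCT_AGGREGATE = [
    {"product": product, "units": units} for (product,), units in by_product.items()
]

if __name__ == "__main__":
    print(by_product)
    print(by_store)
    try:
        roll_up(BY_PRODUCT_AGGREGATE, ["store"])
    except KeyError:
        print("the product-level aggregate cannot be broken out by store")
```

The final call fails because the aggregate no longer carries the store dimension, which is one reading of the statement that aggregated data "must always be used in conjunction with base atomic data."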

Data Warehousing: Our Great Debate Wraps Up


In summary, for the Inmon side of the debate:

The Inmon data warehouse design model relies on a relational, or 3NF, design foundation with data stored at the atomic level in the data warehouse, which is then aggregated and made accessible across the enterprise by exploration warehouses, data mining warehouses and OLAP data bases. Ideally, it is built using the iterative or spiral development approach. The CIF architecture embraces the star schema design for the data marts only, NOT for the design of the data warehouse.

Inmon describes the Kimball approach as "brittle" because, in his opinion, the star schema is closely aligned to end-user requirements. Therefore, it does not produce a reusable form of data for the enterprise. In the Inmon approach, star schemas are only used for dependent data marts. With regard to Mr. Kimball's BUS architecture, Inmon's comment is: "As changes [by Kimball] are made, Ralph's architecture becomes inexorably closer to the CIF, which has been in the public domain for decades."

The author's summary for the Kimball side of the debate is distilled from "Differences of Opinion" (previously cited).

The multi-dimensional data warehouse is not a "bottom up" design. His approach does not require a normalized data structure prior to dimensional presentation. Before loading the dimensional tables, the Kimball approach states that "the data structures required prior to dimensional presentation depend on the source data realities, target data model and anticipated transformation." While the Kimball approach doesn't absolutely shun the normalization requirement of the Inmon approach for the atomic data, it does analyze and challenge whether the enterprise has the need for "both the redundant ETL development and data storage and a clear understanding of the implicit Inmon two-step throughput. The presentation in the Kimball approach is only through data marts. No physical data warehouse, as required by the Inmon approach, is needed.

The Kimball approach is presented as being faster because it argues that the data doesn't need to go through ETL multiple times before being accessed by the business. The positions regarding the atomic level of data have been explored in the previous article in this series. However, Kimball's point is, "If you make atomic data available in dimensional structures, you can always summarize the data 'any which way.'"

Storing the atomic data in dimensional structures provides business users with the ability to get answers to immediate and sometimes unpredictable problems. According to the Kimball approach, this puts usable data in the hands of the business user making the query, without requiring a data warehouse expert to drill into the different normalized structures for the data. Mr. Kimball also points out that his approach uses the enterprise data warehouse BUS architecture "with common, conformed dimensions for integration and drill-across support. Conformed dimensions are the backbone of any enterprise approach..."

In rebuttal from Mr. Inmon, the Inmon approach would accept the premise of Kimball stated above, "If you make the atomic data available in dimensional structures, you can always summarize the data 'any which way,'" BUT only to the extent that the atomic data in the dimensional structures is being analyzed using ONLY multi-dimensional methods. The Inmon approach would contend that statistical, mining and even exploratory methods cannot be used if the atomic data is made available only in dimensional states. In the Inmon approach, the main reason for storing the data in a 3NF fashion in the data warehouse is not to predispose the data to favor any particular analytical method.

As you can see, there are many more similarities than differences between the architectures once you get past the semantics.

Which is Better?

The answer is, of course, it depends: on how you cleanse your data; the level of granularity you choose to access it; the variety of analytical techniques you use to analyze the data; the time and resources you have to build it; and your prevailing corporate culture. Whether you decide to "Punch In" to the Inmon Corporate Information Factory or "get on" the Kimball BUS, we hope you have enjoyed this series, and we look forward to your comments.
This concludes the "Great Debate." The author wishes to thank Bill Inmon for his source material and also to acknowledge Claudia Imhoff, Intelligent Solutions; Dan Meers, Knightsbridge Consulting; Joyce Montanari, Independent Consultant; Genia Neuschloss, Gavrosche; Derek Strauss, Gavrosche; and Bob Terdeman, Independent Consultant, for their assistance and insights on the Inmon approach.

While the author relied heavily on Mr. Kimball's website, articles in Intelligent Enterprise and on the Design Tips from Kimball University to attempt to present his approach for this series, she does not expect him to reply. The reply received by Business Intelligence Network from Mr. Kimball's office appears below.

"As for the article series, we are constantly developing new content for our existing writing commitments," was the Kimball office response. "We can't review and edit/correct everything that's written about dimensional modeling and the Kimball Methods to ensure accuracy. We've decided to pass on a review of your series, rather than establishing any precedent for other similar requests."

The author does welcome any rebuttal or clarification to the Kimball approach she has presented from readers who have adopted or employed that approach. Thank you for your interest in "The Great Debate."
