This document discusses dimensional modeling and star schemas. It covers Ralph Kimball's philosophy of dimensional modeling, which advocates for building a data warehouse through multiple subject-area data marts connected by conformed dimensions. The document emphasizes that conformed dimensions and standardized fact definitions are essential to tying together dimensional models from different data marts into a cohesive data warehouse architecture.
Dimensional Modeling and Star Schemas

First Course on Dimensional Modeling

- The logical and physical designs are the cornerstone of the Data Warehouse
- While our prior textbook focused on Inmon's theories, in this section we will utilize Kimball's philosophy for dimensional modeling
- Beyond Inmon's theories of single star schema data marts, we will explore Kimball's Data Warehouse Bus Architecture

Inmon vs. Kimball

- Bill Inmon is sometimes called the father of data warehousing
  - First to define and champion the concept of a data warehouse
  - Credited with the term "data warehouse"
- Ralph Kimball is sometimes called the father of business intelligence
  - Codified the star schema and snowflake data structures
  - Defined many Business Intelligence concepts, such as:
    - Data marts
    - Dimensional hierarchies
    - Base and aggregate metrics
    - Drilling
  - In short, Kimball developed the science behind modern analytical reporting tools
- Both men have made immeasurable contributions to the field

The Case for Dimensional Modeling: What is Entity Relationship Modeling?

- We've already covered ER Modeling
- Some traditional ERDs (for example, SAP) have 1000s of entities that are not easily queryable
- This is a show stopper for BI
  - End users cannot remember the model
  - Software cannot easily query an ERD; optimizers may make wrong choices
- Traditional ERDs are neither intuitive nor high performance

The Case for Dimensional Modeling: What is Dimensional Modeling?
- Design technique that presents data in a format that is intuitive and allows high performance access
- Has one table with a multipart key, called the fact table
- Has a set of smaller tables called dimension tables
- Each dimension table has a single part primary key that corresponds to one of the components of the multipart key in the fact table

The Case for Dimensional Modeling: What is Dimensional Modeling? (continued)

- The fact table has a multipart key made up of the FKs from the dimensions; it represents a many-to-many (M-M) relationship
- The most useful fact tables contain facts that are numeric and additive
  - Fact additivity is crucial
- Dimension tables usually contain descriptive textual information
  - These attributes are the most interesting constraints and are usually the row headers in a SQL result set

The Case for Dimensional Modeling: Relationship between Dimensional Modeling and ER Modeling

- First step in converting from an ER model is to separate the model into its separate business processes
- Second step is to select the M-M relations containing numeric and additive non-key facts and designate them as fact tables
- Third step is to denormalize the remaining tables into flat tables with a single part key
  - These become the dimension tables
- The resulting data warehouse model will contain 10 to 25 star schemas, each with 5 to 15 conformed dimensions
- Many dimension tables will be shared
- Applications that drill down will use multiple dimensions from a single star join
- Applications that drill across will link separate fact tables through the conformed dimensions
- Each fact table can be queried independently
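A minimal sketch of the star join described above, using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical; they only illustrate one fact table with a multipart key surrounded by dimension tables that each have a single part key.

```python
import sqlite3

# Hypothetical star schema: one fact table, three dimension tables.
SCHEMA = """
    CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, calendar_date TEXT, quarter TEXT);
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales_fact (
        date_key    INTEGER REFERENCES date_dim(date_key),
        product_key INTEGER REFERENCES product_dim(product_key),
        store_key   INTEGER REFERENCES store_dim(store_key),
        units_sold  INTEGER,  -- numeric, additive fact
        dollars     REAL,     -- numeric, additive fact
        PRIMARY KEY (date_key, product_key, store_key)  -- multipart key built from the FKs
    );
"""

# Dimension attributes that may be used as row headers (whitelist, so no
# user-supplied text is ever placed into the SQL string).
ROW_HEADERS = {"quarter": "d.quarter", "category": "p.category", "city": "s.city"}


def build_star(conn):
    """Create the schema and load a few made-up rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                     [(1, "2024-01-15", "Q1"), (2, "2024-04-15", "Q2")])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                     [(1, "Apples", "Fruit"), (2, "Melons", "Fruit"), (3, "Carrots", "Vegetable")])
    conn.executemany("INSERT INTO store_dim VALUES (?, ?)",
                     [(1, "Chicago"), (2, "Atlanta")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 1, 10, 20.0), (1, 2, 2, 5, 15.0),
                      (2, 1, 2, 7, 14.0), (2, 3, 1, 4, 6.0)])


def units_by(conn, header):
    """Sum the additive fact, using one dimension attribute as the row header."""
    column = ROW_HEADERS[header]
    sql = f"""
        SELECT {column}, SUM(f.units_sold)
        FROM sales_fact f
        JOIN date_dim d    ON d.date_key = f.date_key
        JOIN product_dim p ON p.product_key = f.product_key
        JOIN store_dim s   ON s.store_key = f.store_key
        GROUP BY {column}
        ORDER BY {column}
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star(connection)
    for name in ROW_HEADERS:
        print(name, units_by(connection, name))
```

Because every dimension joins to the fact table in the same way, the same query shape works for any row header, which is the symmetry the slides attribute to the dimensional model.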
The Case for Dimensional Modeling: The strengths of Dimensional Modeling

1. Predictable, standard framework
   - Query tools and user interfaces can make assumptions that make user interfaces more understandable and processing more efficient
   - Allows browsing across attributes within a dimension using bit vector indexes (bitmaps)
2. Every dimension is equivalent
   - Equal entry points into the fact table
   - Withstands unexpected changes in user behavior
   - Symmetrical user interfaces, query strategies, and SQL
3. Gracefully extensible to accommodate unexpected new data elements and design changes
4. There are a number of standard approaches for handling common modeling situations (discussed later in the chapter), such as:
   - Slowly changing dimensions
   - Heterogeneous products
   - Transaction based businesses
   - Event handling scenarios (i.e. factless fact tables)
5. Growing number of administrative utilities and software processes that manage and use aggregates (discussed in a later chapter)
   - Aggregates are summary records that are logically redundant; they are used to enhance performance

The dimensional model is the only viable technique for achieving both user comprehension and query performance

Putting Dimensional Models Together: DW Bus Architecture

- The debate: do we build a central Data Warehouse or separate subject areas? (Kimball vs. Inmon)
- Kimball states that there are some Data Warehouse myths.
These are open to debate:

- Nobody believes in a totally monolithic approach
- All Data Warehouse practitioners use a step by step approach
- We have (or, at least, Kimball has) moved beyond the phase in Data Warehouse development where a data mart must be restricted to being an aggregated subset of a non-queryable Data Warehouse

DW Bus Architecture: The planning crisis

- Two unrelated challenges. The DW manager is supposed to understand:
  - All of the content and location of all data in the enterprise
  - What keeps management awake at night (provide answers to all of the high level questions that executives want answered)
- Kimball says that the data mart, built one at a time, is the solution to this dilemma
- However, isolated stovepipe data marts that cannot be tied together are the bane of the DW movement

DW Bus Architecture: Data Marts with a Bus Architecture

- Plan a series of steps with finite and specific goals
  - Separate data marts
  - Each implementation closely adheres to the architecture
- 2 steps
  - Create a surrounding architecture that defines the scope and implementation of the complete DW
  - Oversee the construction of each piece
- The biggest task in construction is designing the extract system to get data, transform it, and load it into the final database that allows querying

DW Bus Architecture: Conformed Dimensions and Standard Fact Definitions

- Before implementation, produce the suite of conformed dimensions and standardize the definition of facts
- This set of standards is called Kimball's DW Bus Architecture
- Every fact table is surrounded by conformed dimensions in a star join
- A conformed dimension means the same thing to every possible fact table
- A major responsibility of the DW team is to establish, publish, maintain, and enforce conformed dimensions
- Without conformed dimensions, data marts cannot be used together and may produce wrong results
- Conformed dimensions make possible:
  - A single dimension can be used with multiple fact tables
  - User interfaces and data content are consistent when used with that dimension
  - Consistent interpretation of attributes across data marts

DW Bus Architecture: Designing the Conformed Dimensions

- Conformed dimensions will naturally be defined at the most granular level possible
  - The grain of the time dimension will be days
- Conformed dimensions always have an anonymous (surrogate) key that is not the production key from one of the legacy systems
- Taking the Pledge
  - The data mart teams must use the conformed dimensions
  - Creation of conformed dimensions is as much a political decision as it is a technical decision

DW Bus Architecture: Establishing the Conformed Fact Definitions (Ch05)

- The upfront data architecture effort will be about 20% on conformed fact definitions and about 80% on conformed dimensions
  - Usually done at the same time
- Facts must also be conformed
  - Must be the same if they are called the same thing
- Sometimes, a fact has one natural unit of measure in one fact table and another natural unit of measure in another fact table
  - This can cause problems for drill across reports
  - The correct solution is to carry the fact in both units of measure in both tables (duplicate the fact)
- If it is difficult or impossible to exactly conform a fact, then give each interpretation a different name
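As a rough illustration of why conformed dimensions matter for drill across reports, the sketch below summarizes two fact tables separately against one shared product dimension and then lines the results up on the common row header. All table contents and names (PRODUCT_DIM, SALES_FACT, SHIPMENT_FACT) are hypothetical.

```python
from collections import defaultdict

# Conformed product dimension shared by both data marts:
# surrogate key -> descriptive attributes (made-up rows).
PRODUCT_DIM = {
    1: {"product_name": "Apples", "category": "Fruit"},
    2: {"product_name": "Melons", "category": "Fruit"},
    3: {"product_name": "Carrots", "category": "Vegetable"},
}

# Two separate fact tables, each row is (product_key, quantity).
SALES_FACT = [(1, 10), (2, 5), (3, 4)]      # units sold
SHIPMENT_FACT = [(1, 12), (3, 6), (3, 2)]   # units shipped


def summarize(fact_rows, attribute):
    """Aggregate one fact table by an attribute of the conformed dimension."""
    totals = defaultdict(int)
    for product_key, quantity in fact_rows:
        totals[PRODUCT_DIM[product_key][attribute]] += quantity
    return dict(totals)


def drill_across(attribute):
    """Query each fact table on its own, then merge on the shared row header."""
    sold = summarize(SALES_FACT, attribute)
    shipped = summarize(SHIPMENT_FACT, attribute)
    headers = sorted(set(sold) | set(shipped))
    return [(header, sold.get(header, 0), shipped.get(header, 0)) for header in headers]


if __name__ == "__main__":
    for row in drill_across("category"):
        print(row)
```

The merge is only meaningful because "category" means the same thing in both marts. If each mart kept its own product table with different category definitions, the two columns of the report would not be comparable.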
DW Bus Architecture: The importance of Data Mart granularity

- Base level fact tables in each data mart should be at the natural lowest levels of all the constituent dimensions
- Granular fact tables can be gracefully extended by adding new facts, new dimension attributes, or even whole new dimensions
  - Gracefully extended means that old queries and applications continue to run
- Multiple Source Data Marts. Kimball recommends:
  - Start with a single source data mart
  - Most risk comes from too big of an extract programming job
  - After several single source marts, combine them
  - This will satisfy users and allow the team to work on harder issues

DW Bus Architecture: Rescuing Stovepipes

- If you have pre-existing efforts:
  - If the dimensions were proper conformed dimensions, then they can become part of the overall architecture
  - If not, then shut them down and rebuild
- When you don't need conformed dimensions
  - When there are several separately managed lines of business
- A data mart is a complete subset of the overall data warehouse
  - Every data mart is a family of similar tables sharing conformed dimensions
- The Data Warehouse Bus
  - Conformed dimensions and conformed facts are the bus of the DW

Basic Data Modeling Techniques: Fact Tables and Dimension Tables

- The fundamental idea is that every type of business data can be represented as a type of cube of data
  - The cells of the cube contain measured values
  - The edges of the cube define the natural dimensions
  - They call this a hypercube or, alternately, a cube or data cube
- Cubes usually contain between 4 and 15 dimensions
  - Kimball says that models with 20 or more dimensions seem unjustified
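The claim in the granularity slide that a granular fact table can be gracefully extended can be tried out with a small sketch using Python's built-in sqlite3 module. The schema, the sample rows, and the choice of a default "No promotion" key for existing fact rows are assumptions made for illustration; the point is only that a query written before the extension returns the same answer afterwards.

```python
import sqlite3

# A query written against the original design (hypothetical schema).
OLD_QUERY = """
    SELECT p.category, SUM(f.units_sold)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
"""


def build(conn):
    """Original design: one dimension, one additive fact."""
    conn.executescript("""
        CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE sales_fact (product_key INTEGER, units_sold INTEGER);
        INSERT INTO product_dim VALUES (1, 'Fruit'), (2, 'Vegetable');
        INSERT INTO sales_fact VALUES (1, 10), (1, 5), (2, 4);
    """)


def extend(conn):
    """Add a new fact, a new dimension attribute, and a whole new dimension."""
    conn.executescript("""
        ALTER TABLE sales_fact ADD COLUMN dollars REAL;     -- new fact
        ALTER TABLE product_dim ADD COLUMN brand TEXT;      -- new dimension attribute
        CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
        INSERT INTO promotion_dim VALUES (0, 'No promotion');
        -- Existing fact rows point at the assumed default promotion row.
        ALTER TABLE sales_fact ADD COLUMN promotion_key INTEGER DEFAULT 0;
    """)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build(connection)
    before = connection.execute(OLD_QUERY).fetchall()
    extend(connection)
    after = connection.execute(OLD_QUERY).fetchall()
    print(before == after, after)
```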
Basic Data Modeling Techniques: Envisioning a cube

- We saw this in a previous slide
- [Figure: Sales Fact cube. Markets dimension: Atlanta, Chicago, Dallas, Denver. Time dimension: Q1, Q2, Q3, Q4. Remaining axis values: Apples, Cherries, Grapes, Melons.]

Basic Data Modeling Techniques: Envisioning a value in each dimension of a cube

- [Figure: fact table representing daily sales counts, with axes SalesRep, Product, and Date.]

Basic Data Modeling Techniques: Envisioning a value in each dimension of a cube with summaries

- [Figure]

Basic Data Modeling Techniques: Fact Tables and Dimension Tables (continued)

- Facts
  - An observation in the marketplace
  - Most are numeric
  - The designer should suspect that any numeric data is probably a fact
- Attribute
  - Usually text fields that describe a characteristic of a tangible thing
- Dimension
  - Textual attributes that describe things are organized within the dimensions

Basic Data Modeling Techniques: Inside Dimension Tables, Drilling Up and Down

- Drilling down is the most venerable (respected) kind of drilling in a Data Warehouse
- Drilling down means giving more detail, i.e. adding a row header to a report
- Dimension attributes
  - Are textual
  - Are the source of application constraints
- Conversely, removing a row header is drilling up
  - Row headers are not necessarily removed in the same order that they were added
- The DW industry has been using the term browsing since the beginning of the 1980s
  - Means interactively examining the relationships among attributes in a dimension table
  - Has nothing to do with browsing on the internet
- Snowflake schemas (see figure 5.6 in the textbook)
  - Occur when low cardinality fields in the dimension have been removed into separate tables and linked back to the original table with artificial keys
  - Kimball is against snowflaking
    - Questionable space savings
    - Defeats the use of bitmap indexes

Basic Data Modeling Techniques: Importance of High Quality Verbose Attributes

- The quality of the DW is measured by the quality of the dimension attributes
- An ideal dimension table contains many readable text fields describing the members of the dimension
  - Fully expanded words
  - Eliminate codes and abbreviations (see the top of the next slide)

Basic Data Modeling Techniques: Dimension attributes should be:

- Verbose (full words)
- Descriptive
- Complete (no missing values)
- Quality assured (no misspellings, impossible values, obsolete or orphaned values, or cosmetically different versions of the same attribute)
- Indexed (perhaps b-tree for high cardinality and bitmap for low cardinality)
- Equally available (in a single flat, denormalized dimension)
- Documented (in metadata that explains the origin and interpretation of each attribute)
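Drilling down and drilling up, described earlier in this section as adding and removing row headers, can be sketched in a few lines of Python. The rows below are made-up fact rows already joined to their dimension attributes; the function name report is hypothetical.

```python
from collections import defaultdict

# Fact rows joined to flat dimension attributes (hypothetical data).
ROWS = [
    {"category": "Fruit", "product": "Apples", "city": "Chicago", "units": 6},
    {"category": "Fruit", "product": "Grapes", "city": "Chicago", "units": 11},
    {"category": "Fruit", "product": "Apples", "city": "Atlanta", "units": 2},
    {"category": "Vegetable", "product": "Carrots", "city": "Atlanta", "units": 12},
]


def report(row_headers):
    """Sum the additive fact for each combination of the chosen row headers."""
    totals = defaultdict(int)
    for row in ROWS:
        totals[tuple(row[header] for header in row_headers)] += row["units"]
    return sorted(totals.items())


if __name__ == "__main__":
    print(report(["category"]))             # starting report
    print(report(["category", "product"]))  # drill down: add a row header
    print(report(["product"]))              # drill up: remove the header added first
```

The last call removes "category" even though "product" was added later, which matches the point that headers need not be removed in the order they were added.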
Basic Data Modeling Techniques: Importance of High Quality Verbose Attributes (continued)

- Kimball recommends a standard time dimension
  - Includes a multinational sub-dimension
  - See next slide for details
- Kimball recommends that a name and address record be broken down into as many parts as possible
  - Replace abbreviations with full text
- Kimball recommends that, for commercial customers, a separate customer record be made for each level in the hierarchy

[Slide with no recoverable text]

Basic Data Modeling Techniques: Inside Dimension Tables, Drilling Up and Down (continued)

- For slowly changing dimensions (like product or customer), there are 3 options
  - Type 1 - Overwrite the dimension record, losing history
    - Use whenever the old value has no business significance
  - Type 2 - Create a new dimension record (with a new surrogate key)
    - Use whenever a true change has taken place and it is appropriate to partition history by different descriptions
  - Type 3 - Create an "old" field to store the previous value
    - Use whenever it is logically possible to act as if the change had not occurred
- Rapidly changing dimensions
  - Probably use the Type 2 technique
- Large dimensions (millions of records)
  - Modern databases will support these
  - May require suppressing or not creating some records
- Rapidly changing monster (large) dimensions
  - May need to break off hot (rapidly changing) attributes into their own dimension table
- Degenerate dimensions
  - e.g. the order number in an order detail fact table; keep this in the fact table
- Junk dimensions
  - Misc. flags and text
  - Kimball recommends putting them together in a dimension

Basic Data Modeling Techniques: FKs, PKs, and Surrogate Keys

- All DW keys are surrogate keys
  - Ensure that they have no meaning
  - Never use original production keys
  - Typically a 4 byte integer (holds about 2 billion values)
- Date keys
  - Use a surrogate key: since some built-in date fields are 8 bytes, a 4 byte surrogate key will save 4 bytes
- Facts in the fact table should be chosen to be perfectly additive

Basic Data Modeling Techniques: 4 Step Design Method for Designing an Individual Fact Table

4 choices, made in order:

1. Single source vs. multi source data mart
   - Kimball recommends starting with single source
2. Fact table grain - 3 styles
   - Individual transactions (e.g. sales records)
   - Snapshot - activity during a period (e.g. daily sales)
   - Line items from control documents (e.g. invoice lines)
3. Choose dimensions
   - Examine all data and attach single valued descriptors as dimensions
4. Choose the facts
   - Dependent upon the grain of the fact table (#2 above)
   - Store aggregate or summary records in different fact tables

Basic Data Modeling Techniques: Families of Fact Tables

- A single data mart can be a coordinated set of fact tables
  - They use conformed dimensions
- 4 reasons for building families of fact tables:

1. Chains and circles
   - e.g. an order, product, or customer evolves through a series of steps
   - Each step captures transactions or snapshots
   - Each step would have a fact table
   - Often called a value chain, or a value circle when a business or multiple entities can share data with the same kind of transactions
2. Heterogeneous Product Schemas
   - May have multiple fact tables
   - e.g. a bank has one account fact table and another fact table with the checking account subset
Basic Data Modeling Techniques: Families of Fact Tables (continued)

4 reasons for building families of fact tables (continued):

3. Transaction and Snapshot Schemas
   - Virtually every data mart has some need for 2 versions of data
     - One for transactions
     - One for periodic snapshots
   - May need a Current Rolling Snapshot, i.e. keeping n months of periodic data
4. Aggregates
   - Stored summaries meant to improve performance
   - Created in separate fact tables

Basic Data Modeling Techniques: Factless Fact Tables

- When the designer finds no facts to go into the fact table
- 2 situations - events and coverage
- Events: recording when something happens
  - e.g. student attendance: no record when the student doesn't attend
  - A dummy attribute is added for later aggregation purposes
- Coverage: when data is not available
  - e.g. sales promotion data only available for items sold in that promotion
  - The fact table will not contain items not sold during that sales promotion
  - The fact table will also not contain items not on promotion
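A minimal sketch of the event style of factless fact table, using the student attendance example above. The Attendance layout, the key values, and the name attendance_count for the dummy attribute are hypothetical. Each row holds only dimension keys plus a dummy value of 1, so attendance totals come from summing the dummy.

```python
from collections import Counter
from typing import NamedTuple


class Attendance(NamedTuple):
    date_key: int
    student_key: int
    class_key: int
    attendance_count: int = 1  # dummy attribute, always 1, added for aggregation


# One row per attendance event; there is no row when a student does not attend.
ATTENDANCE_FACT = [
    Attendance(1, 101, 10),
    Attendance(1, 102, 10),
    Attendance(2, 101, 10),
    Attendance(2, 101, 20),
]


def attendance_by(key_name):
    """Total the dummy attribute, grouped by one of the dimension keys."""
    totals = Counter()
    for row in ATTENDANCE_FACT:
        totals[getattr(row, key_name)] += row.attendance_count
    return dict(totals)


if __name__ == "__main__":
    print(attendance_by("student_key"))
    print(attendance_by("class_key"))
```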