
Data Warehouse

Business Intelligence
Combination of technologies like Data Warehousing (DW), On-Line Analytical Processing (OLAP), Data Mining (DM), Data Visualization (VIS), Decision Analysis (What-if) and Customer Relationship Management (CRM).

Operational Data
• Presents a dynamic view of the business.
• Must be kept up-to-date and current at all times.
• Updated by transactions entered by data-entry operators or specially trained end users.
• Is maintained in detail.
• Utilization is predictable.
• Systems can be optimized for projected workloads.
• High volume of transactions, each of which affects a small portion of the data.
• Users do not need to understand data structures.
• Functional orientation.

Analytical Data
• Presents a static view of the business.
• End-user access is usually read-only.
• More concerned with summary information.
• Usage is unpredictable in terms of the depth of information needed by the user.
• Smaller number of queries, each of which may access large amounts of data.
• Users need to understand the structure of the data (and business rules) to draw meaningful conclusions from the data.
• Subject orientation.

Database
Broadly classified into:
1) OLTP (Online Transactional Processing) DB
2) OLAP (Online Analytical Processing) DB

OLAP
Slicing and dicing of data is called Online Analytical Processing (OLAP). OLAP, rather than OLTP, serves the needs of data warehousing. OLAP systems allow ad hoc processing and support access to data over time periods. OLAP systems aggregate, transform, integrate and historically collect OLTP data from one or more systems.

Typical OLAP operations:
1) Roll up (drill up): summarize data by climbing up a hierarchy or by dimension reduction.
2) Drill down (roll down): go from a higher-level summary to a lower-level summary or to detailed data, or introduce new dimensions.
3) Slice and dice: project and select.
4) Pivot (rotate): reorient the cube for visualization, e.g. turn a 3D cube into a series of 2D planes.
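The operations above can be illustrated on a tiny in-memory cube. This is a minimal Python sketch, not an OLAP engine: the `cube` rows, the dimension names (`year`, `quarter`, `region`, `product`) and the helper functions are hypothetical, and each cube cell is simply a dict.

```python
from collections import defaultdict

# Hypothetical sales cube: one dict per cell (year, quarter, region, product).
cube = [
    {"year": 2004, "quarter": "Q1", "region": "North", "product": "Product1", "sales": 100},
    {"year": 2004, "quarter": "Q1", "region": "South", "product": "Product2", "sales": 80},
    {"year": 2004, "quarter": "Q2", "region": "North", "product": "Product2", "sales": 120},
    {"year": 2005, "quarter": "Q1", "region": "South", "product": "Product1", "sales": 90},
]


def aggregate(rows, dims):
    """Sum sales grouped by the given dimensions (fewer dims = higher level)."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)


def roll_up(rows):
    # Roll up: climb the time hierarchy from quarter to year.
    return aggregate(rows, ["year"])


def drill_down(rows):
    # Drill down: return to the more detailed year + quarter level.
    return aggregate(rows, ["year", "quarter"])


def slice_(rows, dim, value):
    # Slice: fix a single dimension to one value (trailing _ avoids the builtin).
    return [r for r in rows if r[dim] == value]


def dice(rows, criteria):
    # Dice: select a sub-cube by restricting several dimensions at once.
    return [r for r in rows if all(r[d] in allowed for d, allowed in criteria.items())]


def pivot(rows, row_dim, col_dim):
    # Pivot: rotate the view so one dimension becomes rows and another columns.
    table = defaultdict(dict)
    for (row_val, col_val), total in aggregate(rows, [row_dim, col_dim]).items():
        table[row_val][col_val] = total
    return dict(table)


print(roll_up(cube))                  # {(2004,): 300, (2005,): 90}
print(pivot(cube, "region", "year"))  # {'North': {2004: 220}, 'South': {2004: 80, 2005: 90}}
```

Roll up and drill down differ only in how many dimensions are kept in the grouping key; slice and dice only filter cells and leave aggregation to a later step.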

OLAP vs OLTP
Slno | OLTP | OLAP
1) | Transaction oriented | Decision oriented (reports)
2) | Complex data model (fully normalized) | Simple data model (multidimensional/denormalized)
3) | Smaller data volume (little historical data) | Larger data volumes (collection of historical data)
4) | Many "small" queries | Fewer but "bigger" queries
5) | Frequent updates | Frequent reads, infrequent updates (daily)
6) | Huge number of users (clerks) | Only a few users (management personnel)

Objective of Data Warehouse
The primary purpose of a data warehouse is to provide easy access to specially prepared data that can be used with decision support applications, such as management reporting, queries, decision support systems and executive information systems.

Decision Support
A Decision Support System (DSS) is a system that provides managers with the information they need to make decisions. These systems empower employees at all levels by giving them access to business and financial information that directly affects their productivity and quality of work.

Executive information systems
An Executive Information System (EIS) is a concise snapshot of how the company is doing today; think of it as an electronic executive briefing. An EIS allows greater flexibility in "slicing-and-dicing" data, i.e., it allows exploration of data through multiple dimensions or views.

Why Data Warehouse?
By centralizing data:
1) Queries can be answered locally without accessing the original information sources. Thus, high query performance can be obtained for the complex aggregation queries needed for in-depth analysis, decision support and data mining (a way of extracting relevant data from a vast database).
2) On-line Analytical Processing (OLAP) is decoupled (separated) as much as possible from On-line Transaction Processing (OLTP). This makes information accessible to decision makers while avoiding interference of OLAP with local processing at the operational sources.

Data Warehouse
A decision support database that is maintained separately from the organization's operational databases.
A Data Warehouse is an enterprise-wide collection of subject-oriented, integrated, time-variant, non-volatile data in support of management's decision making process. - W. H. Inmon, 1993
• Subject-oriented - Data warehouses focus on high-level business entities like sales, marketing, etc.
• Integrated - Data in the warehouse is obtained from multiple sources and kept in a consistent format.
• Time-variant - Every data component in the data warehouse is associated with some point in time, e.g. weekly, monthly, quarterly or yearly.
• Non-volatile - The DW stores historical data. Data does not change once it gets into the warehouse; it is only loaded/refreshed.

Data from the operational systems are:
• Extracted
• Cleansed
• Transformed: 1) case conversion, 2) data trimming, 3) concatenation, 4) datatype conversion
• Aggregated
• Loaded into the DW
• Periodically refreshed to reflect updates at the sources, and purged from the warehouse onto slower archival storage.
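The extract, cleanse, transform, aggregate and load sequence above can be sketched in a few lines of Python, using the standard-library `sqlite3` module as a stand-in warehouse. The source rows, the `sales_by_region` table and the single cleansing rule (dropping rows with a missing amount) are assumptions made for illustration, not a prescribed design.

```python
import sqlite3
from collections import defaultdict

# Hypothetical extract from an operational (OLTP) source.
source_rows = [
    {"first": "  alice ", "last": "SMITH", "region": "north", "amount": "100.50"},
    {"first": "Bob", "last": " jones", "region": "NORTH ", "amount": "20"},
    {"first": "carol", "last": "lee", "region": "South", "amount": None},  # anomaly
]


def cleanse(rows):
    # Cleansing: drop rows with a missing measure (one illustrative rule).
    return [r for r in rows if r["amount"] is not None]


def transform(row):
    first = row["first"].strip().title()         # data trimming + case conversion
    last = row["last"].strip().title()
    return {
        "customer": f"{first} {last}",           # concatenation
        "region": row["region"].strip().upper(), # trimming + case conversion
        "amount": float(row["amount"]),          # datatype conversion (text -> number)
    }


def aggregate(rows):
    # Aggregation: total amount per region.
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return totals


def load(conn, totals):
    # Load: write the aggregated rows into the (stand-in) warehouse table.
    conn.execute("CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", sorted(totals.items()))
    conn.commit()


conn = sqlite3.connect(":memory:")
transformed = [transform(r) for r in cleanse(source_rows)]
load(conn, aggregate(transformed))
print(conn.execute("SELECT region, amount FROM sales_by_region").fetchall())  # [('NORTH', 120.5)]
```

Note that case conversion and trimming are what make "north" and "NORTH " aggregate into one region; without them the two rows would land in separate groups.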

Use of DWH
• Ad-hoc analyses and reports
• Data mining: identification of trends
• Management Information Systems

Designing a database for a Data Warehouse
1) Define user requirements, considering the different views of users from different departments.
2) Identify data integrity, synchronization and security issues/bottlenecks.
3) Identify technology, performance, availability & utilization requirements.
4) Review the normalized view of relational data to identify entities.
5) Identify dimensions.
6) Create and organize hierarchies of dimensions.
7) Identify attributes of dimensions.
8) Identify fact table(s).
9) Create the data repository (metadata).
10) Add calculations.

Data Mart
A data mart is a subset of the data warehouse designed for a particular line of business, such as sales, marketing, or finance. In a dependent data mart, data is derived from an enterprise-wide data warehouse. In an independent data mart, data is collected directly from the sources. A data mart may be structured for specific access tools. The data mart is the data warehouse you really use.

Why Data Mart?
1) Data warehouse projects are very expensive and time-consuming.
2) The success rate of DWH projects is very low.
To avoid a single point of loss, we identify department-wise needs and build a data mart. If it succeeds, we go on to other departments and integrate all data marts into a data warehouse.

Advantages
• Improve data access performance
• Simplify end-user data structures
• Facilitate ad hoc reporting

Slno | Data Warehouse | Data Mart
1) | The DW operates at the enterprise level and contains all data used for reporting and analysis. | A data mart is used by a specific business department and is focused on a specific subject (business area). A DM is a subset of the DWH.
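A dependent data mart can be sketched as a subject-specific subset derived from the warehouse. The example below is a minimal sketch with SQLite through Python's `sqlite3`; the `dw_fact` table, its columns and the `sales_mart` name are hypothetical, and a view is used only to keep it short (real data marts are often separate physical stores).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical enterprise-wide warehouse table spanning several subject areas.
conn.execute("CREATE TABLE dw_fact (subject TEXT, region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO dw_fact VALUES (?, ?, ?, ?)",
    [
        ("sales", "North", 2004, 150.0),
        ("sales", "South", 2004, 90.0),
        ("marketing", "North", 2004, 40.0),
        ("finance", "South", 2005, 300.0),
    ],
)

# Dependent data mart: the sales department's subset, derived from the warehouse.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT region, year, amount FROM dw_fact WHERE subject = 'sales'"
)

rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales_mart GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('North', 150.0), ('South', 90.0)]
```

An independent data mart would instead be loaded straight from the operational sources, with no `dw_fact` table in between.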

DWH ARCHITECTURE
Data warehouse architecture is a way of representing the overall structure of data, communication, processing and presentation that is planned for end-user computing within the enterprise. The architecture has the following main parts:
• Operational database
• Information access layer
• Data access layer
• Data dictionary (metadata) layer
• Process management layer
• Application messaging layer
• Processing (data warehouse) layer
• Data staging layer

Operational data is the information related to the day-to-day functioning of an organization. An operational database stores business transactions critical to the functioning of the organization.

Information access layer is the layer that the end user deals with directly. Examples are ad-hoc query tools like Business Objects, Power Play and Impromptu.

Data access layer is the data interchange layer. This layer provides the interface between operational databases and information access layers. The common data language used is 'SQL'. A familiar example of a data access layer is 'ODBC'.

Metadata layer holds a repository of metadata information. Metadata is defined as data about data, resulting in an intelligent, efficient way to manage data. Metadata provides the structure and content of the data warehouse, source and mapping information, transformation/integration descriptions and business rules. It is essential for quality improvement in a data warehouse.

Process management layer is involved in scheduling the various tasks that must be executed to build and maintain the data warehouse and data repository. It also helps to keep the data warehouse up-to-date.

Application messaging layer transports information around the enterprise's computing network. It also acts as 'middleware' and isolates applications from the exact data format on either end.

Processing (data warehouse) layer is the logical view of the informational data. It also performs the summarization, loading and processing of data from operational databases.

Data staging layer manages data replication across servers. It also manages data transformation.

ETL
1) ETL means Extraction, Transformation and Loading.
2) ETL refers to the methods involved in accessing and manipulating source data and loading it into a target database.

ETL Process
ETL is a process that involves the following tasks:
• extracting data from source operational or archive systems, which are the primary source of data for the data warehouse;
• transforming the data, which may involve cleaning, filtering, validating and applying business rules;
• loading the data into a data warehouse or any other database or application that houses data.

Transform
1) Denormalize data
2) Data cleaning
3) Case conversion
4) Data trimming
5) String concatenation
6) Datatype conversion
7) Decoding
8) Calculation
9) Data correction

Cleansing
The process of resolving inconsistencies and fixing anomalies in source data, typically as part of the ETL process.

Data Staging Area
1) The most complex part of the architecture.
2) A place where data is processed before entering the warehouse.
3) It involves: Extraction (E), Transformation (T), Load (L), Indexing.

Popular ETL Tools
Tool Name | Company Name
Informatica | Informatica Corporation
DT/Studio | Embarcadero Technologies
DataStage | IBM
Ab Initio | Ab Initio Software Corporation
Data Junction | Pervasive Software
Oracle Warehouse Builder | Oracle Corporation
Microsoft SQL Server Integration Services | Microsoft
TransformOnDemand | Solonde
Transformation Manager | ETL Solutions

Dimensional Modeling
Means storing data in fact and dimension tables. Here the data is fully denormalized.

Dimension table
1) A dimension table gives the descriptive attributes of a business.
2) Dimension tables are fully denormalized.
3) It has a primary key.
4) Data is arranged in a hierarchical manner (product to category; month to year), so it can be used for drill-down and drill-up analysis.
5) Has fewer records.
6) Has many columns.
7) Heavily indexed.
8) Dimension tables are sometimes called lookup or reference tables.

Types of Dimensions
1) Normal Dimension
2) Conformed Dimension
3) Junk Dimension
4) Degenerate Dimension
5) Role Playing Dimension

Conformed Dimension
A dimension table used by more than one fact table is called a Conformed Dimension (i.e., a dimension that is linked to multiple fact tables).
[Figure: conformed dimension tables (D1, D2, D3, ...) shared by fact tables FT1, FT2 and FT3]

Advantages:
1) Avoids unnecessary space.
2) Reduces time.
3) Allows drilling across fact tables.
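A minimal sketch of a conformed dimension, assuming a hypothetical `dim_date` table shared by two fact tables, `fact_sales` and `fact_returns` (SQLite via Python's `sqlite3`; all names and numbers are illustrative). Because both facts use the same dimension, their monthly results can be lined up side by side, which is what drilling across means here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One conformed date dimension shared by two fact tables (hypothetical names).
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date, amount REAL);
    CREATE TABLE fact_returns (date_key INTEGER REFERENCES dim_date, amount REAL);
    INSERT INTO dim_date VALUES (1, '2004-01'), (2, '2004-02');
    INSERT INTO fact_sales VALUES (1, 100), (1, 50), (2, 80);
    INSERT INTO fact_returns VALUES (1, 10), (2, 5), (2, 15);
""")

# Drill across: summarize each fact table separately at the conformed grain
# (month), then join the two result sets on the shared dimension attribute.
DRILL_ACROSS = """
    WITH s AS (SELECT d.month, SUM(f.amount) AS sales
               FROM fact_sales f JOIN dim_date d USING (date_key) GROUP BY d.month),
         r AS (SELECT d.month, SUM(f.amount) AS returns
               FROM fact_returns f JOIN dim_date d USING (date_key) GROUP BY d.month)
    SELECT s.month, s.sales, r.returns FROM s JOIN r ON s.month = r.month ORDER BY s.month
"""
result = conn.execute(DRILL_ACROSS).fetchall()
print(result)  # [('2004-01', 150.0, 10.0), ('2004-02', 80.0, 20.0)]
```

Each fact table is aggregated before the join on purpose: joining `fact_sales` to `fact_returns` row by row on `date_key` would multiply rows and double-count the amounts.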

Junk Dimension
A junk dimension is an abstract dimension that reduces the number of foreign keys in the fact table. This is achieved by combining two or more dimensions into a single dimension.
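As a minimal sketch of the idea, three hypothetical flag attributes (`payment`, `channel`, `gift_wrap`) that could each have been a separate dimension are combined into one pre-populated junk dimension, so the fact table carries a single foreign key instead of three. Table and column names are assumptions; SQLite is used through Python's `sqlite3`.

```python
import itertools
import sqlite3

# Hypothetical low-cardinality attributes that would otherwise each need
# their own dimension (and foreign key) in the fact table.
PAYMENT_TYPES = ["cash", "card"]
ORDER_CHANNELS = ["web", "store"]
GIFT_WRAP = ["Y", "N"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_order_junk "
    "(junk_key INTEGER PRIMARY KEY, payment TEXT, channel TEXT, gift_wrap TEXT)"
)

# Pre-populate every combination once; each combination gets one key.
combos = list(itertools.product(PAYMENT_TYPES, ORDER_CHANNELS, GIFT_WRAP))
conn.executemany(
    "INSERT INTO dim_order_junk VALUES (?, ?, ?, ?)",
    [(key, *combo) for key, combo in enumerate(combos, start=1)],
)


def junk_key(payment, channel, gift_wrap):
    """Return the single key the fact table stores in place of three."""
    row = conn.execute(
        "SELECT junk_key FROM dim_order_junk "
        "WHERE payment = ? AND channel = ? AND gift_wrap = ?",
        (payment, channel, gift_wrap),
    ).fetchone()
    return row[0]


print(len(combos))                   # 8
print(junk_key("card", "web", "N"))  # 6
```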

Degenerate Dimension
A key value that has no dimension table and no descriptive attributes, i.e., a column in the fact table that is neither a foreign key nor a numerical measure and is used for grouping purposes. Ex: Invoice Number, Ticket Number.
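A minimal sketch of a degenerate dimension: a hypothetical `invoice_no` column stored directly in a line-level fact table (SQLite via Python's `sqlite3`). It joins to no dimension table, yet it is what lets the invoice lines be grouped back into invoices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# invoice_no lives in the fact table itself: it is not a foreign key to any
# dimension table and not a numeric measure, but it is useful for grouping.
conn.execute("CREATE TABLE fact_sales_line (invoice_no TEXT, product_key INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales_line VALUES (?, ?, ?)",
    [("INV-001", 1, 40.0), ("INV-001", 2, 60.0), ("INV-002", 1, 25.0)],
)

per_invoice = conn.execute(
    "SELECT invoice_no, COUNT(*), SUM(amount) "
    "FROM fact_sales_line GROUP BY invoice_no ORDER BY invoice_no"
).fetchall()
print(per_invoice)  # [('INV-001', 2, 100.0), ('INV-002', 1, 25.0)]
```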

Role Playing Dimension
A single physical dimension table plays different roles with the help of views.
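A minimal sketch of a role-playing dimension, assuming one hypothetical physical `dim_date` table that plays two roles, order date and ship date, through two views (SQLite via Python's `sqlite3`). The view and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One physical date dimension (hypothetical schema).
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT);
    INSERT INTO dim_date VALUES (1, '2005-01-03'), (2, '2005-01-07');

    -- Views let the same table play two roles with role-specific column names.
    CREATE VIEW dim_order_date AS
        SELECT date_key AS order_date_key, full_date AS order_date FROM dim_date;
    CREATE VIEW dim_ship_date AS
        SELECT date_key AS ship_date_key, full_date AS ship_date FROM dim_date;

    CREATE TABLE fact_orders (order_date_key INTEGER, ship_date_key INTEGER, amount REAL);
    INSERT INTO fact_orders VALUES (1, 2, 99.0);
""")

row = conn.execute("""
    SELECT o.order_date, s.ship_date, f.amount
    FROM fact_orders f
    JOIN dim_order_date o ON f.order_date_key = o.order_date_key
    JOIN dim_ship_date  s ON f.ship_date_key  = s.ship_date_key
""").fetchone()
print(row)  # ('2005-01-03', '2005-01-07', 99.0)
```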

Fact Table
1) The central table in a star schema is called the FACT table.
2) A fact table typically has two types of columns: numerical measures and foreign keys to dimension tables.
3) The primary key of a fact table is usually a composite key made up of all of its foreign keys.
4) Fact tables store different types of measures: additive, non-additive and semi-additive measures.
5) A fact table might contain either detail-level facts or facts that have been aggregated.
6) A fact table usually contains facts with the same level of aggregation.
7) Has millions of records.

Measure Types
• Additive - Measures that can be summarized across all dimensions.
  o Ex: sales
• Non-additive - Measures that cannot be summarized across any dimension.
  o Ex: averages
• Semi-additive - Measures that can be summarized across some dimensions but not others.
  o Ex: inventory levels
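The three measure types can be contrasted on a few hypothetical daily rows, shown in this minimal Python sketch. The column layout `(day, store, units_sold, stock_on_hand, unit_price)` is an assumption; "latest value" is one common (not the only) way to summarize a semi-additive measure over time.

```python
# Hypothetical daily facts, listed in day order:
# (day, store, units_sold, stock_on_hand, unit_price)
facts = [
    ("Mon", "S1", 10, 100, 2.0),
    ("Mon", "S2", 5, 50, 4.0),
    ("Tue", "S1", 20, 80, 2.0),
    ("Tue", "S2", 5, 45, 4.0),
]

# Additive: units sold can be summed across every dimension (stores and days).
total_units = sum(f[2] for f in facts)

# Semi-additive: stock on hand can be summed across stores for one day...
stock_tue = sum(f[3] for f in facts if f[0] == "Tue")
# ...but not across days (100 + 80 is not S1's stock); take the latest value instead.
stock_s1_latest = [f[3] for f in facts if f[1] == "S1"][-1]

# Non-additive: an average price cannot be summed; recompute it from additive parts.
revenue = sum(f[2] * f[4] for f in facts)
avg_price = revenue / total_units

print(total_units, stock_tue, stock_s1_latest, avg_price)  # 40 125 80 2.5
```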

Factless Fact
A fact table that contains no measures or facts is called a Factless Fact table.

Slowly Changing Dimensions
1) Dimensions that change over time are called Slowly Changing Dimensions.
2) Slowly Changing Dimensions are often categorized into three types, namely Type 1, Type 2 and Type 3.

Type 1 SCD: Used if history is not required. The old values are overwritten.

Product price in 2004:
Product ID (PK) | Year | Product Name | Product Price
1 | 2004 | Product1 | $150

Product price in 2005:
Product ID (PK) | Year | Product Name | Product Price
1 | 2005 | Product1 | $250

Type 2 SCD: Used if both history and the current value are needed. An additional record is created (a new record with the new changes and a new surrogate key). Mostly preferred in dimensional modeling.

Product:
Product ID (PK) | Effective DateTime (PK) | Year | Product Name | Product Price | Expiry DateTime
1 | 01-01-2004 12.00AM | 2004 | Product1 | $150 | 12-31-2004 11.59PM
1 | 01-01-2005 12.00AM | 2005 | Product1 | $250 |
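The Type 1 and Type 2 behaviour above can be sketched with SQLite through Python's `sqlite3`, reusing the Product1 prices from the tables. The table names, the `product_sk` surrogate key and the NULL-expiry convention for the current row are assumptions of this sketch (the table above keys Type 2 rows by Product ID plus Effective DateTime instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Type 1: the natural key is the only key; changes overwrite in place.
    CREATE TABLE dim_product_t1 (product_id INTEGER PRIMARY KEY, year INTEGER, name TEXT, price REAL);
    -- Type 2: a surrogate key per version plus effective/expiry timestamps.
    CREATE TABLE dim_product_t2 (
        product_sk INTEGER PRIMARY KEY AUTOINCREMENT,
        product_id INTEGER, name TEXT, price REAL,
        effective_from TEXT, expiry TEXT  -- NULL expiry marks the current row
    );
    INSERT INTO dim_product_t1 VALUES (1, 2004, 'Product1', 150);
    INSERT INTO dim_product_t2 (product_id, name, price, effective_from, expiry)
        VALUES (1, 'Product1', 150, '2004-01-01 00:00', NULL);
""")


def scd_type1(product_id, year, price):
    # Overwrite: the old value (and its history) is lost.
    conn.execute("UPDATE dim_product_t1 SET year = ?, price = ? WHERE product_id = ?",
                 (year, price, product_id))


def scd_type2(product_id, price, change_ts, expire_ts):
    # Close the current version, then insert a new version with a new surrogate key.
    name = conn.execute("SELECT name FROM dim_product_t2 WHERE product_id = ? AND expiry IS NULL",
                        (product_id,)).fetchone()[0]
    conn.execute("UPDATE dim_product_t2 SET expiry = ? WHERE product_id = ? AND expiry IS NULL",
                 (expire_ts, product_id))
    conn.execute("INSERT INTO dim_product_t2 (product_id, name, price, effective_from, expiry) "
                 "VALUES (?, ?, ?, ?, NULL)", (product_id, name, price, change_ts))


scd_type1(1, 2005, 250)
scd_type2(1, 250, "2005-01-01 00:00", "2004-12-31 23:59")
print(conn.execute("SELECT * FROM dim_product_t1").fetchall())
# [(1, 2005, 'Product1', 250.0)]
print(conn.execute("SELECT product_sk, price, effective_from, expiry "
                   "FROM dim_product_t2 ORDER BY product_sk").fetchall())
# [(1, 150.0, '2004-01-01 00:00', '2004-12-31 23:59'), (2, 250.0, '2005-01-01 00:00', None)]
```

Fact rows loaded in 2004 keep pointing at surrogate key 1, so they still report the $150 price; fact rows loaded after the change use key 2.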

Type 3 SCD: Used if changes are infrequent. One previous level of history is available. New fields are created.

Product price in 2005:
Product ID (PK) | Current Year | Product Name | Current Product Price | Old Product Price | Old Year
1 | 2005 | Product1 | $250 | $150 | 2004

Surrogate Keys
Surrogate keys are always numeric and unique at the table level, which makes it easy to distinguish and track values changed over time. Surrogate keys are integers that are assigned sequentially as needed to populate a dimension. Surrogate keys merely serve to join dimension tables to the fact table. Surrogate keys are beneficial for the following reasons:
1) They reduce the space used by the fact table.
2) Faster retrieval of data (since alphanumeric retrieval is costlier than numeric retrieval).
3) Maintaining an index is easier with a numeric key.
4) They make it possible to maintain all slowly changing dimensions.

Data Warehouse Design
The data warehouse design essentially consists of four steps:
1) Identifying facts and dimensions
2) Designing fact tables
3) Designing dimension tables
4) Designing database schemas

Types of database schemas
There are three main types of database schemas: 1) Star Schema, 2) Snowflake Schema and 3) Starflake Schema.

Star Schema
1) It is the simplest form of data warehouse schema and contains one or more dimension tables and fact tables.

2) It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star, where one fact table is connected to multiple dimensions.
3) The center of the star schema consists of a large fact table that points towards the dimension tables.
4) Fact table = highly normalized; dimension table = highly denormalized.
5) It can be very effective to treat fact data as primarily read-only data, and dimensional data as data that will change over a period of time.
Advantages: The star schema is easy to define. It reduces the number of physical joins. It provides very simple metadata.
Drawbacks: Summary data in fact tables (such as sales amount by region, district-wise or year-wise) yields poor performance for summary levels, and dimension tables can become huge.
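A minimal star schema following the structure above, using SQLite through Python's `sqlite3`. The table and column names (`fact_sales`, `dim_product`, `dim_location`, `dim_time`, `sales_dollar`) and the data are hypothetical. Each dimension is a single denormalized table, so the query needs exactly one join per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized dimensions: each hierarchy (e.g. branch -> region) lives in one table.
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, branch TEXT, region TEXT);
    CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
    -- Central fact table: foreign keys (composite primary key) plus a numeric measure.
    CREATE TABLE fact_sales (
        product_key  INTEGER REFERENCES dim_product,
        location_key INTEGER REFERENCES dim_location,
        time_key     INTEGER REFERENCES dim_time,
        sales_dollar REAL,
        PRIMARY KEY (product_key, location_key, time_key)
    );
    INSERT INTO dim_product  VALUES (1, 'Pen', 'Stationery'), (2, 'Tea', 'Grocery');
    INSERT INTO dim_location VALUES (1, 'Branch A', 'North'), (2, 'Branch B', 'South');
    INSERT INTO dim_time     VALUES (1, '2005-01', 2005), (2, '2005-02', 2005);
    INSERT INTO fact_sales   VALUES (1, 1, 1, 10), (2, 1, 1, 30), (1, 2, 2, 20), (2, 2, 2, 5);
""")

# Typical star query: join the fact table to each dimension, filter, group.
rows = conn.execute("""
    SELECT l.region, p.category, SUM(f.sales_dollar)
    FROM fact_sales f
    JOIN dim_product p  ON f.product_key  = p.product_key
    JOIN dim_location l ON f.location_key = l.location_key
    JOIN dim_time t     ON f.time_key     = t.time_key
    WHERE t.year = 2005
    GROUP BY l.region, p.category
    ORDER BY l.region, p.category
""").fetchall()
print(rows)
# [('North', 'Grocery', 30.0), ('North', 'Stationery', 10.0),
#  ('South', 'Grocery', 5.0), ('South', 'Stationery', 20.0)]
```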

Steps in designing a Star Schema
1) Identify a business process for analysis (like sales).
2) Identify measures or facts (sales dollar).
3) Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
4) List the columns that describe each dimension (region name, branch name, employee name).
5) Determine the lowest level of summary in a fact table (sales dollar).

Fact constellation
A schema in which multiple fact tables share dimension tables is called a Fact Constellation Schema. Dimension tables can also, in turn, have their own dimension tables: for example, the Store dimension may contain District ids and Region ids, which reference separate District and Region dimension tables. Normalizing dimensions in this way is the basis of the snowflake schema.

Snowflake schema
1) A snowflake schema describes a star schema structure normalized through the use of outrigger tables, i.e. dimension table hierarchies are broken into simpler tables.
2) It represents the dimensional hierarchy directly by normalizing the dimension tables, i.e. all dimensional information is stored in third normal form.
3) This implies dividing the dimension tables into more tables, thus avoiding nonkey attributes that depend on each other.
Advantages: The snowflake schema provides the best performance when queries involve aggregation.
Disadvantages: Maintenance is complicated. The number of tables increases. More joins are needed.
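A minimal snowflake sketch of the Store example, assuming hypothetical `dim_store`, `dim_district` and `dim_region` tables, one per hierarchy level (SQLite via Python's `sqlite3`). Summarizing sales by region now takes three joins instead of the single join a denormalized store dimension would need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflaked store hierarchy: store -> district -> region, one table per level.
    CREATE TABLE dim_region   (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE dim_district (district_id INTEGER PRIMARY KEY, district_name TEXT,
                               region_id INTEGER REFERENCES dim_region);
    CREATE TABLE dim_store    (store_key INTEGER PRIMARY KEY, store_name TEXT,
                               district_id INTEGER REFERENCES dim_district);
    CREATE TABLE fact_sales   (store_key INTEGER REFERENCES dim_store, sales_dollar REAL);
    INSERT INTO dim_region   VALUES (1, 'North');
    INSERT INTO dim_district VALUES (10, 'D-East', 1), (11, 'D-West', 1);
    INSERT INTO dim_store    VALUES (100, 'Store A', 10), (101, 'Store B', 11);
    INSERT INTO fact_sales   VALUES (100, 40), (101, 60);
""")

# Walk the normalized hierarchy from the fact table up to the region level.
by_region = conn.execute("""
    SELECT r.region_name, SUM(f.sales_dollar)
    FROM fact_sales f
    JOIN dim_store s    ON f.store_key   = s.store_key
    JOIN dim_district d ON s.district_id = d.district_id
    JOIN dim_region r   ON d.region_id   = r.region_id
    GROUP BY r.region_name
""").fetchall()
print(by_region)  # [('North', 100.0)]
```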

[Figure: Snowflake Schema]

Starflake Schema
1) A combination of the denormalized star and normalized snowflake schemas.

Star Schema vs Snowflake Schema
Slno | Star Schema | Snowflake Schema
1) | A dimension table does not have any parent table. | A dimension table has one or more parent tables.
2) | Hierarchies for the dimensions are stored in the dimension table itself. | Hierarchies are broken into separate tables.

Granularity
Granularity means the level of detail of the data stored in the fact table.

Types of Granularity
1) Transactional Level Granularity
2) Periodic Snapshot Granularity

Transactional Level Granularity
Mostly used. Each and every transaction is stored in the fact table, so drill-down and drill-up analysis can be done.
Disadvantage: 1) Size increases.

Periodic Snapshot Granularity
Data summarized over a period is stored in the fact table.
Adv: Faster retrieval (fewer records).
Disadv: Detailed information is not available.
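The trade-off between the two grains can be seen by deriving a monthly periodic snapshot from transaction-level rows. This is a minimal Python sketch; the transactions and the `monthly_snapshot` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Transaction-level grain: one fact row per sale (hypothetical data).
transactions = [
    (date(2005, 1, 3), "Product1", 150.0),
    (date(2005, 1, 20), "Product1", 100.0),
    (date(2005, 2, 7), "Product1", 250.0),
]


def monthly_snapshot(rows):
    """Periodic snapshot grain: one row per (month, product) holding the period total."""
    totals = defaultdict(float)
    for day, product, amount in rows:
        totals[(day.strftime("%Y-%m"), product)] += amount
    return dict(totals)


snapshot = monthly_snapshot(transactions)
print(snapshot)  # {('2005-01', 'Product1'): 250.0, ('2005-02', 'Product1'): 250.0}
```

The snapshot has fewer rows, but the two separate January sales can no longer be told apart, which is exactly the "detailed information is not available" drawback.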

Hierarchy
1) Hierarchies are logical structures that use ordered levels as a means of organizing data.
2) A hierarchy can be used to define data aggregation. Example: country > state > city > zip. In a time dimension, a hierarchy might be used to aggregate data from the Month level to the Quarter level, and from the Quarter level to the Year level.

Level
A position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the Month, Quarter and Year levels.

Operational Data Store

In recent times, OLAP functionality is being built into OLTP systems; this is called an ODS (operational data store). An ODS is a physical set of tables sitting between the operational systems and the data warehouse, or a specially administered hot partition of the data warehouse itself. The main reason for an ODS is to provide immediate reporting of operational results when neither the operational system nor the regular data warehouse can provide satisfactory access. Since an ODS is necessarily an extract of the operational data, it may also act as a source for the data warehouse.

Data Staging Area
1) A storage area that cleans, transforms, combines, de-duplicates and prepares source data for use in the data warehouse.
2) The data staging area is everything in between the source system and the data presentation server.
3) No querying should be done in the data staging area, because the data staging area normally is not set up to handle fine-grained security, indexing or aggregation for performance.

Data Warehouse Bus Matrix
1) The matrix helps prioritize which dimensions should be tackled first for conformity, given their prominent roles.
2) The matrix allows us to communicate effectively within and across data mart teams.
3) The columns of the matrix represent the common dimensions.
4) The rows identify the organization's business processes.

Degenerate Dimension
Operational control numbers such as invoice numbers, order numbers and bill of lading numbers look like dimension keys in a fact table but do not join to any actual dimension table. They give rise to empty dimensions, hence we refer to them as Degenerate Dimensions (DD).
