VI
Data Warehousing
Mumbai University Question Paper Solutions
CONTENTS
Year of Exams
April 14
Oct. 14
T.Y. B.Sc. (IT): Sem. VI
Data Warehousing
Time: 2 ½ Hrs.] Mumbai University Question Paper Solution : April 14 [Marks : 60
achieve the strategic level goals.
Used for Online Analytical Processing (OLAP). It reads historical
data so that users can make business decisions.
The tables and joins are simple since they are de-normalized. This is
done to reduce the response time for analytical queries.
Data-modeling techniques are used for the data warehouse design.
Optimized for read operations.
High performance for analytical queries.
Is usually a Database.
you can answer questions like "Who was our best customer for this item
last year?" This ability to define a data warehouse by subject matter,
sales in this case, makes the data warehouse subject oriented.
2. Integrated : Integration is closely related to subject orientation. Data
Vidyalankar : T.Y. B.Sc.(IT) DW
Q.1(b) Explain the additive, semiadditive and non-additive measures with [5]
examples.
(A) Additivity of Facts : A fact is said to be fully additive if it is additive over
every dimension of its dimensionality ; partially additive if additive over at
least one but not all of the dimensions; and non-additive if not additive over
any dimension.
1. Additive measures (Fully additive facts): These are those specific
class of fact measures which can be aggregated across all dimensions
and their hierarchy.
Example : Take sales figures. We can add sales across all quarters to
obtain the yearly sales, so sales is an additive measure. It can equally
be summed customer-wise, year-wise, month-wise, day-wise, product-wise,
category-wise, etc.
2. Semi-Additive measures (Semi additive facts): These are those
specific class of fact measures which can be aggregated across some
dimensions but not all dimensions.
Example : Suppose the stock level of Item A is 1000 on Monday. A
salesperson sells 200, so the stock is 800 on Tuesday; he sells a
further 300, so the stock is 500 on Wednesday. Assuming no inventory has
flowed in, he should still have 500 on Thursday. To obtain the current
stock level he cannot aggregate the stock figures across the time
dimension hierarchy: adding 1000, 800, 500 and 500 gives 2800, which is
not a stock level at all. Stock can, however, be added across other
dimensions such as items or stores, which is what makes it semi-additive.
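The difference between the two kinds of measure can be sketched in a few lines of Python. The day labels and figures below are illustrative assumptions based on the stock example, not data from a real system.

```python
# Hypothetical daily figures for Item A (illustrative values only).
days = ["Mon", "Tue", "Wed", "Thu"]
units_sold = {"Mon": 200, "Tue": 300, "Wed": 0, "Thu": 0}
stock_level = {"Mon": 1000, "Tue": 800, "Wed": 500, "Thu": 500}

def total_over_time(measure):
    """Sum a measure across the time dimension."""
    return sum(measure[day] for day in days)

def latest_over_time(measure):
    """Take the most recent value instead of summing."""
    return measure[days[-1]]

# Additive: summing sales over time gives a meaningful total.
total_sold = total_over_time(units_sold)

# Semi-additive: summing stock over time is meaningless,
# so we take the latest value along the time dimension instead.
meaningless_stock_sum = total_over_time(stock_level)
current_stock = latest_over_time(stock_level)
```

Here `total_sold` is a valid figure, while the stock level has to be read from the last day rather than summed.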
Q.1(c) What are the various levels of data redundancy in data warehouse? [5]
(A) Data redundancy : There are three levels of redundancies that enterprises
should think about when considering their data warehouse options
“Virtual” or “point-to-point” data warehouse
Central data warehouse
Distributed data warehouse
April 14 : Paper Solution
of requests is low.
V. warehouses often provide a starting point for organisations to
learn what end-users are really looking for.
Central Data Warehouses
It is a single physical database that contains all data for a specific
functional area, department, division, or enterprise.
A central data warehouse may contain records for any specific
period of time and usually, contains information from multiple
operational systems.
These warehouses are real. The data stored here is accessible from
any place and must be loaded and maintained on a regular basis.
These warehouses are built around some form of multidimensional
information database server.
Operational Systems
They are the systems that help everyday operations of the enterprise.
They are the backbone system of any enterprise and include order
entry, inventory, manufacturing, payroll, accounting etc.
Information systems
On the other hand, there are other functions that go on within the
enterprise that have to do with planning, forecasting and managing the
organisation. In this current fast paced world, these functions are very
critical for the survival of the organisation.
Information systems deal with analysing data and making major
decisions about how the enterprise will operate now and in future.
Functions like marketing planning, engineering planning, and financial
analysis also require information systems to support them.
But these functions are different from the operational ones and the
information required is also different.
These knowledge based functions which help decision makers to plan and
take future decisions are information systems.
Where operational data needs are normally focussed upon a single area,
information data needs often span a number of different areas and need
large amounts of related operational data.
The following table summarizes the main differences between OLTP and OLAP:
OLTP OLAP
Application Operational: ERP, CRM, Management Information System,
The welcome screen will offer four tasks that we can perform with this
assistant. Select the first one to configure the listener, as shown here:
The next screen will ask what we want to do with the listener. The four
options are as follows:
Add
Reconfigure
Delete
Rename
If Oracle is being installed for the first time, only the Add option
will be available. The remainder of the options will be grayed out and will
be unavailable for selection. If they are not, then there is a listener
already configured and we can proceed to the next section—creating the
database.
The next screen will ask us what we want to name the listener. It will
have LISTENER entered by default and that’s a fine name, which states
exactly what it is, so let’s leave it at that and proceed.
The next screen is the protocol selection screen. It will have TCP
already selected, which is what most installations will require. Leave that
selected and proceed to the next screen to select the port number to
use. The default port number is 1521, which is standard for
communicating with Oracle databases and is the one most familiar to
anyone who has ever worked with an Oracle database.
That is the last step. It will ask us if we want to configure another
listener, answer "no" and finish out the screens by clicking on the Finish
button back on the main screen.
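Once the listener is configured, a quick way to confirm that something is accepting TCP connections on the chosen port is a plain socket check. This is only a sketch: the function name is hypothetical, and a successful connection shows that the port is open, not that an Oracle listener is the process behind it.

```python
import socket

def listener_reachable(host, port=1521, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or host not resolvable.
        return False
```

For example, `listener_reachable("localhost")` would check the default port 1521 on the local machine.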
Q.2(b) Explain the procedure for defining source metadata manually with [5]
Data Object Editor.
(A) 1. To start building our source tables for the POS transactional SQL
Server database, launch the OWB Design Center and expand the
ACME_DW_PROJECT node. We have already created the ACME_POS
module for the SQL Server source database under the Databases |
ODBC node, so that is where we’ll create the tables. Navigate to the
Databases | ODBC node, and then select the ACME_POS module under
this node. We will create our source tables under the Tables node, so
right-click on this node and select New Table from the pop-up
menu. As no wizard is available for creating a table, we use the
Data Object Editor to do this.
2. The first screen we’ll be presented with is a small popup asking us to fill
in the name and a description for the new table we’re creating. We’re
going to create the metadata for the ITEMS table so let’s change the
name to ITEMS and click OK to continue.
3. Upon selecting OK, we are presented with the Table Editor screen on
the right hand side of the main Design Center interface. It’s a clean
slate that we get to fill in, and will look similar to the following
screenshot:
The following will be the columns, types, and sizes we’ll use for the Items
table based on what we found in the Items source table in the POS
transaction database:
ITEMS_KEY number (22)
ITEM_NAME varchar2 (50)
ITEM_CATEGORY varchar2 (50)
1. We can save our work at this point and close the Table Editor window
now before proceeding.
2. When completed, our column list should look like the following
screenshot:
The same procedure continues for the remaining tables. Just do the import
using the Import :
POS_TRANSACTIONS
POS_TRANS_KEY number (22)
SALES_QUANTITY number (22)
SALES_ASSOCIATE number (22)
REGISTER number (22)
ITEM_SOLD number (22)
DATE_SOLD date
AMOUNT number (10, 2)
REGISTERS
STORES
STORES_KEY number (22)
STORE_NAME varchar2 (50)
STORE_ADDRESS1 varchar2 (60)
STORE_ADDRESS2 varchar2 (60)
STORE_CITY varchar2 (50)
STORE_STATE varchar2 (50)
STORE_ZIP varchar2 (50)
REGION_LOCATED_IN number (22)
STORE_NUMBER varchar2 (10)
REGIONS
REGIONS_KEY number (22)
REGION_NAME varchar2 (50)
CONTINENT varchar2 (50)
COUNTRY varchar2 (50)
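The column lists above are metadata: names, types and sizes, with no data attached. As a rough illustration of what such metadata amounts to, the sketch below holds a few of the listed columns as Python tuples and renders them as a CREATE TABLE statement. The helper `render_ddl` and the sample list name are hypothetical; in OWB the actual code is generated by the tool itself.

```python
# Column metadata as (name, type, size); size is None when the type takes none.
ITEMS = [
    ("ITEMS_KEY", "NUMBER", "22"),
    ("ITEM_NAME", "VARCHAR2", "50"),
    ("ITEM_CATEGORY", "VARCHAR2", "50"),
]
# Two columns taken from POS_TRANSACTIONS to show the other type shapes.
POS_TRANSACTIONS_SAMPLE = [
    ("DATE_SOLD", "DATE", None),
    ("AMOUNT", "NUMBER", "10, 2"),
]

def render_ddl(table, columns):
    """Render a CREATE TABLE statement from (name, type, size) tuples."""
    specs = []
    for name, type_, size in columns:
        datatype = f"{type_}({size})" if size else type_
        specs.append(f"    {name} {datatype}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(specs) + "\n)"

items_ddl = render_ddl("ITEMS", ITEMS)
```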
2. To create a new project, select New… either from the pop-up menu or
from the Design drop-down menu. We can have any number of projects
defined, but can work on only one at a time.
Difference between a module and a project
A project is defined which holds all the work. The Projects tab is where
we will work on the objects that we are going to design for our data
warehouse. It was the old Project Explorer window in the previous
Warehouse Builder release. It has nodes for each of the design objects
we’ll be able to create.
A module is an object in the Design Center that acts as a storage
location for the various definitions and helps us logically group them.
There are Files modules that contain file definitions and Databases
modules that contain the database definitions. These Databases modules
are organized as Oracle modules and Non-Oracle modules.
Q.2(d) Draw and explain OWB architecture with suitable diagram. [5]
(A) OWB components and architecture
Oracle Warehouse Builder is composed on the client of the Design Center
(including the Control Center Manager) and the Repository Browser. The
server components are the Control Center Service, the Repository (including
Workspaces), and the Target Schema. A diagram illustrating the various
Client
The Design Center
The Design Center is the main client graphical interface for designing
our data warehouse.
It is used to define our sources and targets, and describe the extract,
transform, and load (ETL) processes we use to load the target from the
sources. The ETL procedures are what we will define to carry out the
extraction of the data from our sources, any transformations needed on
it and subsequent loading into the data warehouse.
What will be created in the Design Center is a logical design only, not a
physical implementation. This logical design will be stored behind the
scenes in a Workspace in the Repository on the server. The user
interacts with the Design Center, which stores all its work in a
Repository Workspace.
Control Center Service, which runs on the server. The user directly
interacts with the Control Center Manager and the Design Center only.
The Repository
The Repository is the schema that hosts the design metadata
definitions we create for our sources, targets, and ETL processes.
We will be defining sources, targets, and ETL processes using the
Design Center and the information about what we have defined (the
metadata) is stored in the Repository.
The Repository is a Warehouse Builder software component for which a
separate schema is created when the database is installed.
The Repository will contain one or more Workspaces. A Workspace is
where we will do our work to create the data warehouse.
The repository Client
One final OWB component to consider is the Repository Browser on the
client.
It is a web browser interface for retrieving information from the
Repository. It will allow us to view the metadata, create reports, and
audit runtime operations.
It is the only other component besides the Design Center and the Control
Center Manager that the user interacts with directly.
Q.3
represented relationally in the database will have one main table to hold
the primary facts/measures, such as the count of items sold, or total sale
amount etc.
The tables that are referenced by main table contain all the information
they need and do not need to go down any more levels to reference any
other tables.
The ER diagram of this implementation looks like a star, so it is called a
star schema.
The main table in the middle is referred to as the fact table as it holds
facts or measures and this represents the Cube.
Dimensions
The tables surrounding the fact table are called dimensions. They
contain the descriptive information.
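A star schema of this kind can be sketched with Python's built-in sqlite3 module: one fact table holding the measures, and dimension tables that it references directly. All table names, column names and rows here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive information only.
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_name   TEXT);

-- Fact table: measures plus one foreign key per dimension.
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key   INTEGER REFERENCES store_dim(store_key),
    quantity    INTEGER,
    amount      REAL
);

INSERT INTO product_dim VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO store_dim   VALUES (1, 'North'), (2, 'South');
INSERT INTO sales_fact  VALUES (1, 1, 2, 20.0), (1, 2, 1, 10.0), (2, 1, 5, 75.0);
""")

# One join from the fact table to a dimension is enough; no further levels.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.quantity), SUM(f.amount)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```

Each query reaches a dimension with a single join, which is the property that keeps analytical queries simple in this design.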
Q.3(b) Explain the steps for importing the metadata for a flat file. [5]
(A) Right-click on the module name under the Files node under our project, and
select Import and then Flat File…. The following are the steps to be
performed in the File Import screens :
1. The first screen for importing a file is shown in the following screenshot:
This is where we will specify the file we wish to import. Click Add
Sample File and select the counties.csv file. After selecting the file
from the resulting popup, it will fill in the filename on the File Import
screen.
2. Once we have finished viewing the file, click OK to close the dialog.
Click the Import button on the File Import screen to begin the import
process. This starts the Flat File Sample Wizard. The wizard has two
paths that we can follow through it: a standard sequence for simple
files and an advanced sequence for more complex files. The two sets of
steps are indicated on the Welcome screen as shown below:
3. Clicking the Next button will take us to the first step which is shown
below:
This screen displays the information the wizard pulled out of the
file, displayed as columns of information. It knows what’s in the columns
because the file has each column separated by a comma, but doesn’t
know at this point what type of data or column name to use for each
column—so it just displays the data. It picks a name based on the file
name, which is fine.
4. We can take the advanced path through the wizard, which consists of more
steps, or just click the Next button to follow the simple path. The simple
path is for basic comma delimited files with single rows separated by a
carriage return.
5. Step 2 of the simple steps includes the record and field delimiters
choices as shown next:
Our fields are separated by a comma, and that is the default. The
Enclosures: selection is OWB’s way of specifying the characters that
surround the text values in the file. Frequently, in text-based files such
as CSV files, the text is differentiated from numerical values by
surrounding the text values in double quotes, single quotes, or something
similar. This is where we specify the characters, if any, that surround
the text-field values in this file.
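The role of the field delimiter and the enclosure character can be seen with Python's csv module. The sample content is a hypothetical stand-in for a file such as counties.csv; the column names are assumptions.

```python
import csv
import io

# Hypothetical file content: one text value contains a comma,
# so it is enclosed in double quotes.
sample = 'ID,COUNTY_NAME\n1,"Kent, East"\n2,Surrey\n'

# delimiter = field delimiter, quotechar = enclosure character.
rows = list(csv.reader(io.StringIO(sample), delimiter=",", quotechar='"'))
```

Because of the enclosure, the comma inside `"Kent, East"` is read as part of the text value rather than as a field separator.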
6. The final step is where we specify the details about what each field
contains, and give each field a name. Check the box that says Use the
first record as the field names and we’ll see that all the column names
have changed to using the values from that first row.
Notice that the field type for the first column has changed. The ID is
now INTEGER instead of character, as it has now correctly detected
that the remaining rows after that first one all contain integer data.
Length is specified there, which defaults to 0.
7. Click on Next to get a summary screen of what the wizard will do, or
just click on the Finish button to continue. After clicking Finish it will
create our file module under the Files node and we will be able to access
it in the Projects tab.
8. We’ll make sure to select Save All from the Design menu in Design
Center to save the metadata we just entered.
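Why the ID field switches to INTEGER once the first record is used for field names can be sketched as follows. This is a simplified stand-in for the wizard's sampling, not its actual algorithm; `infer_type` and the sample data are hypothetical.

```python
import csv
import io

def infer_type(values):
    """Return 'INTEGER' if every value parses as an int, else 'CHAR'."""
    try:
        for value in values:
            int(value)
    except ValueError:
        return "CHAR"
    return "INTEGER"

sample = "ID,COUNTY_NAME\n1,Kent\n2,Surrey\n"
rows = list(csv.reader(io.StringIO(sample)))

# First record treated as data: the text "ID" makes the column look like CHAR.
types_without_header = [infer_type(column) for column in zip(*rows)]

# First record used as field names: only the remaining rows are sampled.
names, data = rows[0], rows[1:]
types_with_header = dict(zip(names, (infer_type(column) for column in zip(*data))))
```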
Q.3(c) What is module? Explain source module and target module. [5]
(A) Module : A module is an object in the Design Center that acts as a storage
location for the various definitions and helps us logically group them. There
are Files modules that contain file definitions and Databases modules that
contain the database definitions. These Databases modules are organized as
Oracle modules and Non-Oracle modules.
Source Module
Creating a target user and module
A different module is to be created for target objects. A new module is
created in the Projects tab for our target to hold our data warehouse
design objects.
However, before we can do that, we should have a target schema defined in
the database that will hold our target objects when we deploy them.
The target schema is going to be the main location for the data
warehouse. The target is where the actual data warehouse will be built.
Our design will be implemented there.
Every target module must be mapped to a target user schema.
It’s always good to create a separate user schema to become the target
so that user roles in our database can be kept separate.
Q.3(d) List and explain the functionalities that can be performed by OWB [5]
in order to create data warehouse.
(A) Data Modeling : Most data warehouse designers use a data modeling tool to
create the logical and physical design of the data warehouse. The logical
design ensures that all business requirements, definitions, and rules are
supported. The physical design ensures optimal performance in the planning
of indexes, relationships, data types and properties. To support developers
of OLAP, data mining and reporting systems, the data model also acts as
documentation for the final data warehouse.
Data Profiling and Data Quality : Data profiling helps to discover value
frequencies, formats and patterns.
Data profiling can be applied to generate statistics about data quality, and
to discover complex patterns, foreign key relationships, and functional
relationships. Using data profiling one can find some perceived defects, but
profiling alone cannot assess quality.
A true assessment of data quality comes from a data quality assessment.
Metadata management : involves managing data about other data, whereby this
“other data” is generally referred to as content data. Metadata management
provides a number of very important benefits to the enterprise :
Consistency of definitions
Clarity of relationships
Clarity of data lineage
(A) Staging : Staging is the process of copying the source data temporarily into a
table(s) in the target database. Here one can perform any transformations
that are required before loading the source data into the final target tables.
Staging area
• Staging area is the area used while designing the ETL.
Advantages
If the source data is in a database other than an Oracle Database, the
reliability of the connection to the database and the performance of the
link while pulling data across have to be taken into account.
transformations can then be run on it without impacting the
transactional system.
The individual process to stage the data to a table in the Oracle
database simply involves copying the data one-for-one over to the
Oracle database, and this runs in less than 30 seconds. This means the
source database connection is only open for 30 seconds, whereas without
a staging table it had to work constantly for hours.
Another advantage is that if the ETL process needs to be restarted,
there is no need to go back and disturb the source system to retrieve the
data.
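The staging pattern can be sketched with two in-memory SQLite databases standing in for the source system and the target database. The table and column names are hypothetical; the point is that the source is touched only for the one-for-one copy, and every transformation afterwards reads the staging table.

```python
import sqlite3

source = sqlite3.connect(":memory:")  # stands in for the transactional system
target = sqlite3.connect(":memory:")  # stands in for the warehouse database

source.execute("CREATE TABLE pos_transactions (item TEXT, quantity INTEGER, amount REAL)")
source.executemany(
    "INSERT INTO pos_transactions VALUES (?, ?, ?)",
    [("pen", 2, 3.0), ("pen", 1, 1.5), ("ink", 4, 8.0)],
)

# Step 1: copy the rows one-for-one into a staging table, then release the source.
target.execute("CREATE TABLE pos_stage (item TEXT, quantity INTEGER, amount REAL)")
staged = source.execute("SELECT item, quantity, amount FROM pos_transactions").fetchall()
target.executemany("INSERT INTO pos_stage VALUES (?, ?, ?)", staged)
source.close()

# Step 2: transform from the staging table only; a rerun never touches the source.
target.execute("CREATE TABLE sales_by_item (item TEXT, quantity INTEGER, amount REAL)")
target.execute("""
    INSERT INTO sales_by_item
    SELECT item, SUM(quantity), SUM(amount) FROM pos_stage GROUP BY item
""")
result = target.execute("SELECT * FROM sales_by_item ORDER BY item").fetchall()
```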
Q.4(b) List and explain the use of various windows available in mapping [5]
editor.
(A) Mapping : The Mapping window is the main working area in the center of
the above image where mapping is designed. This window is also referred
to as the canvas. This is the graphical display that will show the
operators being used and the connections between the operators that
indicate the data flow from source to target.
Explorer :
It has two tabs – Available Objects tab and Selected Object Tab
Available objects – it displays objects defined in our project elsewhere
Component Palette : The Component Palette contains each of the
objects that can be used in our mapping. We can click on the object we
want to place in the mapping and drag it onto the canvas.
Bird’s Eye View : This window displays a miniature version of the entire
canvas and allows us to scroll around the canvas without using the scroll
bars.
Q.4(c) Explain the various OWB operators. [5]
(A) Following are the types of OWB operators :
Source and Target Operators
Common operators
Constant: Represents a constant value that is needed. It can be used to
load a default value for a field that doesn’t have any input from another
source, for instance.
View Operator: Represents a database view. Source data is frequently
retrieved via a view in the source database that can pull data from
multiple sources into a single, easily accessible view.
Sequence Operator: Can be used to represent a database sequence,
which is an automatic generator of sequential unique numbers and is
most often used for populating a primary key field.
Construct Object: This operator can be used to actually construct an
Oracle object type in our mapping.
The true power of a data warehouse lies in the restructuring of the source
data into a format that greatly facilitates the querying of large amounts of
data over different time periods. For this, we need to transform the source
data into a new structure. That is the purpose of the transformation (or
data flow) operators.
Some of the common data flow operators we’ll see are as follows:
Aggregator: There are times when source data is at a finer level of
detail than we need. So we need to sum the data up to a higher level, or
apply some other aggregation type function such as an average function.
This is the purpose of the Aggregator operator.
Deduplicator: Sometimes our data records will contain duplicate
Transformation Operator: All these operators are transformation
operators but there is one operator type specifically named
"Transformation". This operator can be used to invoke a PL/SQL
function or procedure with some of our source data as input to provide a
transformation of data.
Table Function Operator: A Table Function Operator can be seen in the
date_dim_map map. There are three Table Function operators defined:
This kind of operator represents a Table Function, which is defined in
PL/SQL and is a function that can be queried like a table to return rows
of information.
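What two of the data flow operators above do to a set of rows can be sketched in plain Python. In OWB these operators are placed on the mapping canvas and the tool generates the code; the functions below are hypothetical equivalents written only to show the effect on the data.

```python
from collections import defaultdict

def aggregate(rows, key, measure):
    """Aggregator: sum a measure up to the level given by key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)

def deduplicate(rows):
    """Deduplicator: keep the first copy of each identical record."""
    seen = set()
    unique = []
    for row in rows:
        marker = tuple(sorted(row.items()))
        if marker not in seen:
            seen.add(marker)
            unique.append(row)
    return unique

rows = [
    {"day": "Mon", "qty": 2},
    {"day": "Mon", "qty": 2},  # exact duplicate of the previous record
    {"day": "Tue", "qty": 1},
]
```

Running `aggregate(deduplicate(rows), "day", "qty")` sums the quantities per day after the duplicate record has been dropped.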
Pre/Post-Processing Operators
There is a small group of operators that allow us to perform operations
before the mapping process begins, or after the mapping process ends.
These are the pre- and post-processing operators and mapping input and
output operators. We can perform functions or procedures before or after
a mapping runs, and can also accept input or provide output from a mapping
process.
Mapping Input Parameter: This operator allows us to pass a
parameter(s) into a mapping process.
Mapping Output Parameter: As the name suggests, this is similar to the
Mapping Input Parameter operator, but provides a value as output from
our mapping.
Post-Mapping Process: Allows us to invoke a function or procedure after
the mapping completes its processing.
Pre-Mapping Process: It allows us to invoke a function or procedure
before the mapping process begins.
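The way these four operators fit around a mapping can be pictured with a small Python function. This is only an analogy, and `run_mapping` and its arguments are hypothetical: the input parameters are passed in, a procedure runs before and after the main processing, and a value is handed back as output.

```python
def run_mapping(rows, transform, pre=None, post=None, **params):
    """Run a mapping with optional pre- and post-processing hooks."""
    if pre is not None:
        pre()                      # pre-mapping process
    output = [transform(row, **params) for row in rows]  # params = mapping inputs
    if post is not None:
        post(len(output))          # post-mapping process
    return output                  # mapping output

log = []
result = run_mapping(
    [1, 2],
    lambda row, factor: row * factor,
    pre=lambda: log.append("pre"),
    post=lambda count: log.append(f"post:{count}"),
    factor=10,
)
```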
Q.4(d) Write the steps for building staging area table using Data Object [5]
Editor.
(A) Launch the OWB Design Center. Expand the project node. For this, the
module should be created first, then the target user and the target module
of the same name.
The steps to create the staging area table in our target database are:
1. Navigate to the Databases | Oracle | ACME_DWH module. Right-click
on the Table node and select New Table from the pop-up menu.
2. Upon selecting New Table, enter the name of the new table and an
optional description.
3. The first tab is the Name tab where it displays the name we just gave it
in the opening popup.
4. Click on the Columns tab and enter the information that describes the
columns of our new table.
The following will then be the column names, types, and sizes we’ll use
for our staging table based on what we found in the source tables in the
POS transaction database:
SALE_QUANTITY NUMBER (0, 0)
SALE_DOLLAR_AMOUNT NUMBER (10, 2)
SALE_DATE DATE
PRODUCT_NAME VARCHAR2 (50)
PRODUCT_SKU VARCHAR2 (50)
PRODUCT_CATEGORY VARCHAR2 (50)
PRODUCT_BRAND VARCHAR2 (50)
STORE_COUNTRY VARCHAR2 (50)
When completed, our column list should look like the following
screenshot:
5. Save work using the Ctrl+S keys, or from the File | Save All main menu
Indexes : The next tab provided in the Table Editor is the Indexes tab.
An index can greatly facilitate rapid access to a particular record.
Partitions : A partition is a way of breaking down the data stored in a
table into subsets that are stored separately. This can greatly speed up
data access for retrieving random records, as the database will know
the partition that contains the record being searched for based on the
partitioning scheme used.
Attribute Sets : The next tab is the Attribute Sets tab. An Attribute
Set is a way to group attributes of an object in an order that we can
specify when we create an attribute set. It is useful for grouping
subsets of an object’s attributes (or columns) for a later use.
Data Rules : The next tab is Data Rules. A data rule can be specified in
the Warehouse Builder to enforce rules for data values or relationships
between tables. It is used for ensuring that only high-quality data is
loaded into the warehouse.
Example :
1. In the Design Center, open the COUNTIES_LOOKUP table in the Table
Editor by double-clicking on it under the Tables node.
2. Click on the Keys tab.
3. Click on the Add Constraint button.
4. Type PK_COUNTIES_LOOKUP (or any other naming convention we
might choose) in the Name column.
5. In the Type column, click on the drop-down menu and select Primary Key.
6. Click on the Local Columns column, and then click on the Add Local
Column button.
7. Click on the drop-down menu that appears and select the ID column.
8. Close the Table Editor window.
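The effect of the primary key defined in the steps above can be demonstrated with SQLite from Python. The table and constraint names follow the example; the second column and the sample rows are assumptions made for the sketch, and SQLite stands in for the Oracle target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE counties_lookup (
        id          INTEGER,
        county_name TEXT,
        CONSTRAINT pk_counties_lookup PRIMARY KEY (id)
    )
""")
conn.execute("INSERT INTO counties_lookup VALUES (1, 'Kent')")

# A second row with the same ID violates the primary key and is rejected.
try:
    conn.execute("INSERT INTO counties_lookup VALUES (1, 'Surrey')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM counties_lookup").fetchone()[0]
```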
(A) Control Center Manager
The Control Center Manager is the interface the Warehouse Builder
provides for interacting with the target schema.
This is where the deployment of objects and subsequent execution of
generated code takes place.
The Design Center is for manipulating metadata only on the repository.
Deployment and execution take place in the target schema through the
Control Center Service.
The Control Center Manager is our interface into the process where we
can deploy objects and mappings, check on the status of previous
deployments, and execute the generated code in the target schema.
Control Center Manager is launched from the Tools menu of the Design
Center main menu. Click on the very first menu entry, which says Control
Center Manager. This will open up a new window to run the Control
Center Manager, which will look similar to the following:
Q.5(c) Write the steps for validating and generating in Data Object Editor. [5]
(A) Steps for validating in the Data Object Editor
1. Double-click on the POS_TRANS_STAGE table name in the Design
Center to launch the Data Object Editor.
2. Right-click on the object displayed on the Canvas and select Validate
from the pop-up menu, or we can select Validate from the Object menu
on the main editor menu bar.
3. Another option is available if we want to validate every object currently
loaded into our Data Object Editor. It is to select Validate All from the
Diagram menu entry on the main editor menu bar.
4. We can also press the validate icon on the General toolbar, which is
circled in the following image of the toolbar icons:
5. When we validate an object in the editor, we do not get the Validation
Results pop-up dialog box.
6. Here we get another window created in the editor, the Generation
window, which appears below the Canvas window. The window that is
produced will look similar to the following:
In many cases, the error message will be long and the window will display the
message truncated in the window.
The window also provides us a Validation Messages tab, which will
display any messages generated as a result of validation.
Extract
- It involves extracting the data from the source system(s). Most data
warehousing projects consolidate data from different source systems.
- Each separate system may also use a different data organization and/or
format. Common data-source formats include relational databases, XML
and flat files.
- The extraction phase aims to convert the data into a single format
appropriate for transformation processing.
Transform
- The data transformation stage applies a series of rules or functions to
the extracted data from the source to derive the data for loading into
the end target.
- An important function of data transformation is the cleansing of data,
which aims to pass only proper data to the target.
- One or more of the following transformation types may be required to
meet the business needs:
Selecting only certain columns to load
Translating coded values
Encoding free-form values
Deriving a new calculated value
Sorting
Joining
Aggregation
Load
This phase loads the data into the end target that may be a simple flat
file or a data warehouse.
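The three phases can be sketched end to end in Python. The input text, the code table and the field names are all hypothetical; the transform step shows four of the listed transformation types (selecting columns, translating coded values, deriving a calculated value, and sorting), and a plain list stands in for the end target.

```python
import csv
import io

# Hypothetical source extract in CSV form.
RAW = "id,gender,qty,unit_price,notes\n1,M,2,5.00,x\n2,F,1,3.50,y\n"
GENDER_CODES = {"M": "Male", "F": "Female"}

def extract(text):
    """Extract: read the source into a single common format (dicts)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: select columns, translate codes, derive a value, sort."""
    out = []
    for record in records:
        out.append({
            "id": int(record["id"]),                              # selected column
            "gender": GENDER_CODES.get(record["gender"], "Unknown"),  # coded value
            "amount": int(record["qty"]) * float(record["unit_price"]),  # derived
        })
    return sorted(out, key=lambda record: record["amount"])

def load(records, target):
    """Load: append the transformed records to the end target."""
    target.extend(records)
    return len(records)

warehouse = []
loaded = load(transform(extract(RAW)), warehouse)
```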
In practice, such simple slices are rare: more typically, the requested
data is a compound slice where two or more dimensions are nested as
rows or columns.
That means the goal is to provide linear response time regardless of
where the data is being retrieved from in the hypercube.
The design goal should be to offer a complete algebraic ability where
any cell in the hypercube can be derived from any others, using all
standard business and statistical functions including conditional logic.
Q.6(b) Write short notes on : [5]
(i) Metadata Snapshots (ii) The Import Metadata Wizard
(A) Metadata Snapshots : A snapshot captures all the metadata information
about an object at the time the snapshot is taken and stores it for later
retrieval.
It is a way to save a version of an object should we need to go back to a
previous version or compare a current version with a previous one.
To take a snapshot of an object from the Design Center, right-click on
the object and select the Snapshot menu entry. This will give three
options to choose from as shown next:
We can create a new snapshot, add this object to an existing snapshot,
or compare this object with an already saved snapshot.
There are two types of snapshots we can take: a full snapshot that
Q.6(c) Explain multidimensional database architecture with suitable diagram. [5]
(A) Multidimensional database architecture with suitable diagram
A multidimensional database is a type of database that is optimized for
Data Storage
Each data value is stored in a single cell in the database, in the form of
the other dimensions represents a data value.
Data Value
The intersection of one member from one dimension with one member
from each of the other dimensions represents a data value.
Fact Table
Fact table consists of the measurements and facts of the business process.
A fact table typically has two types of columns: those that contain facts
(numerical values) and those that are foreign keys to dimension tables.
Dimension Table
The dimension table provides the detailed information about the
attributes in the fact table.
Fact tables do not have direct relationships to one another.
Star Schema
In the star schema design, a single object (the fact table) sits in the
middle and is connected to other surrounding objects (dimension tables)
like a star.
A star schema has one dimension table for each dimension.
Snowflake Schema
Snowflake schema contains several dimension tables for each dimension.
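The star layout described above can be sketched with an in-memory SQLite database. This is only an illustration: the table and column names (fact_sales, dim_product, dim_time) and the sample rows are assumptions, not taken from the paper. The fact table holds the numeric measure and the foreign keys, and each dimension has one table. In a snowflake design, the category column would be moved out of dim_product into its own related table.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),  -- foreign key to a dimension
    time_id    INTEGER REFERENCES dim_time(time_id),        -- foreign key to a dimension
    amount     REAL                                         -- the fact (numeric measure)
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Electrical');
INSERT INTO dim_time VALUES (1, 2014, 1), (2, 2014, 2);
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 15.0), (2, 1, 40.0);
""")


def sales_by_category(connection):
    """Join the fact table to one dimension and total the measure per category."""
    return connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


print(sales_by_category(conn))
```

The query needs only one join per dimension, which is why the star layout keeps analytical queries simple.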
performance constraints limit the range of applications for which they
are suited. For these reasons, they are not often used for budgeting or
business and financial applications.
MOLAP
Multidimensional OLAP uses a storage mechanism which is optimized for
the pre-calculation, storage and retrieval of multidimensional data.
They are best suited for medium sized, static applications which demand
sub-second data retrieval.
Examples are analysis of historical sales and financial information.
However, since their batch pre-calculations can take a long time, they
are not optimal for dynamic applications where a result from new or
updated data is required. Their batch pre-calculation approach may also
make them unsuitable for large, very sparse applications with more than
five dimensions, as the data explosion can be unmanageable.
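A rough way to see why sparse applications with many dimensions become unmanageable is to count the cells of the full hypercube. The sketch below is an assumption-laden illustration: the dimension sizes and the number of populated cells are made-up figures, and the cell count is only an upper bound on what a batch pre-calculation might have to fill.

```python
from math import prod


def dense_cell_count(dimension_sizes):
    """Number of cells in a fully materialised hypercube (product of member counts)."""
    return prod(dimension_sizes)


def density(populated_cells, dimension_sizes):
    """Fraction of the hypercube that actually holds input data."""
    return populated_cells / dense_cell_count(dimension_sizes)


# Hypothetical six-dimensional application: member counts per dimension.
sizes = [100, 50, 12, 10, 5, 4]
total_cells = dense_cell_count(sizes)
print(total_cells, density(120_000, sizes))
```

With these assumed figures, only a small fraction of the cube holds input data, yet every added dimension multiplies the number of cells that derived results could occupy.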
RAP
Real-time Analytical Processing deals with all the multidimensional input
values in memory and creates the derived multidimensional values in real
time, on demand.
RAP is best suited for dynamic applications, for environments that
should support a mobile workforce, and for environments that need to
scale from small desktop systems to very large applications with more
than five dimensions.
It offers the ability to perform calculations in real time for applications
such as financial reporting, budgeting and planning, and management in
marketing, operations and sales.
It also avoids the data explosion caused by pre-calculating derived
results.
sometimes used to specify what data is requested in a query. The elements
at the base of the pyramid are all at Level 0. These elements contain the
base-level input data. Level 1 elements are aggregations where all the
children are Level 0. Level 2 elements have at least one Level 1 child. They
may also have Level 0 children.
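The level rule above can be sketched as a small recursive function: an element with no children is Level 0, and any other element is one level above its highest child. The hierarchy used here (Year, Q1, Q2, Adjustment) is a hypothetical example, and extending the rule beyond Level 2 in the same way is an assumption.

```python
def element_level(element, children):
    """Return the level of an element given a parent -> list-of-children mapping."""
    kids = children.get(element, [])
    if not kids:
        return 0  # base-level input data
    # One level above the highest-level child; lower-level children are allowed too.
    return 1 + max(element_level(kid, children) for kid in kids)


# Hypothetical hierarchy: Year has two Level 1 children and one Level 0 child.
children = {
    "Year": ["Q1", "Q2", "Adjustment"],
    "Q1": ["Jan", "Feb", "Mar"],
    "Q2": ["Apr", "May", "Jun"],
}

for name in ("Jan", "Q1", "Year"):
    print(name, element_level(name, children))
```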
T.Y. B.Sc. (IT) Sem. VI
Data Warehousing
Time: 2 ½ Hrs.] Mumbai University Question Paper Solution : Oct. 14 [Marks : 60
(A) Data warehousing is represented as an enterprise-wide framework for
managing informational data within the organisation.
In order to understand how all the components involved in a data
warehousing strategy are related, it is essential to have a Data
Warehouse Architecture.
as a data source.
Operations - such as sales processing data, HR data, product data,
inventory processing data, marketing data, systems data.
Internal market research data.
Third-party data, such as census data, demographics data, or survey
data.
All these data sources together form the Data Source Layer.
volume, and secure data exchange within and between enterprises.
Incomplete and inaccurate data jeopardize the success of the data
warehouse.
Data warehouses do not generate their own data; rather they rely upon
the input data from the various source systems.
It is essential to measure the quality of the source data and take
corrective action even before the information is processed and loaded
into the target warehouse.
Example :
Sample Input to Name and Address Operator
Address Column Address Component
Name Joe Smith
Data Profiling : This is the first step for any organization to improve
information quality and make better decisions. Using this method of data
analysis, defects in your data are discovered before you start working with it.
A lot of formatting and cleansing activities happen in this layer.
Data cleansing is the process of detecting and correcting (or removing)
corrupt or inaccurate records from a record set, table, or database.
Data Processing Layer
The dimensionally modelled data resides in the data warehouse.
This layer consists of data staging.
The staging layer is where you load, transform, and clean data before
moving it to the data warehouse.
Create staging tables that hold large volumes of fact data and large
dimension tables across multiple database partitions. If data has to be
manipulated after it has been loaded, you might want to define indexes on
staging tables, depending on the extract, transform, and load (ETL) tools
that you use.
Data staging often involves complex programming, but increasingly
warehousing tools are being created that help in this process.
Staging may also involve data quality analysis programs and filters that
identify patterns within existing operational data.
Since most of the queries and reports are analytical in nature, there has
to be a tight integration between the warehouse dimensional model and
the reporting architecture.
Q.1(c) Differentiate OLTP database and Data Warehouse. [5]
(A)
                OLTP                                   OLAP (Data Warehouse)
Application     Operational: ERP, CRM, legacy apps,…   Management Information System, Decision Support System
Typical users   Staff                                  Managers, Executives
Horizon         Weeks, Months                          Years
Refresh         Immediate                              Periodic
Data model      Entity-relationship                    Multi-dimensional
Schema          Normalized                             Star
Emphasis        Update                                 Retrieval
Q.1(d) Explain Star Schema Model and Snow Flake Model. [5]
(A) Star Schema Model and Snow Flake Model
The central theme of the dimensional model is the star schema which
consists of a central ‘fact table’, containing measures surrounded by
Another version of the star schema is the snowflake schema. In this, the
complex dimensions are normalised. Here dimensions maintain
relationships to other levels of the same dimensions.
When representing the snowflake schema, ‘category’ and ‘brand’ are
kept as separate entities but are related to ‘product’.
Q.2(b) Why and how are the repository and workspaces configured? [5]
(A) Why – The repository is configured first, and then a workspace is created
in the repository, to create the objects that Oracle Warehouse Builder needs
to run. After this configuration, the user can connect to the database.
The Host Name is the name assigned to the computer on which we've
installed the database, and we can just leave it at LOCALHOST.
The Port Number is the one we assigned to the listener back when we
had installed it. It defaults to the standard 1521.
Step 2 : It asks us what option we'd like to perform of the following:
Manage Warehouse Builder workspaces
Manage Warehouse Builder workspace users
Add display languages to repository
Register a Real Application Cluster instance
Step 3 : This step asks us what we'd like to do with workspaces: create a
new workspace or drop an existing one. We'll select the first option to
create a new workspace.
Step 4 : Since we're specifying a new user, we will put in the password for the
system user and proceed to the next step. The password used here is the one
we previously defined for the system accounts when we created our database.
Step 5 : In this step we specify the new username, password, and
workspace name.
Step 6 : This step will ask for the password for the OWBSYS user.
Step 7 : Specify any workspace users from existing database users.
After selecting any user, the Repository Assistant will present us with a
summary screen of the actions it will take and the information we entered,
as shown in the following image:
Project Explorer :
The Projects tab is where we will work on the objects that we are going
to design for our data warehouse. It has nodes for each of the design
objects we’ll be able to create.
So, we will need to design an object under the Databases node to
model that source database. If we expand the Databases node in the
tree, it includes both Oracle and Non-Oracle databases. We are not
restricted to interacting with just Oracle in Warehouse Builder; we
can also pull data from a flat file, in which case we would define an object
under the Files node.
The Projects tab isn’t just for defining our source data; it also holds
information about targets. So the Projects tab defines both the sources
of our data and the targets.
Connection Explorer :
The Connection tab is where the connections are defined to our
various objects in the Projects tab. The workspace has to know how to
connect to the various databases, files, and applications we may have
defined in our Projects tab.
Globals tab :
There are some objects that are common to all projects in a workspace.
The Globals Navigator is used to manage these objects. It includes
objects such as Public Transformations or Public Data Rules.
A transformation is a function, procedure, or package defined in the
database in Oracle’s procedural SQL language called PL/SQL. Data rules
are rules that can be implemented to enforce certain formats in our data.
Q.2(d) Explain the two steps involved in configuring Oracle to connect to [5]
#
# Environment variables required for the non-Oracle system
#
#set <envvar>=<value>
3. The HS_FDS_CONNECT_INFO line is where the ODBC DSN is
specified. So replace the <odbc data_source_name> string with the
name of the Data Source, which is ACME_POS.
4. The HS_FDS_TRACE_LEVEL line is for setting a trace level for the
connection. The trace level determines how much detail gets logged by
the service and it is OK to set the default as 0 (zero).
HS_FDS_CONNECT_INFO = ACME_POS
HS_FDS_TRACE_LEVEL = 0
Save the file as initacmepos.ora.
Now we're going to add a SID to our listener.ora file. The listener.ora file is
present in ORACLE_HOME\network\admin. The steps for this are:
1. Load the listener.ora file into a text editor (or Notepad). Add the
following lines to the file:
SID_LIST_LISTENER=
(SID_LIST=
(SID_DESC=
(SID_NAME=acmepos)
(ORACLE_HOME=D:\app\bob\product\11.1.0\db_1)
(PROGRAM=dg4odbc)
)
)
Save the listener.ora file and restart the listener for the change to take
effect. We can restart it by navigating to Start | Control Panel |
Administrative Tools and then clicking on Services. Now, scroll down until
you see the service for your database listener, which will be named starting
with Oracle and ending in TNSListener. Its name will contain the ORACLE_HOME
name, as in OracleOraDb11g_home1TNSListener. Now right-click on it and select
Restart.
A key feature of data warehouse is being able to analyze data from
several time periods and compare results between them.
It is the Time dimension which provides us the means to retrieve data by time
period.
Every dimension has four characteristics that have to be defined in OWB :
1) Levels 2) Dimension Attributes
3) Level Attributes 4) Hierarchies
1) Levels : The Levels are for defining the levels where aggregations
will occur, or to which data can be summed. There should be at least two
levels in our Time dimension. While reporting on data from our data
warehouse, users will want to see totals summed up by certain time
periods such as per day, per month, or per year. These become the
levels. The Warehouse Builder has the following Levels available for the
Time dimension when using the Time Dimension Wizard:
Day Fiscal week
Calendar week Fiscal month
Calendar month Fiscal quarter
Calendar quarter Fiscal year
Calendar year
2) Dimension attributes :
The Dimension Attributes are individual pieces of information
stored in the dimension that can be found at more than one level.
Each level will have an ID that identifies that level, a start and an
end date for the time period represented at that level, a time span
that indicates the number of days in the period, and a description of
the level.
3) Level Attributes :
Each level has Level Attributes associated with it that provide
descriptive information about the value in that level. The dimension
attributes found at that level and additional attributes specific to
the level are included. For example, if we’re talking about the Month
level, we will find attributes that describe the value for the month
such as the month of the year it represents, or the month in the
calendar quarter. These would be numbers indicating which month of
the year or which month of the quarter it is.
4) Hierarchy :
There should be at least one Hierarchy for every dimension.
A hierarchy is a structure in our dimension that is composed of certain
levels in order; there can be one or more hierarchies in a dimension.
Calendar month, calendar quarter, and calendar year can be a
hierarchy.
We could view our data at each of these levels, and the next level up
would simply be a summation of all the lower-level data within that
period. A calendar quarter sum would be the sum of all the values in
the calendar month level in that quarter.
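The summation along a hierarchy can be sketched as follows. The month values and the member-to-parent mappings are made-up illustrations, and roll_up is a hypothetical helper, not something Warehouse Builder provides; it only shows that each level's total is the sum of the lower-level values within that period.

```python
from collections import defaultdict


def roll_up(values, parent_of):
    """Sum lower-level values into their parent members at the next level up."""
    totals = defaultdict(float)
    for member, value in values.items():
        totals[parent_of[member]] += value
    return dict(totals)


# Hypothetical data for a calendar month -> calendar quarter -> calendar year hierarchy.
monthly = {"Jan": 100.0, "Feb": 120.0, "Mar": 80.0, "Apr": 90.0}
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}
quarter_to_year = {"Q1": "2014", "Q2": "2014"}

quarterly = roll_up(monthly, month_to_quarter)
yearly = roll_up(quarterly, quarter_to_year)
print(quarterly, yearly)
```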
Q.3(b) i) Every editor in OWB has an area in which the contents are [5]
displayed graphically. Name and explain the same.
ii) Name and explain the window that displays the configuration
information about items in the canvas.
(A) (i) Canvas
Every editor has an area in which the contents are displayed
graphically; this area is the Canvas. The contents are displayed
inside boxes. These boxes can be moved around and resized
manually to suit our tastes. There are three tabs available in the
Data Object Editor Canvas: one for Relational, one for Dimensional,
and one for Business Definition.
They are for displaying objects of the corresponding type. When
working with cubes and dimensions, these will be displayed on the
Dimensional tab. When working with the underlying tables, they
appear on the Relational tab. The Business Definitions
are for interfacing with the Oracle Discoverer Business Intelligence
tool to analyze data.
(ii) Configuration :
The configuration window displays configuration information
(properties) about items on our Canvas. If nothing shows in this
window, just select an object in the Canvas by clicking on it and the
configuration will appear.
It is here that we can change the deployment option for the object
to deploy OLAP metadata if we want a relational implementation to
store the OLAP metadata.
Q.3(c) Explain Name, Storage and Attributes tab in Dimension Details [5]
window in the OWB Editor.
(A) Name: This tab displays the name of the dimension along with some
other information specific to the dimension type we are looking at. In
this case, it’s a Time dimension created by the Time Dimension Wizard
and so it displays the range of data in our Time dimension.
Storage: Here we can see what storage option is set for our dimension
object in the database, whether Relational or Multidimensional.
Attributes: The attributes tab is where we can see the attributes that
are designed for our dimension. It displays the attributes in a tabular
form allowing us to view and/or edit them, including adding new
attributes or deleting the existing ones.
Q.3(d) Explain Name, Storage, Dimensions, Measures and Aggregations tab [5]
in Cube Details window.
(A) Name : The cube has a Name tab, like the dimensions, to display its name.
Storage : It has a Storage tab like the dimensions. However, we see a
different option here under the Relational (ROLAP) option where we can
create bitmap indexes.
Dimensions : Instead of attributes, the cube has a tab for dimensions.
The dimensions referenced by a cube are basically its attributes.
Measures : The next tab is for the measures of the cube. It is for
those values that we are storing in our cube as the facts that we wish to
track.
Aggregations : Instead of hierarchies, a cube has aggregations. There
are various methods of aggregation that we can select, as seen in the
drop-down box, the most common of which is sum, which is the default.
This is where the default aggregation method referred to earlier can be
changed. There will be no aggregations in a pure relational
implementation, so we will leave this tab set to the defaults and not
bother changing it.
Q.4(b) Explain any three source and target operators provided by [5]
Warehouse Builder.
(A) Main operators
Dimension Operator : An operator that represents previously defined
dimensions.
External Table Operator : This operator represents external tables. They
can be used to access data stored in flat files as if they were tables.
Table Operator : This operator represents a table in the database. We
will need to store data in tables in our Oracle Database at some point in
the loading of data.
Common Operators
Constant : Represents a constant value that is needed. It can be used
to load a default value for a field that doesn’t have any input from
another source, for instance.
View Operator : Represents a database view. Source data is frequently
retrieved via a view in the source database that can pull data from
multiple sources into a single, easily accessible view.
Sequence Operator : Can be used to represent a database sequence,
which is an automatic generator of sequential unique numbers and is
most often used for populating a primary key field.
Q.4(c) What is the role of Constraints, Attribute Sets and Data Rules tab [5]
in OWB Editor for a table?
(A) Constraints (Keys)
The next tab after Columns is Constraints where we can enter any one of
the four different types of constraints on our new table. A constraint is a
property that we can set to tell the database to enforce some kind of rule
on the table that limits (or constrains) the values that can be stored in it.
There are four types of constraints:
Check constraint : A constraint on a particular column that indicates
the acceptable values that can be stored in the column.
Foreign key : A constraint on a column that indicates a record must
exist in the referenced table for the value stored in this column. We
primary key of the referenced table for the record being referenced.
Unique key : A constraint that specifies the column(s) value
combination(s) cannot be duplicated by any other row in the table.
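The effect of the four constraint types can be tried out in a small sketch. It uses an in-memory SQLite database instead of Oracle, so it is only an analogy: the store and pos_sale tables, their columns and the violates helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE store (
    store_id   INTEGER PRIMARY KEY,      -- primary key
    store_code TEXT UNIQUE               -- unique key
);
CREATE TABLE pos_sale (
    sale_id  INTEGER PRIMARY KEY,
    store_id INTEGER NOT NULL REFERENCES store(store_id),  -- foreign key
    quantity INTEGER CHECK (quantity > 0)                  -- check constraint
);
INSERT INTO store VALUES (1, 'S001');
""")


def violates(connection, sql):
    """Return True if the statement is rejected by a constraint, else False."""
    try:
        connection.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


print(violates(conn, "INSERT INTO pos_sale VALUES (1, 1, 0)"))   # breaks the check constraint
print(violates(conn, "INSERT INTO pos_sale VALUES (2, 99, 5)"))  # no such store: foreign key
print(violates(conn, "INSERT INTO store VALUES (2, 'S001')"))    # duplicate unique key
print(violates(conn, "INSERT INTO store VALUES (1, 'S002')"))    # duplicate primary key
print(violates(conn, "INSERT INTO pos_sale VALUES (3, 1, 5)"))   # valid row
```

In each rejected case the database itself refuses the row, which is what is meant by telling the database to enforce a rule on the table.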
Attribute Sets
The next tab is the Attribute Sets tab. An Attribute Set is a way to group
attributes of an object in an order that we can specify when we create an
attribute set. It is useful for grouping subsets of an object’s attributes (or
columns) for a later use.
Data Rules
The next tab is Data Rules. A data rule can be specified in the Warehouse
Builder to enforce rules for data values or relationships between tables. It
is used for ensuring that only high-quality data is loaded into the warehouse.
New Mapping, specify a name. Select the multiple tables from source
database and those in mapping.
Window.
Use the Table operator from the Component Palette.
This pop up asks us which table we want to include as this table
operator.
Define operator properties for the JOINER
Invoke Expression Builder by clicking on the button with the three dots
(…) to the right of the blank white box.
For example :
Add the aggregator operator between the joiner operator and the stage
table.
Q.5 Attempt any TWO of the following : [10]
Q.5(a) What is the role of TRIM( ), UPPER( ), SUBSTR operator and [5]
TO_NUMBER ( ) in ETL mapping?
(A) TRIM( ) – This function falls under the Transformation Operator on the
mapping. It removes leading and trailing spaces from the input string; the
output attribute represents the result of applying the TRIM operator to the
input string.
UPPER( ) – This function also falls under the Transformation Operator on the
mapping. It converts the input string into upper case.
SUBSTR( ) – The Transformation operators in OWB include a substr (or
substring) transformation that will extract the specified number of
characters from the source string. The substr transformation takes three
parameters: the string we want to extract the substring from, a number
indicating the start position of the substring within the string, and a number
indicating the length of the substring to extract.
TO_NUMBER( ) – This function converts the expression into a number.
This operator takes three parameters, only one of which is absolutely
necessary: the expression we wish to convert to a number. The other two
parameters are optional and include a format string that we can use if we have
a particular format of number we want (such as a decimal point in a certain
place) and a parameter that allows us to set a certain national language format
to default to if it’s different from the language set in the database.
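The behaviour of these four transformations can be imitated in a few lines. The functions below are simplified stand-ins written for illustration, not the Oracle implementations: substr only models positive, 1-based start positions, and to_number ignores the optional format string and national language parameters.

```python
from decimal import Decimal


def trim(text):
    """Remove leading and trailing spaces, like TRIM with no extra arguments."""
    return text.strip(" ")


def upper(text):
    """Convert the input string to upper case."""
    return text.upper()


def substr(text, start, length):
    """Extract `length` characters beginning at the 1-based position `start`."""
    begin = start - 1  # convert the 1-based position to a 0-based index
    return text[begin:begin + length]


def to_number(text):
    """Convert a numeric string to a number (format and language options not modelled)."""
    return Decimal(text.strip())


print(trim("  Joe Smith "), upper("acme"), substr("2014-04-15", 6, 2), to_number(" 12.50 "))
```

In an ETL mapping, such functions would typically sit between the source and the target so that values are cleaned and converted before they are loaded.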
Q.5(b) How does validation play an important role in the process of [5]
building the Data Warehouse?
(A) Validation is for error checking.
It is about making sure the objects and mappings which are built in
Warehouse Builder have no obvious errors in design.
The Validate... entry has been highlighted. If we click on it, it will perform
the validation of the metadata entered for the object and will present us
with the results in a separate dialog box as shown next:
The window on the right will contain the messages that have resulted from
the validation. Our POS_TRANS_STAGE table has validated successfully.
But if we had any warnings or errors, they would appear in this window.
this one did.
2) The validation completes successfully, but with some non-fatal warnings.
3) The validation fails due to one or more errors having been found.
The drop-down menu in the upper left has options for viewing all objects,
just warnings, or just errors.
The All Objects option, which is the default, displays all objects that have
been validated, whether or not there were warnings or errors. Select the
object, right-click and then select Validate. All the selected objects will be
validated and the results for all of them will appear in the window on the right.
If we select Warnings, only the objects that have warnings will be
displayed, and if we select Errors, only the objects with errors will be
displayed.
Q.5(c) What are the different default operation modes of the mapping? [5]
(A) Default operating mode of the mapping :
The three modes are as follows:
Set-based
Row-based
Row-based (target only)
at a time. This will limit the auditing available for input and operations,
but provides greater auditing of the output to the target. We can select
ROW_BASED_TARGET_ONLY from the drop-down menu to view the
code for the option.
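The general difference between set-based and row-based loading can be sketched outside OWB. The example below uses SQLite and Python as an analogy only; it does not reproduce the code Warehouse Builder generates, and the src and tgt tables and both load functions are hypothetical. A set-based load issues one statement for all rows, while a row-based load handles one row at a time and can therefore record exactly which rows were rejected.

```python
import sqlite3


def make_db():
    """Create a source table with one bad row (NULL amount) and an empty target."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE src (id INTEGER, amount REAL);
    CREATE TABLE tgt (id INTEGER PRIMARY KEY, amount REAL NOT NULL);
    INSERT INTO src VALUES (1, 10.0), (2, NULL), (3, 30.0);
    """)
    return conn


def load_set_based(conn):
    """One statement for the whole set: a single bad row rejects the statement."""
    try:
        conn.execute("INSERT INTO tgt SELECT id, amount FROM src")
        return True
    except sqlite3.IntegrityError:
        return False


def load_row_based(conn):
    """One insert per row: good rows load, bad rows are recorded individually."""
    loaded, rejected = 0, []
    for row_id, amount in conn.execute("SELECT id, amount FROM src").fetchall():
        try:
            conn.execute("INSERT INTO tgt VALUES (?, ?)", (row_id, amount))
            loaded += 1
        except sqlite3.IntegrityError as exc:
            rejected.append((row_id, str(exc)))
    return loaded, rejected


print(load_set_based(make_db()))
print(load_row_based(make_db()))
```

The trade-off shown here is the usual one: the set-based form is a single operation with little per-row detail, while the row-based form gives per-row auditing at the cost of processing each row separately.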
Q.5(d) Explain the importance of the seven columns in the Object Details [5]
Window.
(A)
The columns displayed in the Object Details window are as follows:
Object : The name of the object.
Design Status : The status of the design of the object in relation to
whether it has been deployed yet or not :
> New : The object has been created in the Design Center, but has
not been deployed yet.
> Unchanged : The Object has been created in the Design Center and
deployed previously, and has not been changed since its last
deployment.
> Changed : The Object has been created and deployed, and has
Object: Double-click on the object name to launch the appropriate
editor on the object.
Deploy Action: Click on the deploy action to change the deploy action for
the next deployment of the object via a drop-down menu. The list of
available actions that can be taken will be displayed. Not all the previously
listed actions are available for every object. For instance, upgrade is not
available for some objects and will not be an option for a mapping.
The other window in the Control Center Manager is the Control Center Jobs
window. This is where we can monitor the status of any deployments and
executions we've performed.
Q.6 Attempt any TWO of the following : [10]
Q.6(a) Why is it necessary to maintain the Snapshots of an object? [5]
(A) (Refer Q.6(b) solution of April 14)
We now have our STAGE_MAP mapping copied over to our new project.
So let's open that in the mapping editor by double-clicking on it and
investigate the process of synchronizing.
To update the operator in the mapping to include the new column name,
we must perform the task of synchronization, which reconciles the two
and makes any changes to the operator to reflect the underlying table
definition. Doing the synchronization will accomplish both—add the new
column name and synchronize with the table.
To synchronize, we right-click on the header of the table operator in
the mapping and select Synchronize... from the pop-up menu, or click on
the table operator header and select Synchronize... from the main menu
Edit entry. This will pop up the Synchronize dialog box as shown next:
Click on the drop-down menu and select the table.
allow multidimensional queries.
The data is retrieved from the relational database into the client tool
by SQL queries. Since SQL was developed as an access language to
relational databases, it is not optimal for multidimensional queries.
For instance, SQL can perform more complex calculations across rows
than across columns.
By storing data in relational tables, a single piece of data is stored in
one and only one place. This ensures that the database is consistently
maintained and that transaction updates can be performed in a fast and
efficient manner.
Although the fact data is indeed stored in a relational table and can be
accessed using the RDBMS itself.
In order to provide the multidimensional views of the data, all vendors
that use the relational database require that the data be organized in
the star or snowflake schema. This means that, in practice, the data
must almost always be duplicated.
five dimensions as the data explosions can be unmanageable.
One of the design objectives of the multidimensional server is to
provide fast, linear access to the data regardless of the way the data is
being requested.
The simplest request is a two-dimensional slice of data from the
n-dimensional hypercube.
The objective is to retrieve the data very fast, regardless of the
requested dimensions.
In practice, such simple slices are rare: more typically, the requested
data is a compound slice where two or more dimensions are nested as
rows or columns.
That means the goal is to provide linear response time regardless of
where the data is being retrieved from in the hypercube.
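A two-dimensional slice of a hypercube can be sketched with a plain dictionary keyed by one member per dimension. This is a toy model under stated assumptions: the dimension names, members and values are made up, and slice_cube is a hypothetical helper that scans every cell, so it says nothing about how a real multidimensional server achieves its response times.

```python
def slice_cube(cube, dimensions, row_dim, col_dim, fixed):
    """Return {(row_member, col_member): value} for cells matching the fixed members."""
    row_index = dimensions.index(row_dim)
    col_index = dimensions.index(col_dim)
    result = {}
    for key, value in cube.items():
        # Keep the cell only if it matches every fixed dimension member.
        if all(key[dimensions.index(dim)] == member for dim, member in fixed.items()):
            result[(key[row_index], key[col_index])] = value
    return result


# Hypothetical three-dimensional cube: one value per (product, region, month) cell.
dimensions = ["product", "region", "month"]
cube = {
    ("Pen", "West", "Jan"): 10,
    ("Pen", "East", "Jan"): 7,
    ("Lamp", "West", "Jan"): 4,
    ("Pen", "West", "Feb"): 12,
}

january = slice_cube(cube, dimensions, "product", "region", {"month": "Jan"})
print(january)
```

Fixing the month and laying products against regions gives the simple slice; a compound slice would nest a second dimension on the rows or columns instead of fixing it.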
The design goal should be to offer a complete algebraic ability where
any cell in the hypercube can be derived from any others, using all
standard business and statistical functions including conditional logic.