
T.Y. B.Sc. (IT) : Sem. VI
Data Warehousing
Mumbai University Question Paper Solutions

CONTENTS

Year of Exam
April 14
Oct. 14

T.Y. B.Sc. (IT): Sem. VI
Data Warehousing
Time: 2 ½ Hrs.] Mumbai University Question Paper Solution : April 14 [Marks : 60

Q.1 Attempt any TWO of the following : [10]


Q.1(a) What is data warehouse? List and explain the characteristics of [5]
data warehouse.
(A) Data Warehouse : A data warehouse is a decision-support database system.
It is designed to support the decision makers in the organization to analyze
historical data and to achieve strategic-level goals.
 Used for Online Analytical Processing (OLAP), which reads the historical
data for the users for business decisions.
 The tables and joins are simple since they are de-normalized. This is
done to reduce the response time for analytical queries.
 Data-modeling techniques are used for the data warehouse design.
 Optimized for read operations.
 High performance for analytical queries.
 Is usually a database.

Following are the characteristics of a data warehouse:
1. Subject-Oriented Data    2. Integrated Data
3. Time-Referenced Data     4. Non-Volatile Data
1. Subject Oriented : Data warehouses are designed to help you analyze
data. For example, to learn more about your company's sales data, you
can build a warehouse that concentrates on sales. Using this warehouse,
you can answer questions like "Who was our best customer for this item
last year?" This ability to define a data warehouse by subject matter,
sales in this case, makes the data warehouse subject oriented.
 
2. Integrated : Integration is closely related to subject orientation. Data
warehouses must put data from disparate sources into a consistent
format. They must resolve such problems as naming conflicts and
inconsistencies among units of measure. When they achieve this, they
are said to be integrated.
 
3. Time Variant : In order to discover trends in business, analysts need large
amounts of data. This is very much in contrast to online transaction
processing (OLTP) systems, where performance requirements demand that
historical data be moved to an archive. A data warehouse's focus on change
over time is what is meant by the term time variant.


4. Non-volatile : Nonvolatile means that, once entered into the
warehouse, data should not change. This is logical because the purpose
of a warehouse is to enable you to analyze what has occurred.

Q.1(b) Explain the additive, semiadditive and non-additive measures with [5]
examples.
(A) Additivity of Facts : A fact is said to be fully additive if it is additive over
every dimension of its dimensionality; partially additive if additive over at
least one but not all of the dimensions; and non-additive if not additive over
any dimension.

1. Additive measures (Fully additive facts) : These are the specific
class of fact measures which can be aggregated across all dimensions
and their hierarchies.
Example : We have sales figures; one may add the sales across all
quarters to obtain the yearly sales, hence sales is an example of an
additive measure (customer-wise, year-wise, month-wise, day-wise,
product-wise, category-wise sales, etc.).
2. Semi-Additive measures (Semi-additive facts) : These are the specific
class of fact measures which can be aggregated across some dimensions
but not all dimensions.
Example : We have a stock level of, say, 1000 (qty of Item A) on Monday.
A sales person sells 200 on Tuesday (so now the stock is 800) and a
further 300 on Wednesday (now the stock is 500). Going by basic math,
on Thursday he should be left with 500 (assuming no inventory has
flowed in). To obtain the current stock level he cannot aggregate the
stock figures across the time dimension hierarchy; if done, he will have
inappropriate outcomes.

3. Non-additive measures (Non-additive facts) : These are the specific
class of fact measures which cannot be aggregated across any dimension
or its hierarchy.
Example : Aggregation of percentages or dates is an ideal example of
non-additive measures.
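
As an illustration, a minimal SQL sketch of the additivity distinction,
assuming hypothetical SALES and STOCK_LEVELS tables (the names and
columns here are illustrative, not from the syllabus):

-- Additive : sales amount may be summed over every dimension,
-- e.g. rolling daily rows up to a yearly total per product.
SELECT product_id, SUM(sale_amount) AS yearly_sales
FROM   sales
GROUP  BY product_id;

-- Semi-additive : stock level must not be summed over time, since the
-- same inventory would be counted once per day. A valid roll-up over
-- time takes a snapshot instead, e.g. the closing day's level.
SELECT item_id, stock_qty
FROM   stock_levels
WHERE  stock_date = DATE '2014-04-30';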

Q.1(c) What are the various levels of data redundancy in data warehouse? [5]
(A) Data redundancy : There are three levels of redundancies that enterprises
should think about when considering their data warehouse options
 “Virtual” or “point-to-point” data warehouse
 Central data warehouse
 Distributed data warehouse


 “Virtual” or “point-to-point” data warehouse :
 This strategy means that end users are allowed to get at operational
databases directly, using whatever tools are enabled to the “data
access network”.
 This approach provides the minimum amount of redundant data.
 This approach can also put the largest unplanned query load on
operational systems.
 Virtual warehousing is an initial strategy in organisations where
there is a broad but largely undefined need to get at operational data
from a relatively large class of end users and where the frequency
of requests is low.
 Virtual warehouses often provide a starting point for organisations to
learn what end-users are really looking for.

 Central Data Warehouses
 It is a single physical database that contains all data for a specific
functional area, department, division, or enterprise.
 A central data warehouse may contain records for any specific
period of time and usually contains information from multiple
operational systems.
 These warehouses are real. The data stored here is accessible from
any place and must be loaded and maintained on a regular basis.
 These warehouses are built around some form of multidimensional
information database server.

 Distributed Data warehouses
 In these warehouses, certain components are distributed across a
number of different physical databases.
 Increasingly, large organisations are pushing decision-making down to
lower levels of the organisation and this, in turn, pushes the data
needed for decision making down to the LAN or local computer
serving the local decision-maker.

Q.1(d) Differentiate between operational system and informational system. [5]


(A) There are two fundamentally different types of information systems in all
organisations – operational systems and informational systems.

Operational Systems
 They are the systems that help everyday operations of the enterprise.
 They are the backbone system of any enterprise and include order
entry, inventory, manufacturing, payroll, accounting etc.


 Due to their importance to the organisation, operational systems were
always the first part of the enterprise to be computerised.
 Nowadays, these operational systems are completely integrated into
the organisation.
 Organisations cannot function without their operational systems and the
data that these systems maintain.

Informational systems
 On the other hand, there are other functions that go on within the
enterprise that have to do with planning, forecasting and managing the
organisation. In this current fast-paced world, these functions are very
critical for the survival of the organisation.
 Informational systems deal with analysing data and making major
decisions about how the enterprise will operate now and in the future.
 Functions like marketing planning, engineering planning, and financial
analysis also require information systems to support them.
 But these functions are different from the operational ones and the
information required is also different.
 These knowledge-based functions, which help decision makers to plan and
take future decisions, are informational systems.
 Where operational data needs are normally focussed upon a single area,
informational data needs often span a number of different areas and need
large amounts of related operational data.
The following table summarizes the main differences between OLTP and OLAP:

                OLTP                       OLAP
Application     Operational: ERP, CRM,     Management Information System,
                legacy apps, …             Decision Support System
Typical users   Staff                      Managers, Executives
Horizon         Weeks, Months              Years
Refresh         Immediate                  Periodic
Data model      Entity-relationship       Multi-dimensional
Schema          Normalized                 Star
Emphasis        Update                     Retrieval

Q.2 Attempt any TWO of the following : [10]


Q.2(a) What is Listener? Write a procedure to create a listener. [5]
(A) Listener
 The listener is the utility that runs constantly in the background on the
database server, listening for client connection requests to the
database and handling them.


 It can be installed either before or after the creation of a database,
but there is one option during the database creation that requires the
listener to be configured—so we’ll configure it now, before we create
the database.
 Run Net Configuration Assistant to configure the listener. It is available
under the Oracle menu on the Windows Start menu.
The welcome screen will offer four tasks that we can perform with this
assistant. Select the first one to configure the listener.

The next screen will ask what we want to do with the listener. The four
options are as follows:
 Add
 Reconfigure
 Delete
 Rename


 If Oracle is being installed for the first time, only the Add option
will be available. The remaining options will be grayed out and
unavailable for selection. If they are not, then there is a listener
already configured and we can proceed to the next section—creating the
database.
 The next screen will ask us what we want to name the listener. It will
have LISTENER entered by default and that’s a fine name, which states
exactly what it is, so let’s leave it at that and proceed.
 The next screen is the protocol selection screen. It will have TCP
already selected, which is what most installations will require. Leave that
selected and proceed to the next screen to select the port number to
use. The default port number is 1521, which is standard for
communicating with Oracle databases and is the one most familiar to
anyone who has ever worked with an Oracle database.
 That is the last step. It will ask us if we want to configure another
listener. Answer "no" and finish out the screens by clicking on the Finish
button back on the main screen.
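
For reference, the assistant records this configuration in the listener.ora
file on the database server. A minimal sketch of the resulting entry is
shown below; the host name dbserver is a placeholder, and the exact file
generated varies by installation:

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = dbserver)(PORT = 1521))
    )
  )
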
Q.2(b) Explain the procedure for defining source metadata manually with [5]
Data Object Editor.
(A) 1. To start building our source tables for the POS transactional SQL
Server database, launch the OWB Design Center and expand the
ACME_DW_PROJECT node. The already created ACME_POS module for
the SQL Server source database is under the Databases | ODBC node,
so that is where we’ll create the tables. Navigate to the Databases |
ODBC node, and then select the ACME_POS module under this node. We
will create our source tables under the Tables node, so let’s right-click
on this node and select New Table from the pop-up menu. As no wizard
is available for creating a table, we are using the Data Object Editor to
do this.
2. The first screen we’ll be presented with is a small popup asking us to fill
in the name and a description for the new table we’re creating. We’re
going to create the metadata for the ITEMS table, so let’s change the
name to ITEMS and click OK to continue.
3. Upon selecting OK, we are presented with the Table Editor screen on
the right hand side of the main Design Center interface. It’s a clean
slate that we get to fill in.

The following will be the columns, types, and sizes we’ll use for the ITEMS
table based on what we found in the Items source table in the POS
transaction database:
ITEMS_KEY number (22)
ITEM_NAME varchar2 (50)
ITEM_CATEGORY varchar2 (50)
ITEM_VENDOR number (22)
ITEM_SKU varchar2 (50)
ITEM_BRAND varchar2 (50)
ITEM_LIST_PRICE number (6, 2)
ITEM_DEPT varchar2 (50)

4. We can save our work at this point and close the Table Editor window
now before proceeding.
5. When completed, our column list in the editor will contain all of the
columns listed above. The same procedure is continued for the remaining
tables:

POS_TRANSACTIONS
POS_TRANS_KEY number (22)
SALES_QUANTITY number (22)
SALES_ASSOCIATE number (22)
REGISTER number (22)
ITEM_SOLD number (22)
DATE_SOLD date
AMOUNT number (10, 2)

REGISTERS
REGISTERS_KEY number (22)
REGISTER_MANUFACTURER varchar2 (60)
MODEL varchar2 (50)
LOCATION number (22)
SERIAL_NO varchar2 (50)

STORES
STORES_KEY number (22)
STORE_NAME varchar2 (50)
STORE_ADDRESS1 varchar2 (60)
STORE_ADDRESS2 varchar2 (60)
STORE_CITY varchar2 (50)
STORE_STATE varchar2 (50)
STORE_ZIP varchar2 (50)
REGION_LOCATED_IN number (22)
STORE_NUMBER varchar2 (10)


REGIONS
REGIONS_KEY number (22)
REGION_NAME varchar2 (50)
CONTINENT varchar2 (50)
COUNTRY varchar2 (50)
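
As an illustration, the ITEMS definition above corresponds to a table of
roughly the following shape, written as Oracle-style DDL. This is a sketch
only; in OWB we are defining metadata manually, not running this statement:

CREATE TABLE items (
  items_key        NUMBER(22),
  item_name        VARCHAR2(50),
  item_category    VARCHAR2(50),
  item_vendor      NUMBER(22),
  item_sku         VARCHAR2(50),
  item_brand       VARCHAR2(50),
  item_list_price  NUMBER(6,2),
  item_dept        VARCHAR2(50)
);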

Q.2(c) Write procedure to create new project in OWB. What is [5]
difference between a module and a project?
(A) Creating a project
1. Launch the Design Center.

2. To create a new project, select New… either from the pop-up menu or
from the Design drop-down menu. We can have any number of projects
defined, but can work on only one at a time.

Difference between a module and a project
 A project is defined which holds all the work. The Projects tab is where
we will work on the objects that we are going to design for our data
warehouse. It was the old Project Explorer window in the previous
Warehouse Builder release. It has nodes for each of the design objects
we’ll be able to create.
 A module is an object in the Design Center that acts as a storage
location for the various definitions and helps us logically group them.
There are Files modules that contain file definitions and Databases
modules that contain the database definitions. These Databases modules
are organized as Oracle modules and Non-Oracle modules.

Q.2(d) Draw and explain OWB architecture with suitable diagram. [5]
(A) OWB components and architecture
Oracle Warehouse Builder is composed on the client of the Design Center
(including the Control Center Manager) and the Repository Browser. The
server components are the Control Center Service, the Repository (including
Workspaces), and the Target Schema. A diagram illustrating the various
components and their interactions follows :

[Diagram : OWB architecture, showing the client side (Design Center with the
Control Center Manager, and the Repository Browser) interacting with the
server side (Control Center Service, the Repository with its Workspaces, and
the Target Schema).]
The Design Center
 The Design Center is the main client graphical interface for designing
our data warehouse.
 It is used to define our sources and targets, and describe the extract,
transform, and load (ETL) processes we use to load the target from the
sources. The ETL procedures are what we will define to carry out the
extraction of the data from our sources, any transformations needed on
it and subsequent loading into the data warehouse.
 What will be created in the Design Center is a logical design only, not a
physical implementation. This logical design will be stored behind the
scenes in a Workspace in the Repository on the server. The user
interacts with the Design Center, which stores all its work in a
Repository Workspace.

The Control Center Manager
 The Control Center Manager is used for managing the creation of that
physical implementation by deploying the designs we've created into the
Target Schema.
 The Control Center Manager interacts behind the scenes with the
Control Center Service, which runs on the server. The user directly
interacts with the Control Center Manager and the Design Center only.

The Target Schema
 The Target Schema is where OWB will deploy the objects to, and where
the execution of the ETL processes that load our data warehouse will
take place.
 It is the actual data warehouse schema that gets built. It contains the
objects that were designed in the Design Center, as well as the ETL
code to load those objects.


The Repository
 The Repository is the schema that hosts the design metadata
definitions we create for our sources, targets, and ETL processes.
 We will be defining sources, targets, and ETL processes using the
Design Center and the information about what we have defined (the
metadata) is stored in the Repository.
 The Repository is a Warehouse Builder software component for which a
separate schema is created when the database is installed.
 The Repository will contain one or more Workspaces. A Workspace is
where we will do our work to create the data warehouse.

The Repository Browser
 One final OWB component to consider is the Repository Browser on the
client.
 It is a web browser interface for retrieving information from the
Repository. It will allow us to view the metadata, create reports, and
audit runtime operations.
 It is the only other component besides the Design Center and the Control
Center Manager that the user interacts with directly.

Q.3 Attempt any TWO of the following : [10]


Q.3(a) Write short note on cube and dimensions. [5]
(A) Cube (Fact table)
 Relational implementation is implemented in the database with tables
and foreign keys.
 The term relational means the tables in it relate to each other in some
way.
 This design principle is followed to keep the number of levels of foreign
key relationships to a minimum.
 It is much faster and easier to understand if we don’t have to include
multiple levels of referenced tables.
 For this reason, a data warehouse dimensional design that is
represented relationally in the database will have one main table to hold
the primary facts/measures, such as count of items sold, or total sale
amount etc.
 The tables that are referenced by the main table contain all the
information they need and do not need to go down any more levels to
reference any other tables.
 The ER diagram of this implementation looks like a star, so it is called a
star schema.
 The main table in the middle is referred to as the fact table as it holds
facts or measures and this represents the Cube.


Dimensions
 The tables surrounding the fact table are called Dimensions. They
contain the descriptive information.

Q.3(b) Explain the steps for importing the metadata for a flat file. [5]

(A) Right-click on the module name under the Files node under our project, and
select Import and then Flat File…. The following are the steps to be
performed in the File Import screens :

1. The first screen for importing a file is where we will specify the file we
wish to import. Click Add Sample File and select the counties.csv file.
After selecting the file from the resulting popup, it will fill in the
filename on the File Import screen.

2. If the file viewing is done, just click OK to close the dialog. Click the
Import button now on the File Import screen to begin the import
process. This launches the Flat File Sample Wizard. The Flat File
Sample Wizard has two paths that we can follow through it: a standard
sequence for simple files and an advanced sequence for more complex
files. The two sets of steps are indicated on the Welcome screen.


3. Clicking the Next button will take us to the first step.
This screen displays the information the wizard pulled out of the
file, displayed as columns of information. It knows what’s in the columns
because the file has each column separated by a comma, but doesn’t
know at this point what type of data or column name to use for each
column—so it just displays the data. It picks a name based on the file
name, which is fine.
4. Take the advanced path through the wizard which will consist of more
steps, or we can just click the Next button. The simple path is for basic
comma delimited files with single rows separated by a carriage return.

5. Step 2 of the simple steps includes the record and field delimiters
choices.
Our fields are separated by a Comma, and that is the default. The
Enclosures: selection is OWB’s way of specifying the characters that
surround the text values in the file. Frequently, in text-based files such
as CSV files, the text is differentiated from numerical values by
surrounding the text values in double quotes, single quotes, or something
similar. This is where we specify the characters, if any, that surround
the text-field values in this file.

6. The final step is where we specify the details about what each field
contains, and give each field a name. Check the box that says Use the
first record as the field names and we’ll see that all the column names
have changed to using the values from that first row.

Notice that the field type for the first column has changed. The ID is
now INTEGER instead of character, as it has now correctly detected
that the remaining rows after that first one all contain integer data.
Length is specified there, which defaults to 0.
7. Click on Next to get a summary screen of what the wizard will do, or
just click on the Finish button to continue. After clicking Finish it will
create our file module under the Files node and we will be able to access
it in the Projects tab.
8. We’ll make sure to select Save All from the Design menu in Design
Center to save the metadata we just entered.
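
For context, the file being sampled is an ordinary comma-delimited file. A
hypothetical first few lines of such a file are shown below (the actual
contents of counties.csv are not reproduced here, so the column names are
illustrative):

ID,COUNTY_NAME,STATE
1,Hennepin,MN
2,Ramsey,MN

The first record supplies the field names, and the wizard then detects that
the remaining rows of the ID column contain integer data.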

Q.3(c) What is module? Explain source module and target module. [5]
(A) Module : A module is an object in the Design Center that acts as a storage
location for the various definitions and helps us logically group them. There
are Files modules that contain file definitions and Databases modules that
contain the database definitions. These Databases modules are organized as
Oracle modules and Non-Oracle modules.
Target Module
Creating a target user and module
 A different module is to be created for target objects. A new module is
created in the Projects tab for our target to hold our data warehouse
design objects.
 However, before we can do that, we should have a target schema defined in
the database that will hold our target objects when we deploy them.
 The target schema is going to be the main location for the data
warehouse. The target is where the actual data warehouse will be built.
Our design will be implemented there.
Our design will be implemented there.
 Every target module must be mapped to a target user schema.
 It’s always good to create a separate user schema to become the target
so that user roles in our database can be kept separate.

Q.3(d) List and explain the functionalities that can be performed by OWB [5]
in order to create data warehouse.
(A) Data Modeling : Most data warehouse designers use a data modeling tool to
create the logical and physical design of the data warehouse. The logical
design ensures that all business requirements, definitions, and rules are
supported. The physical design ensures optimal performance in the planning
of indexes, relationships, data types and properties. To support developers
of OLAP, data mining and reporting systems, the data model also acts as
documentation for the final data warehouse.


Extraction, Transformation and Load (ETL) : This is another functionality
performed by OWB. ETL is the short form for Extract, Transform and Load,
three database functions that are combined into one tool to pull data out of
one database and place it into another database.

Data Profiling and Data Quality : Data profiling helps to discover value
frequencies, formats and patterns.
Data profiling can be applied to generate statistics about data quality, and
to discover complex patterns, foreign key relationships, and functional
relationships. Using data profiling one can find some perceived defects, but
quality cannot be assessed by this alone.
By data quality assessment, a true assessment of data quality is created.

Metadata management : involves managing data about other data, whereby this
“other data” is generally referred to as content data. Metadata management
provides a number of very important benefits to the enterprise :
 Consistency of definitions
 Clarity of relationships
 Clarity of data lineage

Integration with Oracle business intelligence tools for reporting purposes :
OWB can implement Oracle business intelligence applications and data marts.
al
Q.4 Attempt any TWO of the following : [10]
Q.4(a) What is staging area? What are advantages and disadvantages of [5]
Staging?

(A) Staging : Staging is the process of copying the source data temporarily into a
table(s) in the target database. Here one can perform any transformations
that are required before loading the source data into the final target tables.

Staging area
• Staging area is the area used while designing the ETL.
• It can be created in the database or outside the database.
• Outside the database, it is created in flat files on the file system that
can be accessed to load data into the database.
• So the staging area would be a folder on the file system and the data
would be stored in a flat file.

Advantages
 If the source data is in a database other than an Oracle Database, the
reliability of the connection to the database and the performance of the
link while pulling data across are to be taken into account.


 If a failure occurs during an intermediate step of the ETL process, the
process has to be restarted. If such a failure occurs, we will have to
consider the severity of the impact, as in the following cases:
 Going back again to the source system to pull data if the first
attempt failed.
 The source data is changing while we are trying to load it into the
warehouse, meaning that whatever data we pull the second time
might be different from what we started with. This condition will
make it difficult to debug the error that caused this failure.
 A staging table in Oracle makes the ETL process very fast and the
transformations can then be run on it without impacting the
transactional system.
 The individual process to stage the data to a table in the Oracle
database simply involves copying the data one-for-one over to the
Oracle database, and this runs in less than 30 seconds. This means the
source database connection is only open for 30 seconds, whereas it had
to constantly work for hours without a staging table.
 Another advantage is that if the ETL process needs to be restarted,
there is no need to go back to disturb the source system to retrieve the
data.
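
To make the staging step concrete, here is a sketch of the one-for-one copy
it amounts to, assuming a hypothetical staging table STAGE_POS_TRANSACTIONS
with the same columns as the source table and a database link POS_SRC
pointing at the source system:

-- Copy the source rows as-is into the local staging table; the
-- transformations can then run locally without touching the source.
INSERT INTO stage_pos_transactions
SELECT *
FROM   pos_transactions@pos_src;
COMMIT;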

Q.4(b) List and explain the use of various windows available in mapping [5]
editor.
(A)  Mapping : The Mapping window is the main working area in the center
of the editor where the mapping is designed. This window is also referred
to as the canvas. This is the graphical display that will show the
operators being used and the connections between the operators that
indicate the data flow from source to target.

 Explorer :
It has two tabs – the Available Objects tab and the Selected Objects tab.
Available Objects – displays objects defined elsewhere in our project;
they can be dragged and dropped into this mapping.
Selected Objects – displays all the objects currently defined in the
mapping. When an object is selected in the canvas, the Selected Objects
window scrolls to that object and highlights it.

 Structure View : It provides a hierarchical view of the objects in the
editor, including operators and attributes of those operators for a
mapping. It will also display the structure for a data object if opened
when editing a data object.


 Property Inspector (Mapping Properties) :
The Property Inspector window displays the various properties that
can be set for objects in our mapping. When an object is selected in the
canvas, its properties will display in this window.
 Component Palette : The Component Palette contains each of the
objects that can be used in our mapping. We can click on the object we
want to place in the mapping and drag it onto the canvas.
 Bird’s Eye View : This window displays a miniature version of the entire
canvas and allows us to scroll around the canvas without using the scroll
bars.
al
Q.4(c) Explain the various OWB operators. [5]
(A) Following are the types of OWB operators :
 Source and Target Operators
 Transformations (Data Flow Operators)
 Pre/Post Processing Operators
The main operators in each of the three categories are described below.

Source and target operators


Main operators
 Dimension Operator: An operator that represents previously defined
dimensions.
 External Table Operator: This operator represents external tables.
They can be used to access data stored in flat files as if they were
tables.
 Table Operator: This operator represents a table in the database. We
will need to store data in tables in our Oracle Database at some point in
the loading of data.


Common operators
 Constant: Represents a constant value that is needed. It can be used to
load a default value for a field that doesn’t have any input from another
source, for instance.
 View Operator: Represents a database view. Source data is frequently
retrieved via a view in the source database that can pull data from
multiple sources into a single, easily accessible view.
 Sequence Operator: Can be used to represent a database sequence,
which is an automatic generator of sequential unique numbers and is
most often used for populating a primary key field.

 Construct Object: This operator can be used to actually construct an
Oracle object type in our mapping.

Transformations (data flow operators)

The true power of a data warehouse lies in the restructuring of the source
data into a format that greatly facilitates the querying of large amounts of
data over different time periods. For this, we need to transform the source
data into a new structure. That is the purpose of the transformation (or
data flow) operators.

Some of the common data flow operators we’ll see are as follows:
 Aggregator: There are times when source data is at a finer level of
detail than we need. So we need to sum the data up to a higher level, or
apply some other aggregation type function such as an average function.
This is the purpose of the Aggregator operator.
 Deduplicator: Sometimes our data records will contain duplicate
combinations that we want to weed out so we’re loading only unique
combinations of data. The Deduplicator operator will do this for us.
 Expression: This represents an SQL expression that can be applied to
the output to produce the desired result.
 Filter: This will limit the rows from an output set to criteria that we
specify. It is generally implemented in a where clause in SQL to restrict
the rows that are returned.


 Joiner: This operator will implement an SQL join on two or more input
sets of data. A join takes records from one source and combines them
with the records from another source using some combination of values
that are common between the two.
 Lookup: A Lookup operator (previously known as a Key Lookup) looks up
data in a table based on some input criteria (the key) to return some
information required by our mapping.


 Pivot: This operator can be useful if we have source records that
contain multiple columns of data that is spread across columns instead
of rows.
 Set Operation: This operator will allow us to perform an SQL set
operation on our data such as a union (returning all rows from each of
two sources, either ignoring the duplicates or including the duplicates)
or intersect (which will return common rows from two sources).
 Splitter: This operator is the opposite of the Joiner operator. It will
allow us to split an input stream of data rows into two separate targets
based on the criteria we specify.

 Transformation Operator: All these operators are transformation
operators but there is one operator type specifically named
"Transformation". This operator can be used to invoke a PL/SQL
function or procedure with some of our source data as input to provide a

transformation of data.
 Table Function Operator: A Table Function Operator can be seen in the
date_dim_map map, where three Table Function operators are defined.
This kind of operator represents a Table Function, which is defined in
PL/SQL and is a function that can be queried like a table to return rows
of information.
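
To ground a few of these operators, here is a sketch of the kind of SQL they
correspond to once OWB generates code. The statements reuse the source
tables defined earlier (ITEMS, POS_TRANSACTIONS) but are illustrative, not
actual generated code:

-- Aggregator : sum detail rows up to a higher level.
SELECT register, SUM(amount) AS total_sales
FROM   pos_transactions
GROUP  BY register;

-- Filter : restrict rows with a WHERE clause.
SELECT * FROM pos_transactions WHERE amount > 100;

-- Joiner : combine two sources on a common value.
SELECT t.pos_trans_key, i.item_name
FROM   pos_transactions t
JOIN   items i ON i.items_key = t.item_sold;

-- Deduplicator : keep only unique combinations.
SELECT DISTINCT item_category, item_brand FROM items;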

Pre/Post-Processing Operators
There is a small group of operators that allow us to perform operations
before the mapping process begins, or after the mapping process ends.
These are the pre- and post-processing operators and mapping input and
output operators. We can perform functions or procedures before or after
a mapping runs, and can also accept input or provide output from a mapping
process.
 Mapping Input Parameter: This operator allows us to pass a
parameter(s) into a mapping process.
 Mapping Output Parameter: As the name suggests, this is similar to the
Mapping Input Parameter operator, but provides a value as output from
our mapping.
 Post-Mapping Process: Allows us to invoke a function or procedure after
the mapping completes its processing.
 Pre-Mapping Process: It allows us to invoke a function or procedure
before the mapping process begins.


Q.4(d) Write the steps for building staging area table using Data Object [5]
Editor.
(A) Launch the OWB Design Center and expand the project node. For this, the
target user should be created first, and then the target module of the
same name.

The steps to create the staging area table in our target database are:
1. Navigate to the Databases | Oracle | ACME_DWH module. Right-click
on the Table node and select New Table from the pop-up menu.
2. Upon selecting New Table, enter the name of the new table and an
optional description.

3. The first tab is the Name tab where it displays the name we just gave it
in the opening popup.
4. Click on the Columns tab and enter the information that describes the
columns of our new table.

The following will then be the column names, types, and sizes we’ll use
for our staging table based on what we found in the source tables in the
POS transaction database:
SALE_QUANTITY NUMBER (0, 0)
SALE_DOLLAR_AMOUNT NUMBER (10, 2)
SALE_DATE DATE
PRODUCT_NAME VARCHAR2 (50)
PRODUCT_SKU VARCHAR2 (50)
PRODUCT_CATEGORY VARCHAR2 (50)
PRODUCT_BRAND VARCHAR2 (50)
PRODUCT_PRICE NUMBER (6, 2)
PRODUCT_DEPARTMENT VARCHAR2 (50)
STORE_NAME VARCHAR2 (50)
STORE_NUMBER VARCHAR2 (10)
STORE_ADDRESS1 VARCHAR2 (60)
STORE_ADDRESS2 VARCHAR2 (60)
STORE_CITY VARCHAR2 (50)
STORE_STATE VARCHAR2 (50)
STORE_ZIPPOSTALCODE VARCHAR2 (50)
STORE_REGION VARCHAR2 (50)
STORE_COUNTRY VARCHAR2 (50)
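
For illustration, this column list is equivalent to a table of roughly the
following shape in Oracle DDL. This is a sketch only, since OWB generates
the real DDL when the design is deployed; the (0, 0) precision shown in the
editor is treated here as a plain NUMBER:

CREATE TABLE pos_trans_stage (
  sale_quantity        NUMBER,
  sale_dollar_amount   NUMBER(10,2),
  sale_date            DATE,
  product_name         VARCHAR2(50),
  product_sku          VARCHAR2(50),
  product_category     VARCHAR2(50),
  product_brand        VARCHAR2(50),
  product_price        NUMBER(6,2),
  product_department   VARCHAR2(50),
  store_name           VARCHAR2(50),
  store_number         VARCHAR2(10),
  store_address1       VARCHAR2(60),
  store_address2       VARCHAR2(60),
  store_city           VARCHAR2(50),
  store_state          VARCHAR2(50),
  store_zippostalcode  VARCHAR2(50),
  store_region         VARCHAR2(50),
  store_country        VARCHAR2(50)
);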

When completed, our column list in the editor will contain all of the
columns listed above.
5. Save work using the Ctrl+S keys, or from the File | Save All main menu
entry in the Design Center.

The other tabs in the Table Editor are:

 Constraints (Keys) : The next tab after Columns is Keys where we can
enter any one of the four different types of constraints on our new
table. A constraint is a property that we can set to tell the database to
enforce some kind of rule on the table that limits (or constrains) the
values that can be stored in it. There are four types of constraints, as
explained below (a SQL sketch of all four follows the list):
 Check constraint: A constraint on a particular column that indicates
the acceptable values that can be stored in the column.
 Foreign key: A constraint on a column that indicates a record must
exist in the referenced table for the value stored in this column. A
foreign key is also considered a constraint because it limits the
values that can be stored in the column that is designated as a
foreign key column.


 Primary key: A constraint that indicates the column(s) that make up
the unique information that identifies one and only one record in the
table. It is similar to a unique key constraint in which values must be
unique. The primary key differs from the unique key as other tables’
foreign key columns use the primary key value (or values) to
reference this table. The value stored in the foreign key of a table
is the value of the primary key of the referenced table for the
record being referenced.
 Unique key: A constraint that specifies the column(s) value
combination(s) cannot be duplicated by any other row in the table.
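
A sketch of the four constraint types in ordinary Oracle DDL; the table,
column, and constraint names here are illustrative, and the foreign key
assumes a REGIONS_DEMO table with primary key REGIONS_KEY already exists:

CREATE TABLE stores_demo (
  stores_key   NUMBER(22)
    CONSTRAINT pk_stores_demo PRIMARY KEY,        -- primary key
  store_number VARCHAR2(10)
    CONSTRAINT uk_store_number UNIQUE,            -- unique key
  store_state  VARCHAR2(2)
    CONSTRAINT ck_store_state
    CHECK (store_state IN ('MN', 'CA', 'NY')),    -- check constraint
  region_key   NUMBER(22)
    CONSTRAINT fk_region
    REFERENCES regions_demo (regions_key)         -- foreign key
);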

 Indexes : The next tab provided in the Table Editor is the Indexes tab.
An index can greatly facilitate rapid access to a particular record.
 Partitions : A partition is a way of breaking down the data stored in a
table into subsets that are stored separately. This can greatly speed up
data access for retrieving random records, as the database will know
the partition that contains the record being searched for based on the
partitioning scheme used.
 Attribute Sets : The next tab is the Attribute Sets tab. An Attribute
Set is a way to group attributes of an object in an order that we can
specify when we create an attribute set. It is useful for grouping
subsets of an object’s attributes (or columns) for a later use.
 Data Rules : The next tab is Data Rules. A data rule can be specified in
the Warehouse Builder to enforce rules for data values or relationships
between tables. It is used for ensuring that only high-quality data is
loaded into the warehouse.
dy

Q.5 Attempt any TWO of the following : [10]


Q.5(a) Write the steps to add primary key for a column of a table in [5]
Data Object Editor with suitable example?
(A) To add a primary key, we’ll perform the following steps:
1. In the Design Center, open the table in the Table Editor by double-
clicking on it under the Tables node.


2. Click on the Keys tab.
3. Click on the Add Constraint button.
4. Type the name of the constraint in the Name column.
5. In the Type column, click on the drop-down menu and select Primary Key.
6. Click on the Local Columns column, and then click on the Add Local
Column button.
7. Click on the drop-down menu that appears and select the column which
you want to make the primary key.
8. Close the Table Editor window.


Example :
1. In the Design Centre, open the COUNTIES_LOOKUP table in the Table
Editor by double-clicking on it under the Tables node.
2. Click on the Keys tab.
3. Click on the Add Constraint button.
4. Type PK_COUNTIES_LOOKUP (or any other naming convention we
might choose) in the Name column.
5. In the Type column, click on the drop-down menu and select Primary Key.
6. Click on the Local Columns column, and then click on the Add Local
Column button.

ar
7. Click on the drop-down menu that appears and select the ID column.
8. Close the Table Editor window.
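
The steps above are equivalent to the following DDL, shown only as a sketch
of what OWB will generate when the table is deployed:

ALTER TABLE counties_lookup
  ADD CONSTRAINT pk_counties_lookup PRIMARY KEY (id);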

Q.5(b) Write a short note on Control Center Manager. [5]

(A) Control Center Manager
 The Control Center Manager is the interface the Warehouse Builder
provides for interacting with the target schema.
 This is where the deployment of objects and subsequent execution of
generated code takes place.
 The Design Center is for manipulating metadata only on the repository.
Deployment and execution take place in the target schema through the
Control Center Service.
 The Control Center Manager is our interface into the process where we
can deploy objects and mappings, check on the status of previous
deployments, and execute the generated code in the target schema.
 Control Center Manager is launched from the Tools menu of the Design
Center main menu. Click on the very first menu entry, which says Control
Center Manager. This will open up a new window to run the Control
Center Manager.

Q.5(c) Write the steps for validating and generating in Data Object Editor. [5]
(A) Steps for validating in the Data Object Editor
1. Double-click on the POS_TRANS_STAGE table name in the Design
Center to launch the Data Object Editor.
2. Right-click on the object displayed on the Canvas and select Validate
from the pop-up menu, or we can select Validate from the Object menu
on the main editor menu bar.
3. Another option is available if we want to validate every object currently
loaded into our Data Object Editor. It is to select Validate All from the
Diagram menu entry on the main editor menu bar.

4. We can also press the Validate icon on the General toolbar.

5. When we validate an object in the editor, we do not get the Validation
Results pop-up dialog box.
6. Here we get another window created in the editor, the Generation
window, which appears below the Canvas window and displays the
resulting messages.

In many cases, the error message will be long and the window will display the
message truncated in the window.

When we validate from the Data Object Editor, it is on an object-by-object


basis for objects appearing in the editor canvas. But when we validate a
mapping in the Mapping editor, the mapping as a whole is validated all at once.

The steps for generating in the Data Object Editor are as follows.

Generating in the Data Object Editor
 Start the Data Object Editor and open POS_TRANS_STAGE table in
the editor by double-clicking on it in the Design Center.
 To review the options we have for generating, there is the Generate...
menu entry under the Object main menu, the Generate entry on the pop-
up menu when we right-click on an object, and a Generate icon on the
general toolbar right next to the Validate icon.

We’ll use one of these methods to generate the POS_TRANS_STAGE
table. The results will appear in the Generation window with the
Script tab selected.
The window also provides us a Validation Messages tab, which will
display any messages generated as a result of validation.

Q.5(d) Write a short note on ETL transformation. [5]



(A) In computing, Extract, Transform and Load refers to a process in database
usage, and especially in data warehousing, that :
 Extracts data from homogenous or heterogeneous data sources.
 Transforms the data for storing it in proper format or structure for
querying and analyzing purpose.

 Loads it into the final target (database, data warehouse).

Extract
- It involves extracting the data from the source system(s). Most
warehousing projects consolidate data from different source systems.
- Each separate system may also use a different data organization and/or
format. Common data-source formats include relational databases, XML
and flat files.
- The extraction phase aims to convert the data into a single format
appropriate for transformation processing.


Transform
- The data transformation stage applies a series of rules or functions to
the extracted data from the source to derive the data for loading into
the end target.
- An important function of data transformation is cleansing of data, which
aims to pass only proper data to the target.
- One or more of the following transformation types may be required to
meet the business needs :
 Selecting only certain columns to load
 Translating coded values
 Encoding free-form values
 Deriving a new calculated value
 Sorting
 Joining
 Aggregation

Load an
 This phase loads the data into the end target that may be a simple flat
file or a data warehouse.
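
A compact sketch showing several of these transformation types in a single
statement; the source and target tables here are hypothetical:

-- Select certain columns, translate a coded value, derive a calculated
-- value, and aggregate, all while loading the end target.
INSERT INTO sales_summary (store_number, gender, total_amount)
SELECT store_number,                              -- selecting columns
       CASE gender_code WHEN 'M' THEN 'Male'
                        ELSE 'Female' END,        -- translating coded values
       SUM(quantity * unit_price)                 -- derivation + aggregation
FROM   sales_stage
GROUP  BY store_number, gender_code;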

Q.6 Attempt any TWO of the following : [10]


Q.6(a) Explain Multi-Dimensional Online Analytical Processing (MOLAP). [5]
(A) Multi-Dimensional Online Analytical Processing (MOLAP)
 Multidimensional OLAP uses a storage mechanism which is optimized for
the pre-calculation, storage and retrieval of multidimensional data.
 They are best suited for medium sized, static applications which demand
sub-second data retrieval.


 Examples are analysis of historical sales and financial information.
However, since their batch pre-calculations can take a long time, they
are not optimal for dynamic applications where a result from newly
updated data is required. Their batch pre-calculation approach may also
make them unsuitable for large, very sparse applications with more than
five dimensions, as the data explosion can be unmanageable.


 One of the design objectives of the multidimensional server is to
provide fast, linear access to the data regardless of the way the data is
being requested.
 The simplest request is a two-dimensional slice of data from the n-
dimensional hypercube.
 The objective is to retrieve the data very fast, regardless of the
requested dimensions.


 In practice, such simple slices are rare: more typically, the requested
data is a compound slice where two or more dimensions are nested as
rows or columns.
 That means the goal is to provide linear response time regardless of
where the data is being retrieved from in the hypercube.

 The design goal should be to offer a complete algebraic ability where
any cell in the hypercube can be derived from any others, using all
standard business and statistical functions including conditional logic.
Q.6(b) Write short notes on : [5]
(i) Metadata Snapshots (ii) The Import Metadata Wizard
(A) Metadata Snapshots : A snapshot captures all the metadata information
about an object at the time the snapshot is taken and stores it for later
retrieval.
 It is a way to save a version of an object should we need to go back to a
previous version or compare a current version with a previous one.
 To take a snapshot of an object from the Design Center, right-click on
the object and select the Snapshot menu entry. This will give three
options to choose from:
 We can create a new snapshot, add this object to an existing snapshot,
or compare this object with an already saved snapshot.
 There are two types of snapshots we can take: a full snapshot that
captures all metadata and can be restored completely (suitable for
making backups of objects) and a signature snapshot that only captures
the signature or characteristics of an object, just enough to be able to
detect changes in an object.
 The reason for taking the snapshot will generally dictate which snapshot
is more appropriate.
 Full snapshots can be converted to signature snapshots later if
needed, and can also be exported like regular workspace objects.

Q.6(c) Explain multidimensional database architecture with suitable [5]
diagram.
(A) Multidimensional database architecture
 A multidimensional database is a type of database that is optimized for
data warehouse and online analytical processing (OLAP) applications.
 Multidimensional database technology is a key factor in the interactive
analysis of large amounts of data for decision-making purposes.
Architecture
Cubes
 Data cubes provide true multidimensionality. They generalize
spreadsheets to any number of dimensions.
 Although the cube implies 3 dimensions, a cube can have any number of
dimensions.
 A collection of related cubes is commonly referred to as a
multidimensional database.

Dimensions and Members


 Dimensions provide filtering of the data.
 Members are the individual components of a dimension. For ex. Product
A, Product B and Product C might be members of the Product Dimension.
Each member has a unique name.

Data Storage
 Each data value is stored in a single cell in the database.


Data Value
 The intersection of one member from one dimension with one member
from each of the other dimensions represents a data value.

Fact Table
 A fact table consists of the measurements and facts of the business process.
 A fact table typically has two types of columns: those that contain facts
(numerical values) and those that are foreign keys to dimension tables.

Dimension Table

 The dimension table provides the detailed information about the
attributes in the fact table.
 Fact tables do not have direct relationships to one another.

Star Schema
 In the star schema design, a single object (the fact table) sits in the
middle and is connected to other surrounding objects (dimension tables)
like a star.
 A star schema has one dimension table for each dimension.
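
As a sketch, a minimal star schema in DDL form (hypothetical names) has the
fact table referencing each de-normalized dimension table directly:

CREATE TABLE dim_product (
  product_key  NUMBER PRIMARY KEY,
  product_name VARCHAR2(50),
  brand_name   VARCHAR2(50)    -- brand kept inline in the dimension
);

CREATE TABLE fact_sales (
  product_key  NUMBER REFERENCES dim_product (product_key),
  sale_date    DATE,
  amount       NUMBER(10,2)
);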

Snowflake Schema
 Snowflake schema contains several dimension tables for each dimension.


Q.6(d) Explain OLAP Terminologies. [5]


(A) OLAP terminologies
ROLAP
 Relational OLAP uses a standard relational database to store the
physical data.
 They are best suited for large, transaction intensive applications such as
high volume retail sales analysis.
 Their advantages are the ability to handle extremely large data sets and
having the same technology as existing RDBMS based system.
 However, their complexity, storage vs. calculation orientation, cost and
performance constraints limit the range of applications for which they
are suited. For these reasons, they are not often used for budgeting or
business and financial applications.

MOLAP
 Multidimensional OLAP uses a storage mechanism which is optimized for
the pre-calculation, storage and retrieval of multidimensional data.
 They are best suited for medium sized, static applications which demand
sub-second data retrieval.
 Examples are analysis of historical sales and financial information.
However, since their batch pre-calculations can take a long time, they
are not optimal for dynamic applications where a result from newly
updated data is required. Their batch pre-calculation approach may also
make them unsuitable for large, very sparse applications with more than
five dimensions, as the data explosion can be unmanageable.
dy

RAP
 Real-time Analytical Processing deals with all the multidimensional input
values in memory and creates the derived multidimensional values in real
time, on demand.
 RAP is best suited for dynamic applications, for environments that
should support a mobile workforce, and for environments that need to
scale from small desktop systems to very large applications with more
than five dimensions.
 It provides the ability to perform calculations in real time, for functions
such as financial reporting, budgeting and planning, and management in
marketing, operations and sales.
 It also avoids the data explosion caused by pre-calculating derived
results.


Hypercube : refers to a collection of multidimensional data. The edges of
the cube are called dimensions and the individual items within each
dimension are called elements. Thus the hypercube called ‘GL’ might contain
data from the general ledger. It might be comprised of four dimensions :
accounts, cost centers, months and versions. The accounts dimension might
contain base-level elements such as sales, cost of sales, receivables and
payables, and calculated elements such as net income, gross margin
percentages and the like.

Level : refers to the position of an element in the dimension hierarchy. It is
sometimes used to specify what data is requested in a query. The elements
at the base of the pyramid are all at Level 0; these elements contain the
base-level input data. Level 1 elements are aggregations where all the
children are Level 0. Level 2 elements have at least one Level 1 child; they
may also have Level 0 children.

Density : is the number of actually populated cells as a fraction of the
theoretical volume of the table. Thus, a table which is 10% dense and has a
theoretical table size of five million cells would contain 500,000 non-zero
values. Sparsity is the inverse of the density. Thus, a table which is 10%
dense would be 90% sparse.
al

T.Y. B.Sc. (IT) Sem. VI
Data Warehousing
Time: 2 ½ Hrs.] Mumbai University Question Paper Solution : Oct. 14 [Marks : 60

Q.1 Attempt any TWO of the following : [10]


Q.1(a) What is Data Warehouse? Explain. [5]
(A) (Refer Q.1(a) solution of April 14)

Q.1(b) Explain Data Warehouse Architecture. [5]

(A)  Data warehousing is represented as an enterprise-wide framework for
managing informational data within the organisation.
 In order to understand how all the components involved in a data
warehousing strategy are related, it is essential to have a Data
Warehouse Architecture.

Data Warehouse Architecture


 DWA is a way of representing the overall structure of data,
communication, processing and presentation that exists for end user
computing within the enterprise.
 The architecture is made up of a number of interconnected parts:
 Data Source layer
 Source data transport layer
al
 Data quality control and data profiling layer
 Data integration layer
 Data processing layer
dy

 End user reporting layer

Data Source Layer


 This represents the different data sources that feed data into the data
warehouse. The data source can be of any format -- plain text file,
relational database, other types of database, Excel file, etc., can all act
as a data source.
 Operations - such as sales processing data, HR data, product data,
inventory processing data, marketing data, systems data.
 Internal market research data.
 Third-party data, such as census data, demographics data, or survey
data.
 All these data sources together form the Data Source Layer.


Source Data Transport Layer


 This layer largely constitutes data trafficking.
 It represents the tools and processes involved in transporting data from
the source systems to the enterprise warehousing system.
 Since the data volume is huge, the interfaces with the source system
have to be scalable enough to manage secured data transmission.
 Traditionally, ftp has been used extensively to do the data transmission.
Various other tools are also used, like NDM (Network Data Mover)
transmission, which is also referred to as Connect:Direct.
 CONNECT:Direct is a middleware optimized for assured delivery, high-
volume, and secure data exchange within and between enterprises.

Data Quality Control and Data Profiling Layer
 Data quality causes the most concern in any data warehousing solution.
 Incomplete and inaccurate data jeopardizes the success of the data
warehouse.
 Data warehouses do not generate their own data; rather they rely upon
the input data from the various source systems.
 It is very essential to measure the quality of the source data and take
corrective action even before the information is processed and loaded
into the target warehouse.
al
Example :
Sample Input to Name and Address Operator
Address Column            Address Component
Name                      Joe Smith
Street Address            8500 Normandale Lake Suite 710
City                      Bloomington
ZIP Code                  55437

Sample Output from Name and Address Operator
Address Column            Address Component
First Name Standardized   JOSEPH
Last Name                 SMITH
Primary Address           8500 NORMANDALE Lake BLVD
Secondary Address         STE 710
City                      BLOOMINGTON
State                     MN
Postal Code               55437-3813
Latitude                  44.849194
Longitude                 -093.356352


Data Profiling : is the first step for any organization to improve information
quality and provide better decisions. Using this method of data analysis,
defects in your data are discovered before you start working with it.

Data Integration Layer


 The Integration Layer marks the transition from raw data to integrated
data; that is, data that has been consolidated, duplication of records
and values removed, and disparate sources combined into a single
version. This layer represents the passage of the data through the
process of integration, rather than the storage area for the data.

 A lot of formatting and cleansing activities happen in this layer.
 Data cleansing is the process of detecting and correcting (or removing)
corrupt or inaccurate records from a record set, table, or database.

Data Processing Layer
 In the data warehouse, the dimensionally modelled data resides.
 This layer consists of data staging.
 The staging layer is where you load, transform, and clean data before
moving it to the data warehouse.
 Create staging tables that hold large volumes of fact data and large
dimension tables across multiple database partitions. If data has to be
manipulated after it has been loaded, you might want to define indexes on
staging tables, depending on the extract, transform, and load (ETL) tools
that you use.
(ETL) tools that you use.
 Data staging often involves complex programming, but increasingly
warehousing tools are being created that help in this process.

 Staging may also involve data quality analysis programs and filters that
identify patterns within existing operational data.
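
As an illustration of this layer, a staging table is usually created without
constraints so that bulk loads run fast, and indexed only when post-load
manipulation needs it. A minimal sketch; the column names here are assumptions:

-- Hypothetical staging table holding raw transaction data.
CREATE TABLE pos_trans_stage (
  sale_date     DATE,
  product_name  VARCHAR2(100),
  store_name    VARCHAR2(100),
  sale_quantity NUMBER,
  sale_amount   NUMBER
);

-- Optional: index only if post-load transformations filter on this column.
CREATE INDEX pos_trans_stage_dt_idx ON pos_trans_stage (sale_date);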

End User reporting Layer


 Success of the Data warehouse largely depends upon ease of access to
valuable information.

 Since most of the queries and reports are analytical in nature, there has
to be a tight integration between the warehouse dimensional model and
the reporting architecture.


Q.1(c) Differentiate OLTP database and Data Warehouse. [5]
(A)
                OLTP                        OLAP (Data Warehouse)
Application     Operational: ERP, CRM,      Management Information System,
                legacy apps, …              Decision Support System
Typical users   Staff                       Managers, Executives
Horizon         Weeks, Months               Years
Refresh         Immediate                   Periodic
Data model      Entity-relationship        Multi-dimensional
Schema          Normalized                  Star
Emphasis        Update                      Retrieval

Q.1(d) Explain Star Schema Model and Snow Flake Model. [5]
(A) Star Schema Model and Snow Flake Model
 The central theme of the dimensional model is the star schema, which
consists of a central 'fact table' containing measures, surrounded by
descriptors called 'dimensions'.
 In a star schema, even if a dimension is complex and contains relationships
such as hierarchies, it is flattened to a single dimension.


 Another version of the star schema is the snowflake schema. In this, the
complex dimensions are normalised; here, dimensions maintain relationships
to other levels of the same dimension.
 When representing the snowflake schema, 'category' and 'brand' are
kept as separate entities but are related to 'product'.
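
The difference between the two shapes can be sketched in DDL. This is a
hedged illustration with hypothetical product tables: in the star version,
brand and category are flattened into the single dimension table, while in
the snowflake version they are normalised into related tables.

-- Star: one flattened dimension table.
CREATE TABLE product_dim_star (
  product_id    NUMBER PRIMARY KEY,
  product_name  VARCHAR2(100),
  brand_name    VARCHAR2(50),   -- flattened into the dimension
  category_name VARCHAR2(50)    -- flattened into the dimension
);

-- Snowflake: 'category' and 'brand' become separate, related entities.
CREATE TABLE category_dim (
  category_id   NUMBER PRIMARY KEY,
  category_name VARCHAR2(50)
);
CREATE TABLE brand_dim (
  brand_id    NUMBER PRIMARY KEY,
  brand_name  VARCHAR2(50),
  category_id NUMBER REFERENCES category_dim (category_id)
);
CREATE TABLE product_dim_sf (
  product_id   NUMBER PRIMARY KEY,
  product_name VARCHAR2(100),
  brand_id     NUMBER REFERENCES brand_dim (brand_id)
);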

Q.2 Attempt any Two of the following : [10]


Q.2(a) What is the role of a listener when using the Oracle database? Is [5]
it necessary to install the listener before the creation of the
database?
(A)  The listener is the utility that runs constantly in the background on the
database server, listening for client connection requests to the
database and handling them.


 It can be installed either before or after the creation of a database.


 Run Net Configuration Assistant to configure the listener. It is available
under the Oracle menu on the Windows Start menu as shown in the
following image:

Q.2(b) Why and how is repository and workspaces configured? [5]
(A) Why – The repository is configured first, and then a workspace is created
in the repository to hold the objects that are needed for the Oracle
Warehouse Builder to run. After this configuration, the user can connect to
the database.

Use the Repository Assistant application to configure the repository,
create a workspace, and create the objects in the repository that are
needed for OWB to run. This application is available from the Start Menu
under the Warehouse Builder | Administration submenu of the Oracle
program group as shown here :
program group as shown here :
Vi

The steps for configuration are as follows:


Step 1 : Launch the Repository Assistant application on the server and it
asks for the database connection information—Host Name, Port Number,
and Oracle Service Name—or a Net Service Name for a SQL*Net
connection.


 The Host Name is the name assigned to the computer on which we've
installed the database, and we can just leave it at LOCALHOST.
 The Port Number is the one we assigned to the listener back when we
had installed it. It defaults to the standard 1521.
Step 2 : It asks us what option we'd like to perform of the following:
 Manage Warehouse Builder workspaces
 Manage Warehouse Builder workspace users
 Add display languages to repository
 Register a Real Application Cluster instance

Step 3 : This step asks us what we'd like to do with workspaces: create a
new workspace or drop an existing one. We'll select the first option to
create a new workspace.
Step 4 : Since we're specifying a new user, we will put in the password for
the system user and proceed to the next step. The password used here is the
one we previously defined for the system accounts when we created our
database.
 

Step 5 : In this step we specify the new username, password, and
workspace name.

Step 6 : This step will ask for the password for the OWBSYS user.
Step 7 : Specify any workspace users from existing database users.
After selecting any user, the Repository Assistant will present us with a
summary screen of the actions it will take and the information we entered,
as shown in the following image:


Q.2(c) What is defined in the Project Explorer Window, Connection [5]


Explorer Window and Global Explorer Window in the Design Center
screen?
(A) The Design Center contains the following three windows :
 Project Explorer (the Projects tab)
 Connection Explorer (the Locations tab)
 Global Explorer (the Globals tab, also called the Globals Navigator)

Project Explorer :
 The Projects tab is where we will work on the objects that we are going
to design for our data warehouse. It has nodes for each of the design
objects we'll be able to create.
 So, we will need to design an object under the Databases node to
model that source database. If we expand the Databases node in the
tree, it includes both Oracle and Non-Oracle databases. We are not
restricted to interacting with just Oracle in Warehouse Builder, but we
can pull data from a flat file, in which case we would define an object
under the Files node.
under the Files node.
 The Projects tab isn’t just for defining our source data, it also holds
information about targets. So the Projects tab defines both the sources
of our data and the targets,

Connection Explorer :
 The Connection tab is where the connections are defined to our
various objects in the Projects tab. The workspace has to know how to
connect to the various databases, files, and applications we may have
defined in our Projects tab.


 As we begin creating modules in the Projects tab, it will ask for
connection information and this information will be stored and be
accessible from the Locations tab. Connection information can also be
created explicitly from within the Locations tab.
 Multiple projects can be defined in the Projects tab, but connection
information is not displayed project-wise in the Locations tab.
Connections are applicable for the entire workspace, and not just the
project we are working on.

Globals tab :
 There are some objects that are common to all projects in a workspace.
 The Globals Navigator is used to manage these objects. It includes
objects such as Public Transformations or Public Data Rules.
 A transformation is a function, procedure, or package defined in the
database in Oracle’s procedural SQL language called PL/SQL. Data rules
al
are rules that can be implemented to enforce certain formats in our data.
dy

Q.2(d) Explain the two steps involved in configuring Oracle to connect to [5]
SQL Server database.


(A) The following are the two steps involved :
1. Create a heterogeneous service configuration file.
2. Edit the listener.ora file.

Creating a Heterogeneous Service Configuration File


1. Open Windows Explorer and navigate to the ORACLE_HOME\hs\admin folder.
There is a sample init file that Oracle has been kind enough to supply us
with. We can easily modify this file to suit our purpose. It is a plain-text
file, so we can use any text editor to edit it.


2. Open the file named initdg4odbc.ora in any text editor.


This is the default init file for using ODBC connections. The contents
will look like the following:
# This is a sample agent init file that contains the HS parameters
# that are needed for the Database Gateway for ODBC
#
# HS init parameters
#
HS_FDS_CONNECT_INFO = <odbc data_source_name>
HS_FDS_TRACE_LEVEL = <trace_level>

#
# Environment variables required for the non-Oracle system
#
#set <envvar>=<value>

3. The HS_FDS_CONNECT_INFO line is where the ODBC DSN is
specified. So replace the <odbc data_source_name> string with the
name of the Data Source, which is ACME_POS.
4. The HS_FDS_TRACE_LEVEL line is for setting a trace level for the
connection. The trace level determines how much detail gets logged by
the service, and it is OK to leave it at the default of 0 (zero). The two
lines now read:
HS_FDS_CONNECT_INFO = ACME_POS
HS_FDS_TRACE_LEVEL = 0
Save the file as initacmepos.ora.

Editing the listener.ora file



Now we're going to add a SID to our listener.ora file. The listener.ora file
is present in ORACLE_HOME\network\admin. The steps for this are:
1. Load the listener.ora file into a text editor (or Notepad). Add the
following lines to the file:
SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
      (SID_NAME=acmepos)
      (ORACLE_HOME=D:\app\bob\product\11.1.0\db_1)
      (PROGRAM=dg4odbc)
    )
  )

Save the listener.ora file and restart the listener for the change to take
effect. We can restart it by navigating to Start | Control Panel |
Administrative Tools and then clicking on Services. Now scroll down until
you see the service for your database listener, which will be named starting
with Oracle and ending in TNSListener; it will contain the ORACLE_HOME name,
for example OracleOraDb11g_home1TNSListener. Right-click on it and select
Restart.
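
Once the listener is restarted, the connection can be smoke-tested from
SQL*Plus. The sketch below is an assumption, not part of the two steps
above: it presumes a tnsnames.ora entry named ACMEPOS containing (HS=OK)
that points at the new SID, valid SQL Server credentials, and a hypothetical
table name.

-- Hypothetical database link through the gateway; all names are assumptions.
CREATE DATABASE LINK acme_pos_link
  CONNECT TO "sa" IDENTIFIED BY "password"
  USING 'ACMEPOS';

-- Quick test: count the rows of a SQL Server table through the link.
-- Object names are case-sensitive over ODBC, hence the quotes.
SELECT COUNT(*) FROM "POS_Transactions"@acme_pos_link;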

Q.3 Attempt any Two of the following : [10]


Q.3(a) What is the significance of Time Dimension in a Data Warehouse? [5]
(A)  A time dimension is a key part of most data warehouses.
 It provides the time series information to describe the data.

 A key feature of a data warehouse is being able to analyze data from
several time periods and compare results between them.
 It is the dimension which provides us the means to retrieve data by time
period.

Every dimension has four characteristics that have to be defined in OWB :
1) Levels an 2) Dimension Attributes
3) Level Attributes 4) Hierarchies

1) Levels : The Levels are for defining the levels where aggregations
will occur, or to which data can be summed. There should be at least two
levels in our Time dimension. While reporting on data from our data
al
warehouse, users will want to see totals summed up by certain time
periods such as per day, per month, or per year. These become the
levels. The Warehouse Builder has the following Levels available for the
Time dimension when using the Time Dimension Wizard, which we’ll
dy

discuss in a moment:
 Day  Fiscal week
 Calendar week  Fiscal month
 Calendar month  Fiscal quarter
 Calendar quarter  Fiscal year
 Calendar year
Vi

2) Dimension attributes :
 The Dimension Attributes are individual pieces of information
stored in the dimension that can be found at more than one level.
 Each level will have an ID that identifies that level, a start and an
end date for the time period represented at that level, a time span
that indicates the number of days in the period, and a description of
the level.


3) Level Attributes :
 Each level has Level Attributes associated with it that provide
descriptive information about the value in that level. The dimension
attributes found at that level and additional attributes specific to
the level are included. For example, if we’re talking about the Month
level, we will find attributes that describe the value for the month
such as the month of the year it represents, or the month in the
calendar quarter. These would be numbers indicating which month of
the year or which month of the quarter it is.
4) Hierarchy :

 There should be at least one Hierarchy for every dimension.
 A hierarchy is a structure in our dimension that is composed of certain
levels in order; there can be one or more hierarchies in a dimension.
 Calendar month, calendar quarter, and calendar year can be a hierarchy.
 We could view our data at each of these levels, and the next level up
would simply be a summation of all the lower-level data within that
period. A calendar quarter sum would be the sum of all the values in
the calendar month level in that quarter.
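
In relational terms, each level of the hierarchy is simply a higher-level
GROUP BY over the same joined data; a quarter total is the sum of its month
totals. A minimal sketch, assuming hypothetical SALES_FACT and TIME_DIM
tables:

-- Sum the measure at the calendar month level.
SELECT t.calendar_year, t.calendar_month, SUM(f.sales_amount) AS month_sales
  FROM sales_fact f
  JOIN time_dim   t ON f.time_key = t.time_key
 GROUP BY t.calendar_year, t.calendar_month;

-- The calendar quarter level: the same data, grouped one level higher.
SELECT t.calendar_year, t.calendar_quarter, SUM(f.sales_amount) AS quarter_sales
  FROM sales_fact f
  JOIN time_dim   t ON f.time_key = t.time_key
 GROUP BY t.calendar_year, t.calendar_quarter;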

Q.3(b) i) Every editor in OWB has an area in which the contents are [5]
displayed graphically. Name and explain the same.
ii) Name and explain the window that displays the configuration
information about items in the canvas.
(A) (i) Canvas
 Every editor has an area in which the contents are displayed
graphically. This is called the Canvas.


 In the Data Object Editor, the objects in the Canvas will be the
objects that are created to hold data, which may be cubes and
dimensions. Each object is displayed in a box with the name of the
object as the title of the box and the attributes of the object listed
inside the box. These boxes can be moved around and resized
manually to suit our tastes. There are three tabs available in the
Data Object Editor Canvas: one for Relational, one for Dimensional,
and one for Business Definition.
 They are for displaying objects of the corresponding type. When
working with cubes and dimensions, these will be displayed on the
Dimensional tab. When working with the underlying tables, they
would have appeared on the Relational tab. The Business Definitions
are for interfacing with the Oracle Discoverer Business Intelligence
tool to analyze data.


(ii) Configuration :
 The configuration window displays configuration information
(properties) about items on our Canvas. If nothing shows in this
window, just select an object in the Canvas by clicking on it and the
configuration will appear.
 It is here that we can change the deployment option for the object
to deploy OLAP metadata if we want a relational implementation to
store the OLAP metadata.

Q.3(c) Explain Name, Storage and Attributes tab in Dimension Details [5]
window in the OWB Editor.
(A)  Name: This tab displays the name of the dimension along with some
other information specific to the dimension type we are looking at. In
this case, it’s a Time dimension created by the Time Dimension Wizard

k
and so it displays the range of data in our Time dimension.
 Storage: Here we can see what storage option is set for our dimension
object in the database, whether Relational or Multidimensional.
 Attributes: The attributes tab is where we can see the attributes that
are designed for our dimension. It displays the attributes in a tabular
form allowing us to view and/or edit them, including adding new
attributes or deleting the existing ones.
al
Q.3(d) Explain Name, Storage, Dimensions, Measures and Aggregations tab [5]
in Cube Details window.
(A)  Name : It has a name tab like the dimensions to display its name.
 Storage : It has a storage tab just like the dimensions. However, we see a
different option here under the Relational (ROLAP) option where we can
create bitmap indexes.
 Dimensions : Instead of attributes, the cube has a tab for dimensions.
The dimensions referenced by a cube are basically its attributes.
 Measures : The next tab is for the measures of the cube. It is for
those values that we are storing in our cube as the facts that we wish to
track.
 Aggregations : Instead of hierarchies, a cube has aggregations. There
are various methods of aggregation that we can select, as seen in the
drop-down box, the most common of which is sum, which is the default.
This is where the default aggregation method referred to earlier can be
changed. There will be no aggregations in a pure relational
implementation, so we will leave this tab set to the defaults and not
bother changing it.


Q.4 Attempt any TWO of the following : [10]


Q.4(a) What is the importance of staging process while creating a Data [5]
Warehouse?
(A) (Refer Q.4(a) solution of April 14)

Q.4(b) Explain any three source and target operators provided by [5]
Warehouse Builder.
(A) Main operators
 Dimension Operator : An operator that represents previously defined
dimensions.

 External Table Operator : This operator represents external tables. They
can be used to access data stored in flat files as if they were tables.
 Table Operator : This operator represents a table in the database. We
will need to store data in tables in our Oracle Database at some point in
the loading of data.

Common Operators an
 Constant : Represents a constant value that is needed. It can be used
to load a default value for a field that doesn’t have any input from
another source, for instance.
 View Operator : Represents a database view. Source data is frequently
retrieved via a view in the source database that can pull data from
al
multiple sources into a single, easily accessible view.
 Sequence Operator : Can be used to represent a database sequence,
which is an automatic generator of sequential unique numbers and is
most often used for populating a primary key field.

 Construct Object : This operator can be used to actually construct an
Oracle object type in our mapping.

Q.4(c) What is the role of Constraints, Attribute Sets and Data Rules tab [5]
in OWB Editor for a table?
(A) Constraints (Keys)

The next tab after Columns is Constraints where we can enter any one of
the four different types of constraints on our new table. A constraint is a
property that we can set to tell the database to enforce some kind of rule
on the table that limits (or constrains) the values that can be stored in it.
There are four types of constraints:
 Check constraint : A constraint on a particular column that indicates
the acceptable values that can be stored in the column.
 Foreign key : A constraint on a column that indicates a record must
exist in the referenced table for the value stored in this column. We
talked about foreign keys back in topic 2 when we discussed the


acme_pos transactional source database. A foreign key is also
considered a constraint because it limits the values that can be stored
in the column that is designated as a foreign key column.
 Primary key: A constraint that indicates the column(s) that make up the
unique information that identifies one and only one record in the table.
It is similar to a unique key constraint in which values must be unique.
The primary key differs from the unique key as other tables’ foreign
key columns use the primary key value (or values) to reference this
table. The value stored in the foreign key of a table is the value of the
primary key of the referenced table for the record being referenced.
 Unique key : A constraint that specifies the column(s) value
combination(s) cannot be duplicated by any other row in the table.
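
All four constraint types correspond directly to clauses of a CREATE TABLE
statement. A minimal sketch with hypothetical tables; the names are
illustrative only:

CREATE TABLE stores (
  store_id   NUMBER PRIMARY KEY,                  -- primary key constraint
  store_code VARCHAR2(10) UNIQUE,                 -- unique key constraint
  region     VARCHAR2(10)
             CHECK (region IN ('NORTH','SOUTH'))  -- check constraint
);

CREATE TABLE store_sales (
  sale_id  NUMBER PRIMARY KEY,
  store_id NUMBER REFERENCES stores (store_id)    -- foreign key constraint
);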

Attribute Sets

The next tab is the Attribute Sets tab. An Attribute Set is a way to group
attributes of an object in an order that we can specify when we create an
an
attribute set. It is useful for grouping subsets of an object’s attributes (or
columns) for a later use.

Data Rules
The next tab is Data Rules. A data rule can be specified in the Warehouse
Builder to enforce rules for data values or relationships between tables. It
al
is used for ensuring that only high-quality data is loaded into the warehouse.

Q.4(d) Write down the steps involved in creating a mapping. [4]


(A) In the Design Center, navigate to the Project and the Database Module.
Select New Mapping and specify a name. Then select the tables from the
source database that are needed in the mapping.

Adding Source table in mapping


 There are a couple of ways to add a table to a mapping. One way is to use
the Projects Navigator window and the other way is to use the Palette
window.
 Use the Table operator from the Component Palette.
 A pop-up asks us which table we want to include for this table operator.

Add target table in mapping


 The process of connecting the source to the target is the means of
telling the Warehouse Builder which data fields from the source go in
which data fields in the target.
 Select the Table operator and add the target Stage table.


Add the Joiner Operator


 Add the joiner operator.
 Add the input groups depending on how many tables we have.
 Join the source tables to Joiner.

Define operator properties for the JOINER
Invoke Expression Builder by clicking on the button with the three dots
(…) to the right of the blank white box.
For example :

 Add the aggregator operator between the joiner operator and the stage
table.


 An Aggregator operator is used to apply an aggregate function to the
data. The Aggregator operator is going to be used to group the data, and
it will create an output attribute for every attribute we use in the
GROUP BY clause.
 Connect output attributes from the JOINER operator to the input of
the AGGREGATOR operator, define properties for the AGGREGATOR
operator, and then connect the output of the AGGREGATOR operator to
the STAGE table operator by dragging a line between attributes.

Thus the mapping is ready.
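
Conceptually, the code generated for such a mapping behaves like a joined,
aggregated INSERT ... SELECT statement. The sketch below is only an
illustration; the source tables and columns are assumptions:

INSERT INTO pos_trans_stage (product_name, store_name, sale_date, sale_amount)
SELECT i.item_name,
       l.store_name,
       s.sale_date,
       SUM(s.amount)                              -- AGGREGATOR: the SUM measure
  FROM sales s
  JOIN items     i ON s.item_id  = i.item_id      -- JOINER condition
  JOIN locations l ON s.store_id = l.store_id     -- JOINER condition
 GROUP BY i.item_name, l.store_name, s.sale_date; -- AGGREGATOR: group by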

Q.5 Attempt any TWO of the following : [10]
Q.5(a) What is the role of TRIM( ), UPPER( ), SUBSTR operator and [5]
TO_NUMBER ( ) in ETL mapping?

(A) TRIM( ) – this function falls into the Transformation Operator on the
mapping.

The TRIM function displays the TRIM Transformation operator window on the
mapping. A TRIM operator has one input attribute and one output attribute.
The input attribute is the string to be trimmed for whitespace, and the
output attribute represents the result of applying the TRIM operator to the
input string. It looks like the following screenshot:

UPPER( ) - this function also falls into the Transformation Operator on the
mapping which converts the input string into Upper case.


SUBSTR( ) - The Transformation operators in OWB include a substr (or
substring) transformation that will extract the specified number of
characters from the source string. The substr transformation takes three
parameters — the string we want to extract the substring from, a number
indicating the start position of the substring within the string, and a number
indicating the length of the substring to extract.
TO_NUMBER( ) – This function converts the expression into a number.
This operator needs three parameters, only one of which is absolutely
necessary — the expression we wish to convert to a number. The other two
parameters are optional and include a format string that we can use if we have
a particular format of number we want (such as a decimal point in a certain
place) and a parameter that allows us to set a certain national language format
to default to if it’s different from the language set in the database.
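
These four operators correspond to the standard Oracle SQL functions of the
same names, so their effect can be previewed directly in SQL*Plus:

SELECT TRIM('  Joe Smith  ')             AS trimmed,     -- 'Joe Smith'
       UPPER('Joe Smith')                AS uppercased,  -- 'JOE SMITH'
       SUBSTR('Warehouse', 1, 4)         AS sub_string,  -- 'Ware'
       TO_NUMBER('1,234.56', '9G999D99') AS converted    -- 1234.56
  FROM dual;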
Vi

Q.5(b) How does validation play an important role in the process of [5]
building the Data Warehouse?
(A)  Validation is for error checking.
 It is about making sure the objects and mappings which are built in the
Warehouse Builder have no obvious errors in design.

Validating in the Design Center


Find our staging table, POS_TRANS_STAGE. This table is under the
ACME_DWH module in the Oracle node, and right-clicking on it will present us
with the following pop-up menu:


The Validate... entry has been highlighted. If we click on it, it will perform
the validation of the metadata entered for the object and will present us
with the results in a separate dialog box as shown next:

The window on the right will contain the messages that have resulted from
the validation. Our POS_TRANS_STAGE table has validated successfully.
But if we had any warnings or errors, they would appear in this window.

The validation will result in one of the following three possibilities:


1) The validation completes successfully with no warnings and/or errors, as
this one did.
2) The validation completes successfully, but with some non-fatal warnings.
3) The validation fails due to one or more errors having been found.

The drop-down menu in the upper left has options for viewing all objects,
just warnings, or just errors.
an
The All Objects option, which is the default, displays all objects that have
been validated, whether or not there were warnings or errors. Select the
object, right-click and then select Validate. All the selected objects will be
validated and the results for all of them will appear in the window on the right.
If we select Warnings, only the objects that have warnings will be
displayed, and if we select Errors, only the objects with errors will be
displayed.
dy

Q.5(c) What are the different default operation modes of the mapping? [5]
(A) Default operating mode of the mapping :
The three modes are as follows:
 Set-based
 Row-based
 Row-based (target only)

 In Set-Based mode, the Warehouse Builder will generate a single SQL
statement that performs all the operations of our mapping in one
statement. It processes the data as a single set of data. This is good
for performance, but the drawback is that runtime auditing information
is limited. If any errors are generated, it is not able to tell us which row
generated the error. We can view the code that is set-based by
selecting SET_BASED in the Script tab from the Operating Mode drop-
down menu.


 In Row-Based mode, the Warehouse Builder generates code to process
the data row by row. It uses a combination of SQL cursors and PL/SQL
code. It does not provide as big a performance benefit as the set-based
mode, but we gain much greater auditing capability of the execution
results. There are also additional parameters that can be set to improve
the performance of this mode.
 The last of the three options is Row-Based (target only) mode. This
option creates a SQL SELECT cursor and tries to include as many
operations as it can into that cursor to process the source data and
operations on it as a set, but then writes the rows to the target one row
at a time. This will limit the auditing available for input and operations,
but provides greater auditing of the output to the target. We can select
ROW_BASED_TARGET_ONLY from the drop-down menu to view the
code for the option.
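
The difference between the modes can be pictured in PL/SQL terms. This is a
conceptual sketch with hypothetical tables, not the code the Warehouse
Builder actually generates:

-- Set-based: one SQL statement processes all the rows as a single set.
INSERT INTO target_tab (id, val)
SELECT id, val FROM source_tab;

-- Row-based: a cursor loop handles one row at a time, which allows
-- per-row auditing and error handling at the cost of performance.
BEGIN
  FOR rec IN (SELECT id, val FROM source_tab) LOOP
    BEGIN
      INSERT INTO target_tab (id, val) VALUES (rec.id, rec.val);
    EXCEPTION
      WHEN OTHERS THEN
        NULL;  -- a real load would log the failing row here and continue
    END;
  END LOOP;
END;
/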

k
Q.5(d) Explain the importance of the seven columns in the Object Details [5]
Window.
(A)
an
The columns displayed in the Object Details window are as follows:
 Object : The name of the object.
 Design Status : The status of the design of the object in relation to
whether it has been deployed yet or not :
> New : The object has been created in the Design Center, but has
not been deployed yet.
al
> Unchanged : The Object has been created in the Design Center and
deployed previously, and has not been changed since its last
deployment.
> Changed : The Object has been created and deployed, and has
dy

subsequently undergone changes in the Design Center since its last


deployment.
 Deploy Action: What action will be taken upon the next deployment of
this object in the Control Center Manager
> Create: Create the object; if an object with the same name already
exists, this can generate an error upon deployment


> Upgrade: Upgrade the object in place, preserving data
> Drop: Delete the object
> Replace: Delete and recreate the object; this option does not
preserve data
 Deployed: Date and time of the last deployment
 Deploy Status: Results of the last deployment
> Not Deployed: The object has not been deployed yet
> Success: The last deployment was successful, without any errors or
warnings.


> Warning: The last deployment had warnings


> Failed: The last deployment failed due to errors
 Location: The location defined for the object, which is where it will be
deployed.
 Module: The module where the object is defined

Some of the previous columns will allow us to perform an action associated
with the column by double-clicking or single-clicking in the column. The
following is a list of the columns that have actions available, and how to
access them:

 Object: Double-click on the object name to launch the appropriate
editor on the object.
 Deploy Action: Click on the deploy action to change the deploy action for
the next deployment of the object via a drop-down menu. The list of
available actions that can be taken will be displayed. Not all the previously
listed actions are available for every object. For instance, upgrade is not
available for some objects and will not be an option for a mapping.
The other window in the Control Center Manager is the Control Center Jobs
window. This is where we can monitor the status of any deployments and
executions we've performed.
al
Q.6 Attempt any TWO of the following : [10]
Q.6(a) Why is it necessary to maintain the Snapshots of an object? [5]
(A) (Refer Q.6(b) solution of April 14)

Q.6(b) What is meant by synchronization of objects in a mapping? [5]


(A)  Many operators used in a mapping represent a corresponding workspace
object. If the workspace object (for instance, a table) changes, then
the operator also needs to change to be kept in sync. The process of
synchronization is what accomplishes that, and it has to be invoked by us
when changes are made.

 If we have an updated table definition for any table (the
POS_TRANS_STAGE table, for example), we have to check any mappings that
have included a table operator for the changed table, because they will
have to be synchronized to pick up the change.
 Mapping is created with a table operator that represents a table in the
database. These operators are bound to an actual table using a table
definition like we just edited.
 When the underlying table definition gets updated, we have to
synchronize those changes with any mappings that include that table.


 We now have our STAGE_MAP mapping copied over to our new project.
So let's open that in the mapping editor by double-clicking on it and
investigate the process of synchronizing.
 To update the operator in the mapping to include the new column name,
we must perform the task of synchronization, which reconciles the two
and makes any changes to the operator to reflect the underlying table
definition. Doing the synchronization will accomplish both—add the new
column name and synchronize with the table.
 To synchronize, we right-click on the header of the table operator in
the mapping and select Synchronize... from the pop-up menu, or click on

ar
the table operator header and select Synchronize... from the main menu
Edit entry. This will pop up the Synchronize dialog box as shown next:

Click on the drop-down menu and select the table.


Q.6(c) Explain Standard ROLAP Architecture. [5]


(A)  Some vendors take the views that all data should be stored in relational
databases.
 They provide a multidimensional view of this data.
 For this, all of the relational OLAP vendors store the data in a special
way known as a star or snowflake schema.
 They store the data values in a de-normalized table known as the fact
table.
 One dimension is selected as the fact dimension.
 The dimension tables are then relationally joined with the fact table to
allow multidimensional queries.
 The data is retrieved from the relational database into the client tool
by SQL queries. Since SQL was developed as an access language to
relational databases, it is not optimal for multidimensional queries.

 For instance, SQL can perform more complex calculations across rows
than across columns.
 By storing data in relational tables, a single piece of data is stored in
one and only one place. This ensures that the database is consistently
maintained and that transaction updates can be performed in a fast and
efficient manner.
 Although the fact data is indeed stored in a relational table and can be
accessed using the RDBMS itself, in order to provide the multidimensional
views of the data, all vendors that use the relational database require
that the data be organized in the star or snowflake schema. This means
that, in practice, the data must almost always be duplicated.
dy

 There are many good reasons (performance, summarization and


organization of data into distinct time periods), why the data should be
duplicated anyway. Thus, the implication that using a relationally based
DBMS eliminates the need to duplicate the data is not valid.
 The vast majority of ROLAP applications are for simple analysis of large
volumes of information.

 Retail sales analysis is the most common one.
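
For example, a multidimensional question such as 'sales by quarter and
product category' becomes a star join in SQL. A sketch with hypothetical
fact and dimension tables:

SELECT t.calendar_quarter,
       p.category_name,
       SUM(f.sales_amount) AS total_sales
  FROM sales_fact  f
  JOIN time_dim    t ON f.time_key    = t.time_key
  JOIN product_dim p ON f.product_key = p.product_key
 GROUP BY t.calendar_quarter, p.category_name;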

Standard ROLAP Architecture


Q.6(d) Explain MOLAP Architecture. [5]


(A)  Multidimensional OLAP uses a storage mechanism which is optimized for
the pre-calculation, storage and retrieval of multidimensional data.
 They are best suited for medium sized, static applications which demand
sub-second data retrieval.
 Examples are analysis of historical sales and financial information.
However, since their batch pre-calculations can take a long time, they
are not optimal for dynamic applications where a result from newly
updated data is required. Their batch pre-calculation approach may also
make them unsuitable for large, very sparse applications with more than
five dimensions, as the data explosion can be unmanageable.
 One of the design objectives of the multidimensional server is to
provide fast, linear access to the data regardless of the way the data is
being requested.

 The simplest request is a two-dimensional slice of data from the
n-dimensional hypercube.
 The objective is to retrieve the data very fast, regardless of the
requested dimensions.
 In practice, such simple slices are rare: more typically, the requested
data is a compound slice where two or more dimensions are nested as
rows or columns.
 That means the goal is to provide linear response time regardless of
where the data is being retrieved from in the hypercube.

 The design goal should be to offer a complete algebraic ability where
any cell in the hypercube can be derived from any others, using all
standard business and statistical functions including conditional logic.


