Sie sind auf Seite 1von 15

BUSINESS INTELLIGENCE &TOOLS

MB 0036

Set 1

Name: Rajesh Kumar

Roll number: 520932980

Learning centre: 03036

Subject: MB 0036- BUSINESS INTELLIGENCE & TOOLS

Assignment No.: Set 1

Date of submission at learning centre:


MB0036 – Business Intelligence & Tools

ASSIGNMENTS
Subject code: MB0036
(4 credits)
Set 1
Marks 60
SUBJECT NAME: BUSINESS INTELLIGENCE & TOOLS

Note: Each Question carries 10 marks

Q1. Define the term business intelligence tools? Briefly explain how the data from one
end gets transformed into information at the other end?

Ans: Business intelligence tools. The various tools of this suite are:

• Data Integration Tools: These tools extract, transform and load the data from the
source databases to the target database. There are two categories; Data Integrator and
Rapid Marts. Data Integrator is an ETL tool with a GUI. Rapid Marts is a packaged
ETL with pre-built data models for reporting and query analysis that makes initial
prototype development easy and fast for ERP applications.

The important components of Data Integrator include;

Graphicaldesigner: This is a GUI used to build and test ETL jobs for data cleansing,
validation and auditing.

Data integration server: This integrates data from different source databases.

Metadata repository: This repository keeps source and target metadata and the
transformation rules.

Administrator: This is a web-based tool that can be used to start, stop, schedule and
monitor ETL jobs.

• BI Platform: This platform provides a set of common services to deploy, use and
manage the tools and applications. These services include providing the security,
broadcasting, collaboration, metadata and developer services.
• Reporting Tools and Query & Analysis Tools: These tools provide the facility for
standard reports generation, ad hoc queries and data analysis.
• Performance Management Tools: These tools help in managing the performance of
a business by analyzing and tracking key metrics and goals.
• Business intelligence tools are a type of application software designed to help in
making better business decisions. These tools aid in the analysis and presentation of
data in a more meaningful way and so play a key role in the strategic planning process

2
MB0036 – Business Intelligence & Tools

of an organization. They illustrate business intelligence in the areas of market


research and segmentation, customer profiling, customer support, profitability, and
inventory and distribution analysis to name a few.
• Various types of BI systems viz. Decision Support Systems, Executive Information
Systems (EIS), Multidimensional Analysis software or OLAP (On-Line Analytical
Processing) tools, data mining tools are discussed further. Whatever is the type, the
Business Intelligence capabilities of the system is to let its users slice and dice the
information from their organization’s numerous databases without having to wait for
their IT departments to develop complex queries and elicit answers.
• Although it is possible to build BI systems without the benefit of a
data warehouse, most of the systems are an integral part of the
user-facing end of the data warehouse in practice. In fact, we can
never think of building a data warehouse without BI Systems. That is
the reason; sometimes, the words ‘data warehousing’ and ‘business
intelligence’ are being used interchangeably.
• Figure 1.1 depicts how the data from one end gets transformed to information at the
other end for business information.

Q2. What do you mean by data ware house? What are the major concepts and
terminology used in the study of data warehouse?

Ans: In simple terms, a data warehouse is the repository of an organization’s historical data
(also termed as the corporate memory). For example, an organization would get the
information that is stored in its data warehouse to find out what day of the week they sold the
most number of gadgets in May 2002, or how employees were on sick leave for a specific
week.

A data warehouse is a database designed to support decision making in an organization. Here,


the data from various production databases are copied to the data warehouse so that queries
can be forwarded without disturbing the stability or performance of the production systems.
So the main factor that leads to the use of a data warehouse is that complex queries and
analysis can be obtained over the information without slowing down the operational systems.
While operational systems are optimized for simplicity and speed of modification (online
transaction processing, or OLTP), the data warehouse is optimized for reporting and analysis
(online analytical processing, or OLAP). (The concepts of OLTP and OLAP are discussed in
later Units).

Apart from traditional query and reporting, a data warehouse provides the base for the
powerful data analysis techniques such as data mining and multidimensional analysis
(discussed in detail in later Units). Making use of these techniques will result in easier access
to the information you need for informed decision making.

3
MB0036 – Business Intelligence & Tools

Characteristics of a Data Warehouse

According to Bill Inmon, who is considered to be the Father of Data warehousing, the data in
a Data Warehouse consists of the following characteristics:

Subject oriented

The first feature of DW is its orientation toward the major subjects of the organization instead
of applications. The subjects are categorized in such a way that the subject-wise collection of
information helps in decision-making. For example, the data in the data warehouse of an
insurance company can be organized as customer ID, customer name, premium, payment
period, etc. rather auto insurance, life insurance, fire insurance, etc.

Integrated

The data contained within the boundaries of the warehouse are integrated. This means that all
inconsistencies regarding naming convention and value representations need to be removed
in a data warehouse. For example, one of the applications of an organization might code
gender as ‘m’ and ‘f’ and the other application might code the same functionality as ‘0′ and
‘1′. When the data is moved from the operational environment to the data warehouse
environment, this will result in conflict.

Time variant

The data stored in a data warehouse is not the current data. The data is a time series data as
the data warehouse is a place where the data is accumulated periodically. This is in contrast
to the data in an operational system where the data in the databases are accurate as of the
moment of access.

Non-volatility of the data

The data in the data warehouse is non-volatile which means the data is stored in a read-only
format and it does not change over a period of time. This is the reason the data in a data
warehouse forms as a single source for all decision system support processing.

Keeping the above characteristics in view, ‘data warehouse‘can be defined as a subject-


oriented, integrated, non-volatile, time-variant collection of data designed to support the
decision-making requirements of an organization.

Q 3. What are the data modeling techniques used in data warehousing environment?

Ans: There are two data modeling techniques that are relevant in a data warehousing
environment. They are Entity Relationship modeling (ER modeling) and dimensional
modeling.

• ER modeling produces a data model of the specific area of interest, using two basic
concepts: Entities and the Relationships between them. A detailed ER model may
also contain attributes, which can be properties of either the entities or the

4
MB0036 – Business Intelligence & Tools

relationships. The ER model is an abstraction tool as it can be used to simplify,


understand and analyze the ambiguous data relationships in the real business world.
• Dimensional modeling uses three basic concepts: Facts, Dimensions and Measures.
Dimensional modeling is powerful in representing the requirements of the business
user in the context of database tables and also in the area of data warehousing.

Both ER and dimensional modeling can be used to create an abstract model of a specific
subject. However, each of them has its own limited set of modeling concepts and associated
notation conventions. Consequently, the techniques seem different, and they are indeed
different in terms of semantic representation. There is much debate as to which method is
better and the conditions under which a specific technique is to be selected. There can be no
definite answer, understanding of the circumstances and the business requirements finally
lead to selection of an appropriate technique.

Entity- Relationship (E-R) Modeling

Basic Concepts

An ER model is represented by an ER diagram, which uses three basic graphic symbols to


conceptualize the data: entity, relationship, and attribute.

Entity

An entity is defined to be a person, place, thing, or event of interest to the business or the
organization. It represents a class of objects, which are things in the real business world that
can be observed and classified by their properties and characteristics. In general, an entity has
its own business definition and a clear boundary definition that is required to describe what is
included and what is not.

In a practical modeling project, the team members share a definition template for integration
and a consistent entity definition in the model. In case of a high-level business modeling, an
entity can be very generic, but it must be quite specific in the detailed logical modeling.
There are four entities; PRODUCT, PRODUCT MODEL, PRODUCT COMPONENT, and
COMPONENT in the ER diagram (Refer Figure 4.1) and are represented as rectangles.

Fig. 4.1: A Simple ER Model

5
MB0036 – Business Intelligence & Tools

The four diagonal lines on the corners of the PRODUCT COMPONENT entity represent that
the entity is ‘an associative entity’ and the entity is to resolve the many-to-many relationship
between two entities. PRODUCT MODEL and COMPONENT are independent of each other
but have a business relationship between them. A PRODUCT MODEL consists of many
components and a component is related to many product models. With this business rule, you
cannot tell which components make up a product model. To do this, you can define a
resolving entity. For example, the PRODUCT COMPONENT entity can provide the
information about which components are related to which product model.

In ER modeling, naming the entities is important for easy understanding and clear
communication. It is expressed grammatically in the form of a noun rather than a verb and
the criteria for selecting an entity name depend on how well the name represents the
characteristics and scope of the entity. Also, defining a unique identifier of an entity is the
most critical task. These unique identifiers are called candidate keys. Among them, you can
select the key that is most commonly used to identify the entity, called ‘primary key’.

Relationship

Relationships represent the structural interaction and association among the entities in a
model and they are represented with lines drawn between the two specific entities. Generally,
a relationship is named grammatically by a verb (such as owns, belongs, and has) and the
relationship between the entities can be defined in terms of the cardinality. Cardinality
represents the maximum number of instances of one entity that are related to a single instance
in another table and vice versa. Thus the possible cardinalities include one-to-one (1:1), one-
to-many (1:M), and many-to-many (M:M). In a detailed normalized ER model, any M:M
relationship is not shown because it is resolved to an associative entity.

Attributes

Attributes describe the characteristics of properties of the entities. The Product


ID, Description, and Picture are attributes of the PRODUCT entity in Figure 4.1.
The name of an attribute has to be unique in an entity and should be self-
explanatory to ensure clarity. For example, rather naming date1 and date2, you
may use the names; order date and delivery date. When an instance has no
value for an attribute, the minimum cardinality of the attribute is zero, which
means either nullable or optional.

In Figure 4.1, you can see the characters P, m, o, and F that stand for primary
key, mandatory, optional, and foreign key. The Picture attribute of the PRODUCT
entity is optional, which means it is nullable. A foreign key of an entity is defined
to be the primary key of another entity. In figure 4.1, the Product ID attribute of
the PRODUCT MODEL entity is a foreign key as it is the primary key of the
PRODUCT entity. These foreign keys are useful in determining the relationships
such as the referential integrity between the entities.

Other Concepts

Supertype and Subtype

An entity can have subtypes and supertypes and the relationship between a supertype entity
and its subtype entity is an IS A relationship. An IS A relationship is used where one entity is

6
MB0036 – Business Intelligence & Tools

a generalization of several more specialized entities. The supertype and subtype relationship
is represented by a triangle on the relationship. Figure 4.2 shows an example of supertype and
subtype entities wherein SALES OUTLET is the supertype of RETAIL STORE and
CORPORATE SALES OFFICE and RETAIL STORE, CORPORATE SALES OFFICE are
subtypes of SALES OUTLET. Here, each subtype entity inherits attributes from its supertype
entity.

Also, each subtype entity can have its own distinctive attributes. In the example provided
above, Region ID and Outlet ID are inherited attributes and the sub entities have their own
attributes (such as number of cash registers and floor space of the RETAIL STORE
subentity). The practical benefit of supertyping and subtyping is that they make a data model
more directly expressive. Just by looking at the ER diagram, you can see that sales outlets are
composed of ‘retail stores’ and ‘corporate sales offices’.

Fig. 4.2: Supertype and Subtype

Other important concepts in the area of ER modeling are ‘domain’ and ‘normalization’.

• A domain consists of all the possible acceptable values and categories that are
allowed for an attribute. It is the set of all real possible occurrences. The format or
data type, such as integer, date, and character, provides a clear definition of domain.
The practical benefit of domain is that it is imperative for building the data dictionary
or repository, and for implementing the database consequently.
• Normalization is a process of assigning the attributes to entities which in a way
reduces data redundancy, avoids data anomalies, provides a solid architecture for
updating data, and reinforces the long-term integrity of the data model (the third
normal form is usually adequate).

Dimensional Modeling

Dimensional modeling is a relatively new concept compared to ER modeling. This method is simpler,
more expressive, and easier to understand. This technique is mainly aimed at conceptualizing and
visualizing data models as a set of measures that are described by common aspects of the business. It
is useful for summarizing and rearranging the data and presenting views of the data to support data
analysis. Also, the technique focuses on numeric data, such as values, counts, and weights.

7
MB0036 – Business Intelligence & Tools

Q 4. Discuss the categories in which data is divided before structuring it into data ware
house?

Ans: The Data Warehouses can be divided into two types:

• Enterprise Data Warehouse


• Data Mart

Enterprise Data Warehouse

The Enterprise data warehouse consists of the data drawn from multiple operational systems
of an organization. This data warehouse supports time-series and trend analysis across
different business areas of an organization and so can be used for strategic decision-making.
Also, this data warehouse is used to populate various data marts.

Data Mart

As data warehouses contain larger amounts of data, organizations often create ‘data marts’
that are precise, specific to a department or product line. Thus data mart is a physical and
logical subset of an Enterprise data warehouse and is also termed as a department-specific
data warehouse. Generally, data marts are organized around a single business process.

There are two types of data marts; independent and dependant. The data is fed directly from
the legacy systems in case of an independent data mart and the data is fed from the enterprise
data warehouse in case of a dependent data mart. In the long run, the dependent data marts
are much more stable architecturally than the independent data marts.

Advantages and Limitations of a DW System

Use of a data warehouse brings in the following advantages for an organization:

• End-users can access a wide variety of data.


• Management can obtain various kinds of trends and patterns of data.
• A warehouse provides competitive advantage to the company by providing the data and
timely information.
• A warehouse acts as a significant enabler of commercial business applications viz., Customer
Relationship Management (CRM) applications.

However, following are the concerns that one has to keep in mind while using a data
warehouse:

• The scope of a Data warehousing project is to be managed carefully to attain the defined
content and value.
• The process of extracting, cleaning and loading the data and finally storing it into a data
warehouse is a time-consuming process.
• The problems of compatibility with the existing systems need to be resolved before building a
data warehouse.
• Security of the data may become a serious issue, especially if the warehouse is web
accessible.
• Building and maintenance of the data warehouse can be handled only through skilled
resources and requires huge investment.

8
MB0036 – Business Intelligence & Tools

Data Warehouse Concepts and Terminology

Various concepts and the key terms used in the study of data warehouse are provided below.

• Dashboard:

This is a reporting tool that consolidates aggregates and arranges measurements, metrics
(measurements compared to a goal) on a single screen so that information can be monitored
at a glance.

• Data Management:

This is the process of controlling, protecting, and facilitating access to data in order to
provide the end users with timely access to the data they need.

• Data Mining (or Data Surfing):

This is a technique geared for the typical user who does not know exactly what he is
searching for, but is looking for particular patterns or trends. Data mining is the process of
sifting through large amounts of data to produce data content relationships. It can predict
future trends and behaviors, allowing businesses to make proactive, knowledge-driven
decisions. The most valuable results from data mining include clustering, classifying, and
estimating the things that occur together. There are many kinds of tools that play a role in
data mining and they include neural networks, decision trees, visualization, general
algorithms, fuzzy logic, etc.

• Data Modeling:

A method used to define and analyze data requirements needed to support the business
functions of an organization.

• Data Profiling:

Data Profiling is a critical step in data migration that automates the identification of
problematic data and metadata, and enables organizations to correct inconsistencies,
redundancies and inaccuracies in their databases.

• Data Visualization:

Data visualization involves examining the data represented by dynamic images rather than
pure numbers. These are the techniques that turn the data into information by using the high
capacity of the human brain to visually recognize patterns and trends.

9
MB0036 – Business Intelligence & Tools

• Decentralized Warehouse:

A remote data source that users can query/access via a central gateway that provides a
logical view of corporate data in terms that users can understand. The gateway parses and
distributes queries in real time to remote data sources and returns result sets back to users.

• Drill-down:

This is the capacity to browse the information through a hierarchical structure as shown
below.

• External Data Source:

This is the data that is not available in the OLTP systems, but is required to enhance the
information quality in the data warehouse. The examples of this data include the data of the
competitors, information of the regulatory and government bodies, research data of the
professional bodies and universities.

• Metadata:

Metadata is data about data. The examples of metadata include data element descriptions,
data type descriptions, attribute descriptions, and process descriptions.

On-Line Analytical Processing (OLAP):

This is a category of software technology that enables the users gain insight into data
through fast, consistent, interactive access to a wide variety of possible views of information
that has been transformed from raw data to reflect the real dimensionality of the organization.
This is implemented in a multi-user client/server mode and offers consistently rapid response
to queries, regardless of database size and complexity. This software is also called
Multidimensional Analysis Software.

• On-Line Transaction Processing (OLTP):

This is the way the data is processed by an end user/a computer system. Here, the data is
detail oriented, highly repetitive with larger amounts of updates and changes. The major task
of these systems is to perform on-line transaction and query processing. These systems cover
most of the day-to-day operations of the organization, such as purchasing, inventory,
manufacturing, payroll, banking, accounting and registration.

10
MB0036 – Business Intelligence & Tools

• Operational Databases:

These are detail oriented databases defined to meet the needs of complex processes of an
organization. Here, the data is highly normalized to avoid data redundancy and double-
maintenance. A large number of transactions take place every hour on these databases and are
always “up to date” and represent a snapshot of the current situation. Contrast to these
databases, there are Informational databases that are stable over a period of time to represent
a situation at a specific point in time in the past.

Architecture of a Data Warehouse

The architecture describes the overall system of a Data Warehouse from various perspectives
such as data, process, and infrastructure to study the inter-relationships among various
components.

• The data perspective includes the source and target data structures and so it aids the user in
understanding what data assets are available in a data warehouse and how they are related.
• The process perspective is primarily concerned with communicating the process and flow of
data from the originating source system through the process of loading the data warehouse
and extracting data from the warehouse.
• The infrastructure or technology perspective details the various hardware and software
products used to implement the distinct components of the overall system.

Depending upon the specifics of an organizational situation, the following types of Data
Warehouse architectures are provided below:

• Basic architecture of a Data warehouse


• Architecture of a Data warehouse with Staging area
• Architecture of a Data warehouse with Staging area and Data marts

Fig 2.1 shows a simple architecture of a data warehouse wherein the end users directly
access the data derived from several source systems through the data warehouse.

Q 5. Discuss the purpose of executive information system in an organization?

11
MB0036 – Business Intelligence & Tools

Ans: An Executive Information System (EIS) is a set of management tools supporting the
information and decision-making needs of management by combining information available
within the organisation with external information in an analytical framework.

EIS are targeted at management needs to quickly assess the status of a business or section of
business. These packages are aimed firmly at the type of business user who needs instant and
up to date understanding of critical business information to aid decision making.

The idea behind an EIS is that information can be collated and displayed to the user without
manipulation or further processing. The user can then quickly see the status of his chosen
department or function, enabling them to concentrate on decision making. Generally an EIS
is configured to display data such as order backlogs, open sales, purchase order backlogs,
shipments, receipts and pending orders. This information can then be used to make executive
decisions at a strategic level.

The emphasis of the system as a whole is the easy to use interface and the integration with a
variety of data sources. It offers strong reporting and data mining capabilities which can
provide all the data the executive is likely to need. Traditionally the interface was menu
driven with either reports, or text presentation. Newer systems, and especially the newer
Business Intelligence systems, which are replacing EIS, have a dashboard or scorecard type
display.

Before these systems became available, decision makers had to rely on disparate spreadsheets
and reports which slowed down the decision making process. Now massive amounts of
relevant information can be accessed in seconds. The two main aspects of an EIS system are
integration and visualisation. The newest method of visualisation is the Dashboard and
Scorecard. The Dashboard is one screen that presents key data and organisational information
on an almost real time and integrated basis. The Scorecard is another one screen display with
measurement metrics which can give a percentile view of whatever criteria the executive
chooses.

Behind these two front end screens can be an immense data processing infrastructure, or a
couple of integrated databases, depending entirely on the organisation that is using the
system. The backbone of the system is traditional server hardware and a fast network. The
EIS software itself is run from here and presented to the executive over this network. The
databases needs to be fully integrated into the system and have real-time connections both in
and out. This information then needs to be collated, verified, processed and presented to the
end user, so a real-time connection into the EIS core is necessary.

Executive Information Systems come in two distinct types: ones that are data driven, and
ones that are model driven. Data driven systems interface with databases and data
warehouses. They collate information from different sources and presents them to the user in
an integrated dashboard style screen. Model driven systems use forecasting, simulations and
decision tree like processes to present the data.

As with any emerging and progressive market, service providers are continually improving
their products and offering new ways of doing business. Modern EIS systems can also
present industry trend information and competitor behaviour trends if needed. They can filter
and analyse data; create graphs, charts and scenario generations; and offer many other options
for presenting data.

12
MB0036 – Business Intelligence & Tools

There are a number of ways to link decision making to organisational performance. From a
decision maker's perspective these tools provide an excellent way of viewing data. Outcomes
displayed include single metrics, trend analyses, demographics, market shares and a myriad
of other options. The simple interface makes it quick and easy to navigate and call the
information required.

For a system that seems to offer business so much, it is used by relatively few organisations.
Current estimates indicate that as few as 10% of businesses use EIS systems. One of the
reasons for this is the complexity of the system and support infrastructure. It is difficult to
create such a system and populate it effectively. Combining all the necessary systems and
data sources can be a daunting task, and seems to put many businesses off implementing it.
The system vendors have addressed this issue by offering turnkey solutions for potential
clients. Companies like Actuate and Oracle are both offering complete out of the box
Executive Information Systems, and these aren't the only ones. Expense is also an issue. Once
the initial cost is calculated, there is the additional cost of support infrastructure, training, and
the means of making the company data meaningful to the system.

Does EIS warrant all of this expense? Green King certainly thinks so. They installed a
Cognos system in 2003 and their first few reports illustrated business opportunities in excess
of £250,000. The AA is also using a Business Objects variant of an EIS system and they
expect a return of 300% in three years. (Guardian 31/7/03)

An effective Executive Information System isn't something you can just set up and leave it to
do its work. Its success depends on the support and timely accurate data it gets to be able to
provide something meaningful. It can provide the information executives need to make
educated decisions quickly and effectively. An EIS can provide a competitive edge to
business strategy that can pay for itself in a very short space of time.

Q6. Discuss the challenges involved in data integration and coordination process?

Ans: In general, most of the data that the warehouse gets is the data extracted from a
combination of legacy mainframe systems, old minicomputer applications, and some
client/server systems. But these source systems do not conform to the same set of business
rules. Thus they may often follow different naming conventions and varied standards for data
representation. Thus the process of data integration and consolidation plays a vital role. Here,
the data integration includes combining of all relevant operational data into coherent data
structures so as to make them ready for loading into data warehouse. It standardizes the
names and data representations and resolves the discrepancies. Some of the challenges
involved in the data integration and consolidation process are as follows.

Identification of an Entity

Suppose there are three legacy applications that are in use in your organization; one is the
order entry system, second is customer service support system, and the third is the marketing
system. Each of these systems might have their own customer file to support the system.
Even most of the customers will be common to all these three files, the same customer on
each of these files have a different unique identification number.

13
MB0036 – Business Intelligence & Tools

As you need to keep a single record for each customer in a data warehouse, you need to get
the transactions of each customer from various source systems and then match them up to
load into the data warehouse. This is an entity identification problem in which you do not
know which of the customer records relate to the same customer. This problem is prevalent
where multiple sources exist for the same entities and the other entities that are prone to this
type of problem include vendors, suppliers, employees, and various products manufactured
by a company.

In case of three customer files, you have to design complex algorithms to match records from
all the three files and groups of matching records. But this is a difficult exercise. If the
matching criterion is too tight, then some records might escape the groups. Similarly, a
particular group may include records of more than one customer if the matching criterion
designed is too loose. Also, you might have to involve your users or the respective
stakeholders to understand the transaction accurately. Some of the companies attempt this
problem in two phases. In the first phase, the entire records, irrespective whether they are
duplicates or not, are assigned unique identifiers and in the second phase, the duplicates are
reconciled periodically ether through automatic algorithms or manually.

Existence of Multiple Sources

Another major challenge in the area of data integration and consolidation results from a
single data element having more than one source. For instance, cost values are calculated and
updated at specific intervals in the standard costing application. Similarly, your order
processing application also carries the unit costs for all products. Thus there are two sources
available to obtain the unit cost of a product and so there could be a slight variation in their
values. Which of these systems needs to be considered to store the unit cost in the data
warehouse becomes an important question. One easy way of handling this situation is to
prioritize the two sources, or you may select the source on the basis of the last update date.

Implementation of Transformation

The implementation of data transformation is a complex exercise. You may have to go


beyond the manual methods, usual methods of writing conversion programs while deploying
the operational systems. You need to consider several other factors to decide the methods to
be adopted. Suppose you are considering automating the data transformation functions, you
have to identify, configure and install the tools, train the team on these tools, and integrate
them into the data warehouse environment. But a combination of both methods proves to be
effective. The issues you may face in using manual methods and transformation tools are
discussed below.

Manual Methods

These are the traditional methods that are in practice in the recent past. These methods are
adequate in case of smaller data warehouses. These methods include manually coded
programs and scripts that are mainly executed in the data staging area. Since these methods
call for elaborate coding and testing and programmers and analysts who posses the
specialized knowledge in this area only can produce the programs and scripts.

Although the initial cost may be reasonable, ongoing maintenance may escalate the cost
while implementing these methods. Moreover these methods are always prone to errors.

14
MB0036 – Business Intelligence & Tools

Another disadvantage of these methods is about the creation of metadata. Even if the in-
house programs record the metadata initially, the metadata needs to be updated every time the
changes occur in the transformation rules.

Transformation Tools

The difficulties involved in using the manual methods can be eliminated using the
sophisticated and comprehensive set of transformation tools that are now available. Use of
these automated tools certainly improves efficiency and accuracy. If the inputs provided into
the tools are accurate, then the rest of the work is performed efficiently by the tool. So you
have to carefully specify the required parameters, the data definitions and the rules to the
transformation tool.

Also, the transformation tools enable the recording of metadata. When you specify the
transformation parameters and rules, these values are stored as metadata by the tool and this
metadata becomes a part of the overall metadata component of the data warehouse. When
changes occur to business rules or data definitions, you just have to enter the changes into the
tool and the metadata for the transformations get adjusted automatically. But relying on the
transformation tools alone without using the manual methods is also not practically possible.

Transformation for Dimension Attributes

Now we consider the updating of the dimension tables. The dimension tables are more stable
in nature and so they are less volatile compared to the fact tables. The fact tables change
through an increase in the number of rows, but the dimension tables change through the
changes to the attributes. For instance, we consider a product dimension table. Every year,
rows are added as new models become available. But what about the attributes that are within
the dimension table. You might face a situation where there is a change in the product
dimension table because a particular product was moved into a different product category. So
the corresponding values must be changed in the product dimension table. Though most of
the dimensions are generally constant over a period of time, they may change slowly.

15

Das könnte Ihnen auch gefallen