Sie sind auf Seite 1von 12

OLAP: CONCEPTS AND APPLICATIONS TO DATAWAREHOUSING

ABSTRACT
Information analysis is a vital capability for corporate databases. Fast analysis of data
stored in databases or warehouses, is indispensable for businesses that wish to
stay ahead in the present competitive market scene. Online analytical processing
or OLAP provides real-time information analysis for business decision-making.
Besides this, OLAP gives the flexibility of viewing data from several viewpoints, a
property referred to as the multidimensionality of data present in existing
databases. OLAP can be clearly defined in terms of five buzzwords – Fast
Analysis of Shared Multidimensional Information, or FASMI for short. OLAP
finds applications particularly for the analyses needs of businesses owning large
data warehouses. A data warehouse stores non-operational or analytic data,
which OLAP processes to provide comprehensive information resources. It can
be used to generate business intelligence, the knowledge of market trends for
example, that is an essential ingredient to success for any enterprise. It is the
technology that promises growth, leadership and proficiency for a business
organization.

Mayur Hemani
Shreyansh Jain

Final year
MGM’s College of Engg. And Tech.
NOIDA
1. BUSINESS INFORMATION NEEDS

Company XYZ is an international player in the car-manufacturing arena. It is planning


to expand its horizons in global markets, with the launch of a new series of motor
vehicles. Since XYZ has market holds in different parts of the world, the promotion
and the subsequent launch timings is critical to its success. What it needs is the exact
information about the launch period – whether other companies would be joining in
the bandwagon – does the buying pattern of the targeted customers support the
proposed plans and if not what changes should be brought in the local promotion
policies so as to achieve the highest profits. All these questions would remain
unanswered without the present state-of-the-art Decision Support systems. With the
help of Decision Support systems capable of generating business knowledge,
companies like XYZ can easily be leaders in their sport.

Businesses today face similar challenges, particularly owing to the fast shrinking
world. Most big businesses today span several nations, and technologies like the
Internet play truant to the cause of globalization. In such competitive business
environments, the ones that have the best and the most information, almost certainly
leads. But merely possessing information in huge quantities is not sufficient for
gaining an edge over business rivals. It is like having everything and yet nothing.
Analyzing this information into comprehensive, business knowledge is the need of the
hour.

The following are some of the information needs for contemporary businesses:
• Decision-critical information
• Patterns, relationships and hierarchies in market trends, and
• Business knowledge pertaining to their competitors
2. THE NEED FOR MULTIDIMENSIONAL INFORMATION
Information, as it is usually stored in the present-day databases, is in the form of
relations. Tables hold data, which can be queried on request and the information,
retrieved. This is sufficient as long as one does not need to attach any semantics to this
data. Once a person needs to acquire context and meaning of this information, and
relationships to other pieces of information, simple querying is of little help.
Information, here, must be viewed as slices or dimensions. Viewing information from
different points implies that one can model different scenarios based on the same set of
information, by just arranging them in different analytical formats.
Multidimensionality allows for assigning relationships between seemingly unrelated
information fields. It allows combining information from different sources, and
relating them to obtain useful knowledge.
Consider the example of the demand versus pricing data of a particular product. This
data represents the sum total of all the demand in different regions of the business. It
also aggregates demands over different time periods. However, for a better
understanding of the real market demand, one needs to apply the dimensions of time,
as well as the specific locations to the available data. This fact brings out the need for
multidimensionality of information visualization and access.

Fig.2. Multi-Dimensional
Fig.1. Simple
Data2with
Dimensional
the Time axis
Data

3. DATA WAREHOUSING
A data warehouse is the cohesive data model that defines the central data repository
for an organization. Data warehousing is a technique used basically to separate the
analytical data from the operational data in a business organization. Analytical data is
that part of the information that is required to be archived for providing the decision
support at a later stage of business. Data Warehousing is the collection of such
analytical data arranged in a meaningful pattern such that the stress is on faster query
execution unlike the Transactional Processing where the stress is on better flow of
information. A Data Warehouse may be defined as a subject-oriented repository,
designed with enterprise-wide access in mind.

Fig.3. Architecture of a data warehouse

The architecture of a Data warehousing system can be briefly broken up into the
following components: -
i) Operational Databases
Operational databases store information required for current transactions in
different departments of an organization.
ii) Remote Applications
The various (isolated) departments of an organization have specific functions
and jobs to perform which require inputs in the form of data and then
processing of the data for the information required.
iii) Data Warehouse
The master database, which stores the analytical information, extracted from
the operational databases. The analytical information is that part of the
operational databases which is of strategic or managerial importance, at some
stage in time.
iv) Data Marts
A data mart is a subset of enterprise-wide data warehouse, which typically
supports an enterprise element (department, region, function, etc.). For
example, for a data warehouse of a business house a likely classification would
be – the customer database, a collection of sales figures, prices of the products
etc. Each of these sub-groups acts as mini data warehouses.
v) Clients
They are the end users of the data warehouses that may be working through the
data marts, and deriving necessary information for activities like decision-
making, strategic planning, or business analysis. These are the groups, which
require information to be retrieved or mined from the warehouse. The time
required for retrieval is a major issue with most data warehouses.

A data warehouse stores tactical information that merely answers specific questions
about past events. It is capable of delivering only that information that is directly
related to some known facts or fields. But what if a person wants to simulate the
behaviour of a system (such as a market) if certain conditions prevailed? If a person
needs to predict or at least observe the trend of some variable when some affecting
value is varied, then it is not possible for a conventional data warehouse to produce the
information required. This is where OLAP comes into play. OLAP or online-analytical
processing is a technology complementary to data warehousing, in the sense that it
works on the data stored in a warehouse. Warehouses need OLAP to produce
multidimensional views of the information that it stores, and OLAP needs data
warehouses for the data it needs to process.

4. OLAP Defined
In contrast to the conventional data warehousing systems, where the data exists in the
form of the relations, the OLAP or the On Line Analytical Processing provides a better
insight into the untapped potential of data. It supports the multidimensional view of
data, which provides manifold increase in the information content of same amount of
data. OLAP provides fast, consistent, and efficient access to the various views of
information.
OLAP may be defined in terms of just five keywords – Fast Analysis of Shared
Multidimensional Information. Fast, such that the most complex queries requiring not
more than 5 seconds to be processed. Analysis alludes to the process of analyzing
information of all relevant kinds in order to process complex queries and establishing
clear criteria for the results of such queries. The information to be used for analysis is
generally obtained from a shared source, such as a data warehouse. The information
may be related in more than one or two dimensions. For example, a particular set of
business data may be related, variously, to sales figures, market trends, consumer
buying patterns, supplier conditions and the liquidity of the business. Presented in such
a multidimensional detail, such information can be useful and vital to managerial
decision-making.

5. OLAP TECHNIQUES
Online analytical processing or OLAP is implemented in many different ways.
However, the commonest way is to stage information retrieved from different
corporate databases, such as data warehouses, is staged – stored temporarily into
OLAP multi-dimensional databases for retrieval by the front-end systems. The
multidimensional database is optimized for fast retrieval. Several techniques for the
quickening of data retrieval and analysis are used on the procedural side of the
database management.
Fig.4. OLAP Implementation

OLAP is basically concerned with multidimensionality of data. Multidimensional


models take advantage of inherent relationships in data to populate data in multi-
dimensional matrices called Data Cubes or Hypercubes (for more than three
dimensions). A standard spreadsheet, signifying a conventional database, is a two-
dimensional matrix. One example would a spreadsheet of regional sales by product for
a particular time period. Products could be shown as rows with sales revenues for each
region comprising the columns. Adding a time dimensional such as the organization’s
fiscal quarters would produce a three-dimensional matrix, which could be represented
using a cube.

Fig.5. Fig.6.
In figure 4 , there is a three dimensional data cube that organizes product sales data by
fiscal quarters and sales regions. Each cell could contain data for specific product,
specific fiscal quarter and specific region. By including additional dimensions, a data
hyper-cube could be produced, although more than three dimensions cannot be
graphically visualized. The data can be queried directly in any combinations of
dimensions, bypassing complex database queries.
A multidimensional model of storage involves two types of tables – dimension tables
and fact tables. A dimension table simply stores the tuples of attributes of the
dimension. A fact table can be thought of as having tuples, one per a recorded fact.
This fact contains some measured or observed variable(s), and identifies it with
pointers to dimension table. The fact table contains data and the dimensions identify
each tuple in that data.

Changing from one dimension hierarchy to another is easily achieved in a data cube by
a technique called Pivoting (also called rotation). Several other techniques go into
making a complete OLAP implementation. These vary from drilling down into finer
details, from different databases, to rolling up and grouping into larger units.

The following techniques are used for OLAP implementation: -


1. Consolidation or Roll Up
Consolidation involves the aggregation of data. This can involve simple roll-ups or
complex grouping involving inter-related data. For example, sales office can be
rolled-up to district and district again to regions.

2. Drill-down
OLAP can go in reverse direction and can display detailed data that comprises
Fig : 5 consolidated data. This is called drill-down. For example, the sales by individual
products or by sales-representatives that make up a region’s total sales, could be
easily accessed.
3. Slicing and dicing
Slicing and dicing refers to the ability to look at a database from various
viewpoints. One slice of sales database might show all sales of product type within
a region. Another slice might show all sales by sales channel, within each product-
type. Slicing and dicing is often performed along the time axis in order to analyze
trends and final patterns.

6. OLAP APPLICATIONS
OLAP is widely used in several realms of data management. Some of these
applications include: -
1. Financial Applications
• Activity-based costing (resource allocation)
• Budgeting
2. Marketing/Sales Applications
• Market Research Analysis
• Sales Forecasting
• Promotions Analysis
• Customer Analyses
• Market/Customer Segmentation
3. Business modeling
• Simulating business behaviour
• Extensive, real-time decision support system for managers

All of the above applications need ability to provide managers with the information
they need to make effective decisions about an organization's strategic directions. The
key indicator of a successful OLAP application is its ability to provide information, as
needed, that is, its ability to provide "just-in-time" information for effective decision-
making. This requires more than a base level of detailed data.
7. APPLYING OLAP TO DATA WAREHOUSING
Data warehousing is primarily used to archive data that is of the nature of business
knowledge. A primary requirement is the fast analysis of this shared data, resulting in
multidimensional views of the data, which in turn results in knowledge acquisition.
OLAP comes into play when it is required to analyze data stored in a warehouse, so as
to generate complex results, in a time-constrained environment – that is, just-in-time
information. Most data warehouses support, what is called Ad hoc querying, which
implies that any combination of complex queries can be executed against the stored
data.
Several tradeoffs must be made when applying OLAP to warehouses, particularly
because of the bulk of data that has to be analyzed. For example, an OLAP system
must be able to process a complex query in the shortest possible time. For this, it can
use two approaches – to have pre-calculated data, or to apply all calculations on the
fly. The first approach requires huge amounts of data storage area, and hence exotic
hardware support, which proves to be cost-ineffective. The second approach is to
apply calculations, on the fly, which results in slower execution of the queries. Thus,
an optimal approach is to pre-calculate some data, and calculate in real-time other
parts of the queries. These elements need to be balanced in an optimal way for a good
implementation of OLAP systems.

8. BENEFITS OF USING OLAP


OLAP holds several benefits for businesses: -
1. OLAP helps managers in decision-making through the multidimensional data
views that it is capable of providing, thus increasing their productivity.
2. OLAP applications are self-sufficient owing to the inherent flexibility provided to
the organized databases.
3. It enables simulation of business models and problems, through extensive usage of
analysis-capabilities.
4. In conjunction with data warehousing, OLAP can be used to provide reduction in
the application backlog, faster information retrieval and reduction in query drag..
1. CONCLUSION
The applications and benefits of online analytical processing (OLAP) indicate clearly
that it is one of the revolutionary technologies that can be used extensively in data
management. All businesses, big or small, can benefit from using OLAP, at some
scale or other. The instant access to information and the apparent knowledge gained
from this information is invaluable considering against the cost involved in
establishing this setup.
Online analytical processing can respond quickly to complex needs of managers and
analysts. To manage. Decision support. Or management information system which
OLAP operations are as follows.
– Access to large amounts of data such as sales data within the data warehouse in
many years.
– Analysis of business relationships between factors such as product sales area.
– Total value of relevant information such as sales and costs within the budget
area.
– To compare the performance period every month every quarter every year.
– A rapid response to user need. Therefore, managers or analysts to track the
analysis or decision at any step of work.
limitation
The OLAP approach works very well for most business
dimensions but has limitations for others. For example, the sales
revenue measure for our retail stores is fairly straightforward with
dimensions such as time, geography, and product. We can view
sales of a product or product line in a store or region for a period
of time and even compare that with sales of the same or other
products in the same or different stores for the same period.

The reason that the sales performance measure is so


straightforward is that the dimensions are very structured and
relatively small in number of dimension values. The time
dimension probably has calendar day at the lowest level of detail
and aggregates up to the levels we would want to see such as
weeks, periods, and/or years. The geography dimension would
contain our hierarchy such as store, branch, division, and/or
region. We may create one or more hierarchies for the products
as well with SKU, brand, product line, etc. as levels. The key
point is that we could decide before building the ‘cube’ how we
want to see and compare the data.

There are measures, however, for which it is difficult or


impossible to define ahead of time which slices of the data we
want to see and compare with other slices. Customer data
contain these types of measures. For example, suppose we
have a million or so customers who do business with us. We
know a lot of things (attributes) about each of those customers
and we calculate various performance measures for those
customers that tell us things such as how much the customer
spent with us, how often they purchased our goods and services,
and how much their average purchase was. What we would like
to do is figure out how to positively impact customer spending.
What dimensions would we use to analyze this data in order to
draw some conclusions about what is determining our
customers’ purchasing behavior? We may have dozens of
attributes for each customer (such as demographics, location, or
tenure) and each attribute would be a candidate to be a
dimension in an OLAP cube. Unfortunately, a cube has some
very real performance limitations which cause it to run out of
steam if a measure has over 8 to 10 dimensions (more or less
depending on the tool and the number of values in each
dimension). We could try choosing the 8 or 10 attributes that
have the highest likelihood of affecting purchasing behavior and
make them dimensions. This could be highly arbitrary though
since any combination of attributes might prove to have a fairly
significant impact on purchasing decisions. While we could try
using combinations of demographic attributes to lessen the
number of dimensions required, the user interface would become
more confusing and we would still be subject to the same kind of
limitations for other attributes.

Maintenance can also be a problem with this solution because if


the dimensionality of the cube is changed, then the cube must be
rebuilt entirely and, in some cases, all variables must be deleted
and added with the new dimensionality.

What we find, then, is that OLAP is very hard to implement with


dimensions which are either very large or have numerous
attributes. The customer dimension usually has both conditions
which makes it extremely difficult to analyze using OLAP
technology.

Future scope

Futurescopeoftheprojectincludesidentifyingneweroperators,exploringuseoftheseop
normalrelationaldatabasesandinvestigatingwaysofinter-
operatingbetweenthem.Alsowe
projectasaprecursortointeractive,adhocminingfullyimplementedinsideadatabasesystem
.

Das könnte Ihnen auch gefallen