Beruflich Dokumente
Kultur Dokumente
ABSTRACT
Information analysis is a vital capability for corporate databases. Fast analysis of data
stored in databases or warehouses, is indispensable for businesses that wish to
stay ahead in the present competitive market scene. Online analytical processing
or OLAP provides real-time information analysis for business decision-making.
Besides this, OLAP gives the flexibility of viewing data from several viewpoints, a
property referred to as the multidimensionality of data present in existing
databases. OLAP can be clearly defined in terms of five buzzwords – Fast
Analysis of Shared Multidimensional Information, or FASMI for short. OLAP
finds applications particularly for the analyses needs of businesses owning large
data warehouses. A data warehouse stores non-operational or analytic data,
which OLAP processes to provide comprehensive information resources. It can
be used to generate business intelligence, the knowledge of market trends for
example, that is an essential ingredient to success for any enterprise. It is the
technology that promises growth, leadership and proficiency for a business
organization.
Mayur Hemani
Shreyansh Jain
Final year
MGM’s College of Engg. And Tech.
NOIDA
1. BUSINESS INFORMATION NEEDS
Businesses today face similar challenges, particularly owing to the fast shrinking
world. Most big businesses today span several nations, and technologies like the
Internet play truant to the cause of globalization. In such competitive business
environments, the ones that have the best and the most information, almost certainly
leads. But merely possessing information in huge quantities is not sufficient for
gaining an edge over business rivals. It is like having everything and yet nothing.
Analyzing this information into comprehensive, business knowledge is the need of the
hour.
The following are some of the information needs for contemporary businesses:
• Decision-critical information
• Patterns, relationships and hierarchies in market trends, and
• Business knowledge pertaining to their competitors
2. THE NEED FOR MULTIDIMENSIONAL INFORMATION
Information, as it is usually stored in the present-day databases, is in the form of
relations. Tables hold data, which can be queried on request and the information,
retrieved. This is sufficient as long as one does not need to attach any semantics to this
data. Once a person needs to acquire context and meaning of this information, and
relationships to other pieces of information, simple querying is of little help.
Information, here, must be viewed as slices or dimensions. Viewing information from
different points implies that one can model different scenarios based on the same set of
information, by just arranging them in different analytical formats.
Multidimensionality allows for assigning relationships between seemingly unrelated
information fields. It allows combining information from different sources, and
relating them to obtain useful knowledge.
Consider the example of the demand versus pricing data of a particular product. This
data represents the sum total of all the demand in different regions of the business. It
also aggregates demands over different time periods. However, for a better
understanding of the real market demand, one needs to apply the dimensions of time,
as well as the specific locations to the available data. This fact brings out the need for
multidimensionality of information visualization and access.
Fig.2. Multi-Dimensional
Fig.1. Simple
Data2with
Dimensional
the Time axis
Data
3. DATA WAREHOUSING
A data warehouse is the cohesive data model that defines the central data repository
for an organization. Data warehousing is a technique used basically to separate the
analytical data from the operational data in a business organization. Analytical data is
that part of the information that is required to be archived for providing the decision
support at a later stage of business. Data Warehousing is the collection of such
analytical data arranged in a meaningful pattern such that the stress is on faster query
execution unlike the Transactional Processing where the stress is on better flow of
information. A Data Warehouse may be defined as a subject-oriented repository,
designed with enterprise-wide access in mind.
The architecture of a Data warehousing system can be briefly broken up into the
following components: -
i) Operational Databases
Operational databases store information required for current transactions in
different departments of an organization.
ii) Remote Applications
The various (isolated) departments of an organization have specific functions
and jobs to perform which require inputs in the form of data and then
processing of the data for the information required.
iii) Data Warehouse
The master database, which stores the analytical information, extracted from
the operational databases. The analytical information is that part of the
operational databases which is of strategic or managerial importance, at some
stage in time.
iv) Data Marts
A data mart is a subset of enterprise-wide data warehouse, which typically
supports an enterprise element (department, region, function, etc.). For
example, for a data warehouse of a business house a likely classification would
be – the customer database, a collection of sales figures, prices of the products
etc. Each of these sub-groups acts as mini data warehouses.
v) Clients
They are the end users of the data warehouses that may be working through the
data marts, and deriving necessary information for activities like decision-
making, strategic planning, or business analysis. These are the groups, which
require information to be retrieved or mined from the warehouse. The time
required for retrieval is a major issue with most data warehouses.
A data warehouse stores tactical information that merely answers specific questions
about past events. It is capable of delivering only that information that is directly
related to some known facts or fields. But what if a person wants to simulate the
behaviour of a system (such as a market) if certain conditions prevailed? If a person
needs to predict or at least observe the trend of some variable when some affecting
value is varied, then it is not possible for a conventional data warehouse to produce the
information required. This is where OLAP comes into play. OLAP or online-analytical
processing is a technology complementary to data warehousing, in the sense that it
works on the data stored in a warehouse. Warehouses need OLAP to produce
multidimensional views of the information that it stores, and OLAP needs data
warehouses for the data it needs to process.
4. OLAP Defined
In contrast to the conventional data warehousing systems, where the data exists in the
form of the relations, the OLAP or the On Line Analytical Processing provides a better
insight into the untapped potential of data. It supports the multidimensional view of
data, which provides manifold increase in the information content of same amount of
data. OLAP provides fast, consistent, and efficient access to the various views of
information.
OLAP may be defined in terms of just five keywords – Fast Analysis of Shared
Multidimensional Information. Fast, such that the most complex queries requiring not
more than 5 seconds to be processed. Analysis alludes to the process of analyzing
information of all relevant kinds in order to process complex queries and establishing
clear criteria for the results of such queries. The information to be used for analysis is
generally obtained from a shared source, such as a data warehouse. The information
may be related in more than one or two dimensions. For example, a particular set of
business data may be related, variously, to sales figures, market trends, consumer
buying patterns, supplier conditions and the liquidity of the business. Presented in such
a multidimensional detail, such information can be useful and vital to managerial
decision-making.
5. OLAP TECHNIQUES
Online analytical processing or OLAP is implemented in many different ways.
However, the commonest way is to stage information retrieved from different
corporate databases, such as data warehouses, is staged – stored temporarily into
OLAP multi-dimensional databases for retrieval by the front-end systems. The
multidimensional database is optimized for fast retrieval. Several techniques for the
quickening of data retrieval and analysis are used on the procedural side of the
database management.
Fig.4. OLAP Implementation
Fig.5. Fig.6.
In figure 4 , there is a three dimensional data cube that organizes product sales data by
fiscal quarters and sales regions. Each cell could contain data for specific product,
specific fiscal quarter and specific region. By including additional dimensions, a data
hyper-cube could be produced, although more than three dimensions cannot be
graphically visualized. The data can be queried directly in any combinations of
dimensions, bypassing complex database queries.
A multidimensional model of storage involves two types of tables – dimension tables
and fact tables. A dimension table simply stores the tuples of attributes of the
dimension. A fact table can be thought of as having tuples, one per a recorded fact.
This fact contains some measured or observed variable(s), and identifies it with
pointers to dimension table. The fact table contains data and the dimensions identify
each tuple in that data.
Changing from one dimension hierarchy to another is easily achieved in a data cube by
a technique called Pivoting (also called rotation). Several other techniques go into
making a complete OLAP implementation. These vary from drilling down into finer
details, from different databases, to rolling up and grouping into larger units.
2. Drill-down
OLAP can go in reverse direction and can display detailed data that comprises
Fig : 5 consolidated data. This is called drill-down. For example, the sales by individual
products or by sales-representatives that make up a region’s total sales, could be
easily accessed.
3. Slicing and dicing
Slicing and dicing refers to the ability to look at a database from various
viewpoints. One slice of sales database might show all sales of product type within
a region. Another slice might show all sales by sales channel, within each product-
type. Slicing and dicing is often performed along the time axis in order to analyze
trends and final patterns.
6. OLAP APPLICATIONS
OLAP is widely used in several realms of data management. Some of these
applications include: -
1. Financial Applications
• Activity-based costing (resource allocation)
• Budgeting
2. Marketing/Sales Applications
• Market Research Analysis
• Sales Forecasting
• Promotions Analysis
• Customer Analyses
• Market/Customer Segmentation
3. Business modeling
• Simulating business behaviour
• Extensive, real-time decision support system for managers
All of the above applications need ability to provide managers with the information
they need to make effective decisions about an organization's strategic directions. The
key indicator of a successful OLAP application is its ability to provide information, as
needed, that is, its ability to provide "just-in-time" information for effective decision-
making. This requires more than a base level of detailed data.
7. APPLYING OLAP TO DATA WAREHOUSING
Data warehousing is primarily used to archive data that is of the nature of business
knowledge. A primary requirement is the fast analysis of this shared data, resulting in
multidimensional views of the data, which in turn results in knowledge acquisition.
OLAP comes into play when it is required to analyze data stored in a warehouse, so as
to generate complex results, in a time-constrained environment – that is, just-in-time
information. Most data warehouses support, what is called Ad hoc querying, which
implies that any combination of complex queries can be executed against the stored
data.
Several tradeoffs must be made when applying OLAP to warehouses, particularly
because of the bulk of data that has to be analyzed. For example, an OLAP system
must be able to process a complex query in the shortest possible time. For this, it can
use two approaches – to have pre-calculated data, or to apply all calculations on the
fly. The first approach requires huge amounts of data storage area, and hence exotic
hardware support, which proves to be cost-ineffective. The second approach is to
apply calculations, on the fly, which results in slower execution of the queries. Thus,
an optimal approach is to pre-calculate some data, and calculate in real-time other
parts of the queries. These elements need to be balanced in an optimal way for a good
implementation of OLAP systems.
Future scope
Futurescopeoftheprojectincludesidentifyingneweroperators,exploringuseoftheseop
normalrelationaldatabasesandinvestigatingwaysofinter-
operatingbetweenthem.Alsowe
projectasaprecursortointeractive,adhocminingfullyimplementedinsideadatabasesystem
.