A SEMINAR REPORT
Submitted by
TARA JOHN
in partial fulfillment for the award of the degree
of
BACHELOR OF TECHNOLOGY
in
DIVISION OF COMPUTER ENGINEERING
SCHOOL OF ENGINEERING
COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY
KOCHI-682022
AUGUST 2008
Certificate
Certified that this is a bona fide record of the seminar entitled
TARA JOHN
Date:
ACKNOWLEDGEMENT
At the outset, I thank the Lord Almighty for the grace, strength and
hope to make my endeavor a success.
I also express my gratitude to Dr. David Peter, Head of the
Department and my Seminar Guide for providing me with adequate
facilities, ways and means by which I was able to complete this seminar. I
express my sincere gratitude to him for his constant support and valuable
suggestions without which the successful completion of this seminar would
not have been possible.
I thank Mr. Damodaran, my seminar guide, for his boundless
cooperation and the help extended for this seminar. I express my immense
pleasure and thankfulness to all the teachers and staff of the Department of
Computer Science and Engineering, CUSAT for their cooperation and
support.
Last but not least, I thank all others, especially my classmates
and my family members who in one way or another helped me in the
successful completion of this work.
TARA JOHN
ABSTRACT
OLAP systems enable managers and analysts to rapidly and
easily examine key performance data and perform powerful comparison and
trend analyses, even on very large volumes of business data. They can be used in an
increasingly wide range of applications. The most common are sales analysis and
planning and quality analysis; in fact, any management system that requires a
multidimensional view of business data, often across multiple time periods, with the
aim of uncovering the business information concealed within the data. OLAP enables
business users to gain insight into their data.
CONTENTS
ABSTRACT
LIST OF FIGURES
1. INTRODUCTION
 1.1 OLAP Cube
 1.2 OLAP Operations
 1.3 History
2. OLAP MILESTONES
3. DIFFERENCES BETWEEN OLTP AND OLAP
4. DATA WAREHOUSE
 4.1 Data warehouse architecture
5. ETL TOOLS
 5.1 ETL Concepts
6. CLASSIFICATIONS OF OLAP
7. APPLICATIONS OF OLAP
8. FUTURE GAZING: Possibility of convergence of OLAP and OLTP
REFERENCES
LIST OF FIGURES
1. OLAP CUBE
2. OLAP MILESTONES
3. STAR SCHEMA
4. SNOWFLAKE SCHEMA
5. ETL
6. EIS
Online Analytical Processing
1. INTRODUCTION
The OLAP term dates back to 1993, but the ideas, technology and even some of
the products have origins long before then.
1.1 OLAP CUBE
An OLAP cube as shown in fig(1.1.1) is a data structure that allows fast analysis
of data. The arrangement of data into cubes overcomes a limitation of relational
databases. Relational databases are not well suited for near instantaneous analysis and
display of large amounts of data. Instead, they are better suited for creating records from
a series of transactions known as OLTP or On-Line Transaction Processing. Although
many report-writing tools exist for relational databases, these are slow when the whole
database must be summarized.
The OLAP cube consists of numeric facts called measures which are categorized
by dimensions. The cube metadata is typically created from a star schema or snowflake
schema of tables in a relational database. Measures are derived from the records in the
fact table and dimensions are derived from the dimension tables.
The analyst can understand the meaning contained in the databases using multi-
dimensional analysis. By aligning the data content with the analyst's mental model, the
chances of confusion and erroneous interpretations are reduced. The analyst can navigate
through the database and screen for a particular subset of the data, changing the data's
orientations and defining analytical calculations. The user-initiated process of navigating
by calling for page displays interactively, through the specification of slices via rotations
and drill down/up, is sometimes called "slice and dice". Common operations include slice
and dice, drill down, roll up, and pivot.
1.2 OLAP OPERATIONS
Slice: A slice selects a single value for one of the dimensions of the cube, yielding a
sub-cube with one dimension fewer.
Dice: The dice operation is a slice on more than two dimensions of a data cube (or
more than two consecutive slices).
Roll-up: A roll-up involves computing all of the data relationships for one or
more dimensions. To do this, a computational relationship or formula might be defined.
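The operations above can be sketched on a tiny cube using pandas; this is a minimal illustration, and the dataset, column names and values are invented rather than taken from the report.

```python
import pandas as pd

# A tiny "cube" in flat form: each row is a fact, with three dimensions
# (year, region, product) and one measure (sales).
facts = pd.DataFrame({
    "year":    [2007, 2007, 2007, 2008, 2008, 2008],
    "region":  ["North", "South", "North", "North", "South", "South"],
    "product": ["A", "A", "B", "B", "A", "B"],
    "sales":   [100, 150, 80, 120, 90, 60],
})

# Slice: fix one dimension to a single value (year = 2008).
slice_2008 = facts[facts["year"] == 2008]

# Dice: restrict two or more dimensions to chosen values.
dice = facts[(facts["year"] == 2008) & (facts["region"] == "South")]

# Roll-up: aggregate a dimension away (total sales per year).
rollup = facts.groupby("year")["sales"].sum()

# Pivot: display the result as a matrix, with dimensions forming
# the rows and columns and the measure supplying the values.
pivot = facts.pivot_table(index="region", columns="year",
                          values="sales", aggfunc="sum")

print(rollup[2008])   # total 2008 sales: 270
```

Drill-down is simply the reverse of the roll-up shown here: re-introducing a dimension (or a finer level of one) to see more detail.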
1.3 History
Nigel Pendse has suggested that an alternative and perhaps more descriptive term
to describe the concept of OLAP is Fast Analysis of Shared Multidimensional
Information (FASMI).
The output of an OLAP query is typically displayed in a matrix (or pivot) format.
The dimensions form the row and column of the matrix; the measures, the values.
The first product that performed OLAP queries was Express, which was released
in 1970 (and acquired by Oracle in 1995 from Information Resources). However, the
term did not appear until 1993 when it was coined by Ted Codd, who has been described
as "the father of the relational database". Codd's paper resulted from a short consulting
assignment which Codd undertook for former Arbor Software (later Hyperion Solutions,
and in 2007 acquired by Oracle), as a sort of marketing coup. The company had released
its own OLAP product, Essbase, a year earlier. As a result Codd's "twelve laws of online
analytical processing" were explicit in their reference to Essbase. There was some
ensuing controversy and when Computerworld learned that Codd was paid by Arbor, it
retracted the article. The OLAP market experienced strong growth in the late 90s, with dozens of
commercial products coming to market. In 1998, Microsoft released its first OLAP
server, Microsoft Analysis Services, which drove wide adoption of OLAP technology
and moved it into the mainstream.
APL
Multidimensional analysis, the basis for OLAP, is not new. In fact, it goes back to
1962, with the publication of Ken Iverson’s book, A Programming Language. The first
computer implementation of the APL language was in the late 1960s, by IBM. APL is a
mathematically defined language with multidimensional variables and elegant, if rather
abstract, processing operators. It was originally intended more as a way of defining
multidimensional transformations than as a practical programming language, so it did not
pay attention to mundane concepts like files and printers. In the interests of a succinct
notation, the operators were Greek symbols. In fact, the resulting programs were so
succinct that few could predict what an APL program would do. It became known as a
‘Write Only Language’ (WOL), because it was easier to rewrite a program that needed
maintenance than to fix it.
In spite of inauspicious beginnings, APL did not go away. It was used in many 1970s
and 1980s business applications that had similar functions to today’s OLAP systems.
Indeed, IBM developed an entire mainframe operating system for APL, called VSPC, and
some people regarded it as the personal productivity environment of choice long before
the spreadsheet made an appearance.
One of these APL-based mainframe products from the 1980s was originally called
Frango, and later Fregi. It was developed by IBM in the UK, and was used for interactive,
top-down planning. A PC-based descendant of Frango surfaced in the early 1990s as
KPS, and the product remains on sale today as the Analyst module in Cognos Planning.
This is one of several APL-based products that Cognos has built or acquired since 1999.
Ironically, this APL-based module has returned home, now that IBM owns Cognos.
Even today, more than 40 years later, APL continues to be enhanced and used in
new applications. It is used behind the scenes in many business applications, and has
even entered the worlds of Unicode, object-oriented programming and Vista. Few, if any,
other 1960s computer languages have shown such longevity.
Express
EIS
By the mid 1980s, the term EIS (Executive Information System) had been born.
The idea was to provide relevant and timely management information using a new, much
simpler user interface than had previously been available. This used what was then a
revolutionary concept, a graphical user interface running on DOS PCs, and using touch
screens or mice. For executive use, the PCs were often hidden away in custom cabinets
and desks, as few senior executives of the day wanted to be seen as nerdy PC users.
The first explicit EIS product was Pilot's Command Center, though there had been
EIS applications implemented by IRI and Comshare earlier in the decade.
By the late 1980s, the spreadsheet was already becoming dominant in end-user
analysis, so the first multidimensional spreadsheet appeared in the form of Compete. This
was originally marketed as a very expensive specialist tool, but the vendor could not
generate the volumes to stay in business, and Computer Associates acquired it, along
with a number of other spreadsheet products including SuperCalc and 20/20. The main
effect of CA’s acquisition of Compete was that the price was slashed, the copy protection
removed and the product was heavily promoted. However, it was still not a success, a
trend that was to be repeated with CA’s other OLAP acquisitions. For a few years, the old
Compete was still occasionally found, bundled into a heavily discounted bargain pack.
Later, Compete formed the basis for CA’s version 5 of SuperCalc, but the
multidimensionality aspect of it was not promoted.
By the late 1980s, Sinper had entered the multidimensional spreadsheet world,
originally with a proprietary DOS spreadsheet, and then by linking to DOS 1-2-3. It
entered the Windows era by turning its (then named) TM/1 product into a
multidimensional back-end server for standard Excel and 1-2-3. Slightly later, Arbor did
the same thing, although its new Essbase product could then only work in client/server
mode, whereas Sinper’s could also work on a stand-alone PC. This approach to bringing
multidimensionality to spreadsheet users has been far more popular with users. So much
so, in fact, that traditional vendors of proprietary front-ends have been forced to follow
suit, and products like Express, Holos, Gentia, MineShare, PowerPlay, MetaCube and
WhiteLight all proudly offered highly integrated spreadsheet access to their OLAP
servers. Ironically, for its first six months, Microsoft OLAP Services was one of the few
OLAP servers not to have a vendor-developed spreadsheet client, as Microsoft’s (very
basic) offering only appeared in June 1999 in Excel 2000. However, the (then)
OLAP@Work Excel add-in filled the gap, and still (under its new snappy name,
BusinessQuery MD for Excel) provided much better exploitation of the server than did
Microsoft’s own Excel interface. Since then there have been at least ten other third party
Excel add-ins developed for Microsoft Analysis Services, all offering capabilities not
available even in Excel 2003. However, Business Objects’ acquisition of Crystal
Decisions has led to the phasing out of BusinessQuery MD for Excel, to be replaced by
technology from Crystal.
There was a rush of new OLAP Excel add-ins in 2004 from Business Objects,
Cognos, Microsoft, MicroStrategy and Oracle. Perhaps with users disillusioned by
disappointing Web capabilities, the vendors rediscovered that many numerate users
would rather have their BI data displayed via a flexible Excel-based interface than
in a dumb Web page or PDF. Microsoft is taking this further with PerformancePoint,
whose main user interface for data entry and reporting is via Excel.
2. OLAP MILESTONES
1992  Essbase launched: First well-marketed OLAP product, which went on to become
the market-leading OLAP server by 1997.

1993  Codd white paper coins the OLAP term: This white paper, commissioned by Arbor
Software, brought multidimensional analysis to the attention of many more people than
ever before. However, the Codd OLAP rules were soon forgotten (unlike his influential
and respected relational rules).

1994  MicroStrategy DSS Agent launched: First ROLAP to do without a
multidimensional engine, with almost all processing performed by multi-pass SQL,
an appropriate approach for very large databases, or those with very large dimensions,
but one that suffers a severe performance penalty. The modern MicroStrategy 7i has a
more conventional three-tier hybrid OLAP architecture.

1995  Holos 4.0 released: First hybrid OLAP, allowing a single application to access
both relational and multidimensional databases simultaneously. Many other OLAP tools
now use this approach. Holos was acquired by Crystal Decisions in 1996, but has now
been discontinued.

1995  Oracle acquires Express: First important OLAP takeover. Arguably, it was this
event that put OLAP on the map, and it almost certainly triggered the entry of the other
database vendors. Express has now become a hybrid OLAP and competes with both
multidimensional and relational OLAP tools. Oracle soon promised that Express would
be fully integrated into the rest of its product line but, almost ten years later, has still
failed to deliver on this promise.

1996  BusinessObjects 4.0 launched: First tool to provide seamless multidimensional
and relational reporting from desktop cubes dynamically built from relational data.
Early releases had problems, now largely resolved, but Business Objects has always
struggled to deliver a true Web version of this desktop OLAP architecture. It is expected
finally to achieve this by using the former Crystal Enterprise as the base.

1997  Microsoft announces OLE DB for OLAP: This project was code-named Tensor,
and became the 'industry standard' OLAP API before even a single product supporting it
shipped. Many third-party products now support this API, which is evolving into the
more modern XML for Analysis.

1998  IBM DB2 OLAP Server released: This version of Essbase stored all data in a form
of relational star schema, in DB2 or other relational databases, but it was more like a
slow MOLAP than a scalable ROLAP. IBM later
after Oracle acquired Express, and there are still very few users of the Oracle OLAP
Option.

2001  MicroStrategy abandons Strategy.com: Strategy.com was part of MicroStrategy's
grand strategy to become the next Microsoft. Instead, it very nearly bankrupted the
company, which finally shut the subsidiary down in late 2001.

2001  Siebel acquires nQuire: Siebel was surprisingly successful with what became
Siebel Analytics, which now seems destined to become the core of Oracle's future BI
strategy.

2002  Oracle ships integrated OLAP server: Oracle9i Release 2 OLAP Option shipped in
mid 2002, with a MOLAP server (a modernized Express), called the Analytical
Workspace, integrated within the database. This was the closest integration yet between
a MOLAP server and an RDBMS. But it is still not a complete solution, lacking
competitive front-end tools and applications.

2003  The year of consolidation: Business Objects purchases Crystal Decisions;
Hyperion Solutions buys Brio Software; Cognos buys Adaytum; and Geac buys
Comshare.

2004  Excel add-ins go mainstream: Business Objects, Cognos, Microsoft, MicroStrategy
and Oracle all release new Excel add-ins for accessing OLAP data, while Sage buys one
of the leading Analysis Services Excel add-in vendors, IntelligentApps.

2004  Essbase database explosion curbed: Hyperion releases Essbase 7X, which included
the results of Project Ukraine: the Aggregate Storage Option. This finally cured
Essbase's notorious database explosion syndrome, making the product suitable for
marketing, as well as financial, applications.

2004  Cognos buys its second Frango: Cognos buys Frango, the Swedish consolidation
system. Less well known is the fact that Adaytum, which Cognos bought the previous
year, had its origins in IBM's Frango project from the early 1980s.

2005  Microsoft finally ships the much-delayed SQL Server 2005: Originally planned for
release in 2003, Microsoft managed to ship the major 'Yukon' version just before the
end of 2005.

2005  Pentaho buys Mondrian: Pentaho acquires Mondrian, as part of the process of
assembling a full-blown open source BI suite.

2006  Palo launched: The first open source MOLAP server.

2006  Microsoft buys ProClarity: Microsoft underlines its BI ambitions by buying the
leading front-end suite for Analysis Services.

2006  Microsoft announces PerformancePoint: Within just a few months of the first beta
release of PerformancePoint Server, three leading CPM vendors changed hands.

2007  Oracle buys Hyperion: In the largest-ever BI consolidation, Oracle purchased
Hyperion Solutions, bringing together multiple OLAP products originating in over a
dozen companies.

2007  Oracle delivers embedded OLAP: After years of promise, Oracle finally delivered
genuine embedded OLAP capabilities in the 11g database, which can deliver
performance benefits to unchanged relational applications via cube-based materialized
views.

2008  IBM buys Cognos: APL returns home, as IBM acquires Cognos Planning, which
had started life as Frango within IBM more than 25 years earlier. Ironically, IBM now
has to license the APL technology still used in the Analyst module of Cognos Planning.
3. DIFFERENCES BETWEEN OLTP AND OLAP
1. Users and System Orientation
OLTP: customer-oriented, used for transaction and query processing by clerks, clients
and IT professionals.
OLAP: market-oriented, used for data analysis by knowledge workers (managers,
executives, analysts).
2. Data Contents
3. Database Design
OLAP: adopts a star, snowflake or fact constellation model and a subject-oriented
database design.
4. View
4. DATA WAREHOUSE
A data warehouse is classically defined as a subject-oriented, integrated, time-variant
and non-volatile collection of data in support of management's decision-making process.
This classic definition of the data warehouse focuses on data storage. However,
the means to retrieve and analyze data, to extract, transform and load data, and to manage
the data dictionary are also considered essential components of a data warehousing
system. Many references to data warehousing use this broader context. Thus, an
expanded definition for data warehousing includes business intelligence tools, tools to
extract, transform, and load data into the repository, and tools to manage and retrieve
metadata.
A data warehouse provides a common data model for all data of interest
regardless of the data's source. This makes it easier to report and analyze information
than it would be if multiple data models were used to retrieve information such as sales
invoices, order receipts, general ledger charges, etc.
• Prior to loading data into the data warehouse, inconsistencies are identified and
resolved. This greatly simplifies reporting and analysis.
• Information in the data warehouse is under the control of data warehouse users so
that, even if the source system data is purged over time, the information in the warehouse
can be stored safely for extended periods of time.
• Because they are separate from operational systems, data warehouses provide
retrieval of data without slowing down operational systems.
Data warehouses can work in conjunction with and, hence, enhance the value of
operational business applications, notably customer relationship management (CRM)
systems.
4.1 DATA WAREHOUSE ARCHITECTURE
When the dimension tables of a star schema are further normalized, a more
complex snowflake shape starts to emerge. The "snowflaking" effect only affects the
dimension tables and not the fact tables.
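The fact-table/dimension-table arrangement described earlier can be sketched as a small star schema in SQLite; the table and column names here are invented for illustration, not taken from the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star schema: one central fact table whose foreign keys point at
# denormalized dimension tables. Measures live in the fact table.
cur.executescript("""
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INT, month INT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales  (date_id INT REFERENCES dim_date,
                          product_id INT REFERENCES dim_product,
                          amount REAL);          -- the measure

INSERT INTO dim_date    VALUES (1, 2008, 1), (2, 2008, 2);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware');
INSERT INTO fact_sales  VALUES (1, 1, 100.0), (2, 1, 250.0);
""")

# A typical OLAP-style query joins the fact table to its dimensions
# and aggregates the measure.
cur.execute("""
SELECT d.year, p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d    ON f.date_id = d.date_id
JOIN dim_product p ON f.product_id = p.product_id
GROUP BY d.year, p.category
""")
rows = cur.fetchall()
print(rows)
```

A snowflake schema would simply normalize the dimension tables further (e.g. splitting category out of dim_product into its own table); the fact table is unchanged.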
So how is a data warehouse different from your regular database? After all,
both are databases, and both have some tables containing data. If we look deeper, we'd
find that both have indexes, keys, views, and the rest of the regular jing-bang. So is that
'data warehouse' really different from the tables in your application? And if the two
aren't really different, maybe we can just run our queries and reports directly from the
application databases!
The primary difference between an application database and a data warehouse is that
while the former is designed (and optimized) to record transactions, the latter has to be
designed (and optimized) to respond to analysis questions that are critical for your business.
Application databases are OLTP (On-Line Transaction Processing) systems where every
transaction has to be recorded, and super-fast at that. Consider the scenario where a bank
ATM has disbursed cash to a customer but was unable to record this event in the bank
records. If this started happening frequently, the bank wouldn't stay in business for too
long. So the banking system is designed to make sure that every transaction gets recorded
within the time you stand before the ATM. This system is write-optimized, and
you shouldn't crib if your analysis query (read operation) takes a lot of time on such a
system.
A Data Warehouse (DW), on the other hand, is a database (yes, you are right, it's a
database) that is designed to facilitate querying and analysis. Often designed as OLAP
(On-Line Analytical Processing) systems, these databases contain read-only data that can
be queried and analysed far more efficiently than your regular OLTP
application databases. In this sense an OLAP system is designed to be read-optimized.
5. ETL TOOLS
ETL Tools are meant to extract, transform and load the data into Data Warehouse for
decision making. Before the evolution of ETL Tools, the above mentioned ETL process
was done manually by using SQL code created by programmers. This task was tedious
and cumbersome in many cases since it involved many resources, complex coding and
more work hours. On top of that, maintaining the code posed a great challenge to the
programmers.
These difficulties are eliminated by ETL tools, since they are very powerful and
offer many advantages over the old method in all stages of the ETL process: extraction,
data cleansing, data profiling, transformation, debugging and loading into the data
warehouse.
There are a number of ETL tools available in the market to perform the ETL
process on the data according to business/technical requirements.
Fig(5.1) ETL
Extraction, transformation, and loading. ETL refers to the methods involved in accessing
and manipulating source data and loading it into a target database.
The first step in the ETL process is mapping the data between the source systems and the
target database (data warehouse or data mart). The second step is cleansing the source
data in the staging area. The third step is transforming the cleansed source data and then
loading it into the target system.
• Informatica
• Datastage
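The three steps just described (extract/map, cleanse in a staging area, transform and load) can be sketched in a few lines of Python; the source rows, cleansing rules and target format below are invented purely for illustration.

```python
# Minimal ETL sketch: extract rows from a "source system", cleanse them
# in a staging area, then transform and load them into the target store.
source_rows = [
    {"customer": "  Alice ", "amount": "100"},
    {"customer": "Bob",      "amount": "n/a"},   # bad record
    {"customer": "Carol",    "amount": "250"},
]

def extract():
    # In a real ETL tool this would read from files, APIs or OLTP databases.
    return list(source_rows)

def cleanse(rows):
    # Staging area: trim whitespace, drop rows whose measure cannot be parsed.
    staged = []
    for row in rows:
        try:
            staged.append({"customer": row["customer"].strip(),
                           "amount": float(row["amount"])})
        except ValueError:
            pass          # a real tool would log and route rejected rows
    return staged

def transform_and_load(rows, target):
    # Transform (here: convert to cents) and load into the target table.
    for row in rows:
        target.append({"customer": row["customer"],
                       "amount_cents": int(row["amount"] * 100)})

warehouse = []
transform_and_load(cleanse(extract()), warehouse)
print(warehouse)
```

Commercial tools such as the ones listed above package exactly these stages, plus the profiling, debugging and scheduling machinery around them.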
6. CLASSIFICATIONS OF OLAP
In the OLAP world, there are mainly two different types: Multidimensional
OLAP (MOLAP) and Relational OLAP (ROLAP).
MOLAP
This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a
multidimensional cube. The storage is not in the relational database, but in proprietary
formats. Multidimensional OLAP is one of the oldest segments of the OLAP market. The
business problem MOLAP addresses is the need to compare, track, analyze and forecast
high level budgets based on allocation scenarios derived from actual numbers. The first
forays into data warehousing were led by the MOLAP vendors who created special
purpose databases that provided a cube-like structure for performing data analysis.
MOLAP tools restructure the source data so that it can be accessed, summarized,
filtered and retrieved almost instantaneously. As a general rule, MOLAP tools provide a
robust solution to data warehousing problems. Administration, distribution, meta data
creation and deployment are all controlled from a central point. Deployment and
distribution can be achieved over the Web and with client/server models.
MOLAP tools are well suited to:
• Users who are connected to a network and need to analyze larger, less defined
data.
• Users who want to access predefined reports, but need to have the ability to
perform additional analysis on information that may not be contained in the report.
Advantages:
• Excellent performance: MOLAP cubes are built for fast data retrieval, and are
optimal for slicing and dicing operations.
• Can perform complex calculations: All calculations have been pre-generated
when the cube is created. Hence, complex calculations are not only doable, but they
return quickly.
Disadvantages:
• Limited in the amount of data it can handle: Because all calculations are
performed when the cube is built, it is not possible to include a large amount of data in
the cube itself. This is not to say that the data in the cube cannot be derived from a large
amount of data. Indeed, this is possible. But in this case, only summary-level information
will be included in the cube itself.
• Requires additional investment: Cube technology is often proprietary and does not
already exist in the organization. Therefore, to adopt MOLAP technology, chances are
that additional investments in human and capital resources are needed.
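The pre-computation trade-off behind these advantages and disadvantages can be illustrated in miniature: every aggregate is materialized when the "cube" is built, so queries become constant-time look-ups, at the cost of storage that grows with the number of dimension-value combinations. The data and dimension names below are invented.

```python
from itertools import product

# Tiny fact set: (region, product) -> sales
facts = {("North", "A"): 100, ("North", "B"): 80,
         ("South", "A"): 150, ("South", "B"): 60}

def build_cube(facts):
    """Pre-aggregate every combination of dimension values, MOLAP-style,
    using "ALL" as the roll-up marker for an aggregated dimension."""
    regions = {r for r, _ in facts} | {"ALL"}
    products = {p for _, p in facts} | {"ALL"}
    cube = {}
    for r, p in product(regions, products):
        cube[(r, p)] = sum(v for (fr, fp), v in facts.items()
                           if r in (fr, "ALL") and p in (fp, "ALL"))
    return cube

cube = build_cube(facts)
# Every query is now a dictionary look-up, not a scan:
print(cube[("ALL", "A")])    # product A across all regions
print(cube[("ALL", "ALL")])  # grand total
```

The "database explosion" problem mentioned in the milestones table is visible even here: the cube has (regions+1) x (products+1) cells, which is why fully pre-computed cubes cannot hold unlimited detail data.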
ROLAP
This methodology relies on manipulating the data stored in the relational database
to give the appearance of traditional OLAP's slicing and dicing functionality. In essence,
each action of slicing and dicing is equivalent to adding a "WHERE" clause to the SQL
statement. With growing data warehouse sizes, users have come to realize that they
cannot store all of the information that they need in MOLAP databases. The business
problem that ROLAP addresses is the need to analyze these large volumes of data.
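The point that each slice or dice maps onto a WHERE clause can be made concrete with SQLite; the sales table and its columns are an invented example, not part of the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (year INT, region TEXT, amount REAL);
INSERT INTO sales VALUES (2007, 'North', 100), (2007, 'South', 150),
                         (2008, 'North', 120), (2008, 'South', 90);
""")

def dice(filters):
    """Translate a dice (a set of dimension restrictions) into a SQL
    WHERE clause, ROLAP-style, and return the aggregated measure."""
    where = " AND ".join(f"{col} = ?" for col in filters)
    sql = f"SELECT SUM(amount) FROM sales WHERE {where}"
    (total,) = conn.execute(sql, list(filters.values())).fetchone()
    return total

# Each slice/dice the user performs just adds predicates to the query:
print(dice({"year": 2008}))                     # slice on year
print(dice({"year": 2008, "region": "South"}))  # dice on year and region
```

Nothing is pre-computed here, which is exactly why ROLAP scales to large data but pays a per-query cost: every navigation step re-runs SQL against the relational store.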
Due to the complexity and size of ROLAP implementations, the tools provide a
robust set of functions for meta data creation, administration and deployment. The focus
of these tools is to provide administrators with the ability to optimize system performance
and generate maximum analytical throughput and performance for users. All of the
ROLAP vendors provide the ability to deploy their solutions via the Web or within a
multitier client/server environment.
Advantages:
• Can handle large amounts of data: The data size limitation of ROLAP technology
is the limitation on data size of the underlying relational database. In other words,
ROLAP itself places no limitation on data amount.
• Can leverage functionalities inherent in the relational database: Often, the relational
database already comes with a host of functionalities. ROLAP technologies, since they
sit on top of the relational database, can therefore leverage these functionalities.
Disadvantages:
• Performance can be slow: Because each ROLAP report is essentially a SQL query
(or multiple SQL queries) in the relational database, the query time can be long if the
underlying data size is large.
• Because ROLAP technology mainly relies on generating SQL statements to
query the relational database, and SQL statements do not fit all needs (for example, it is
difficult to perform complex calculations using SQL), ROLAP technologies are therefore
traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by
building into the tool out-of-the-box complex functions as well as the ability to allow
users to define their own functions.
7. APPLICATIONS OF OLAP
OLAP applications have been most commonly used in the financial and
marketing areas, but as we show here, their uses do extend to other functions. Data rich
industries have been the most typical users (consumer goods, retail, financial services and
transport) for the obvious reason that they had large quantities of good quality internal
and external data available, to which they needed to add value. However, there is also
scope to use OLAP technology in other industries. The applications will often be smaller,
because of the lower volumes of data available, which can open up a wider choice of
products (because some products cannot cope with very large data volumes).
Marketing and sales analysis
Most commercial companies require this application, and most products are
capable of handling it to some degree. However, large-scale versions of this application
occur in three industries, each with its own peculiarities:
Consumer goods industries often have large numbers of products and outlets, and a
high rate of change of both. They usually analyze data monthly, but sometimes it may
go down to weekly or, very occasionally, daily. There are usually a number of
dimensions, none especially large (rarely over 100,000). Data is often very sparse
because of the number of dimensions. Because of the competitiveness of these
industries, data is often analyzed using more sophisticated calculations than in other
industries. Often, the most suitable technology for these applications is one of the
hybrid OLAPs, which combine high analytical functionality with reasonably large data
capacity.
Retailers, thanks to EPOS data and loyalty cards, now have the potential to analyze
huge amounts of data. Large retailers could have over 100,000 products (SKUs) and
hundreds of branches. They often go down to weekly or daily level, and may
sometimes track spending by individual customers. They may even track sales by time
of day. The data is not usually very sparse, unless customer level detail is tracked.
Relatively low analytical functionality is usually needed. Sometimes, the volumes are
so large that a ROLAP solution is required, and this is certainly true of applications
where individual private consumers are tracked.
The financial services industry (insurance, banks etc) is a relatively new user of OLAP
technology for sales analysis. With an increasing need for product and customer
profitability, these companies are now sometimes analyzing data down to individual
customer level, which means that the largest dimension may have millions of members.
Because of the need to monitor a wide variety of risk factors, there may be large
numbers of attributes and dimensions, often with very flat hierarchies.
Clickstream analysis
This is one of the latest OLAP applications. Commercial Web sites generate
gigabytes of data a day that describe every action made by every visitor to the site. No
bricks and mortar retailer has the same level of detail available about how visitors browse
the offerings, the route they take and even where they abandon transactions. A large site
has an almost impossible volume of data to analyze, and a multidimensional framework
is possibly the best way of making sense of it. There are many dimensions to this
analysis, including where the visitors came from, the time of day, the route they take
through the site, whether or not they started/completed a transaction, and any
demographic data about customer visitors.
The Web site should not be viewed in isolation. It is only one facet of an organization’s
business, and ideally, the Web statistics should be combined with other business data,
including product profitability, customer history and financial information. OLAP is an
ideal way of bringing these conventional and new forms of data together. This would
allow, for instance, Web sites to be targeted not simply to maximize transactions, but to
generate profitable business and to appeal to customers likely to create such business.
OLAP can also be used to assist in personalizing Web sites.
Many of the issues with clickstream analysis come long before the OLAP tool.
The biggest issue is to correctly identify real user sessions, as opposed to hits. This means
eliminating the many crawler bots that are constantly searching and indexing the Web,
and then grouping sets of hits that constitute a session. This cannot be done by IP address
alone, as Web proxies and NAT (network address translation) mask the true client IP
address, so techniques such as session cookies must be used in the many cases where
surfers do not identify themselves by other means. Indeed, vendors such as Visual
Insights charge much more for upgrades to the data capture and conversion features of
their products than they do for the reporting and analysis components, even though the
latter are much more visible.
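The hit-grouping step described above, filtering out crawler traffic and then splitting each visitor's hits into sessions by inactivity gap, can be sketched as follows. The 30-minute timeout, the log format and the crude user-agent check are all assumptions made for illustration.

```python
# Group raw hits into user sessions: drop known crawlers, then start a
# new session whenever a visitor is idle for longer than the timeout.
SESSION_TIMEOUT = 30 * 60   # 30 minutes, a commonly assumed cutoff

hits = [  # (visitor_id, timestamp_seconds, user_agent)
    ("u1", 0,    "Mozilla"),
    ("u1", 600,  "Mozilla"),
    ("u1", 4000, "Mozilla"),     # > 30 min gap: starts a second session
    ("bot", 10,  "Googlebot"),   # crawler traffic, excluded
    ("u2", 50,   "Mozilla"),
]

def sessionize(hits, timeout=SESSION_TIMEOUT):
    sessions = {}
    for visitor, ts, agent in sorted(hits, key=lambda h: (h[0], h[1])):
        if "bot" in agent.lower():
            continue                     # crude crawler filter
        visitor_sessions = sessions.setdefault(visitor, [])
        if visitor_sessions and ts - visitor_sessions[-1][-1] <= timeout:
            visitor_sessions[-1].append(ts)   # same session continues
        else:
            visitor_sessions.append([ts])     # new session begins
    return sessions

sessions = sessionize(hits)
print(sum(len(s) for s in sessions.values()))   # total sessions
```

In practice the visitor_id itself must come from session cookies rather than IP addresses, for the proxy and NAT reasons noted above; only after this grouping is the data loaded into the OLAP cube.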
Database marketing
1. Determine who the preferred customers are, based on their purchase of profitable
products. This can be done with brute force data mining techniques (which are slow and
can be hard to interpret), or by experienced business users investigating hunches using
OLAP cubes (which is quicker and easier).
2. Build loyalty packages for preferred customers through the right offerings.
Once the preferred customers have been identified, look at their product mix and buying
profile to see whether there are dense clusters of product purchases over particular time
periods. Again, this is much easier in a multidimensional environment. These clusters can
then form the basis for special offers that increase the loyalty of profitable customers.
If these goals are met, both parties profit. The customers will have a company that knows
what they want and provides it. The company will have loyal customers that generate
sufficient revenue and profits to continue a viable business.
Database marketing specialists try to model (using statistical or data mining techniques)
which pieces of information are most relevant for determining likelihood of subsequent
purchases, and how to weight their importance. In the past, pure marketers have looked
for triggers, an approach that works, but only in one dimension. But a well-established company
may have hundreds of pieces of information about customers, plus years of transaction
data, so multidimensional structures are a great way to investigate relationships quickly,
and narrow down the data which should be considered for modeling.
Once this is done, the customers can be scored using the weighted combination of
variables which compose the model. A measure can then be created, and cubes set up
which mix and match across multidimensional variables to determine optimal product
mix for customers. The users can determine the best product mix to market to the right
customers based on segments created from a combination of the product scores, the
several demographic dimensions, and the transactional data in aggregate.
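The scoring step can be illustrated with a small Python sketch; the variable names and weights below are hypothetical stand-ins for whatever the statistical or data mining model actually produces.

```python
# Hypothetical model weights: in practice these come from the statistical
# or data mining model described above, not from guesswork.
WEIGHTS = {"recency": 0.5, "frequency": 0.3, "monetary": 0.2}

def score(customer):
    """Weighted combination of the model variables for one customer."""
    return sum(WEIGHTS[var] * customer[var] for var in WEIGHTS)

# Invented customer records, each holding the (normalized) model variables
customers = {
    "A": {"recency": 0.9, "frequency": 0.4, "monetary": 0.7},
    "B": {"recency": 0.2, "frequency": 0.9, "monetary": 0.3},
}

# Rank customers by score, best first, to drive the product-mix analysis
ranked = sorted(customers, key=lambda c: score(customers[c]), reverse=True)
```

The resulting score becomes one more measure in the cube, which can then be sliced against the demographic dimensions and transactional data exactly as described above.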
Finally, in a simpler setting, users can break the customer base into segments
based on combinations of dimensions that are relevant to targeting. They can then
calculate a return on investment for these combinations to determine which segments
have been profitable in the past and which have not, and mailings can then be made only
to the profitable segments. Products like Express allow users to fine-tune the
dimensions quickly to build one-off promotions, determine how to structure profitable
combinations of dimensions into segments, and rank them in order of desirability.
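A minimal Python sketch of this segment-level ROI calculation, using hypothetical region and age-band dimensions and invented revenue and mailing-cost figures:

```python
from collections import defaultdict

def segment_roi(customers):
    """Rank (region, age band) segments by return on mailing investment."""
    totals = defaultdict(lambda: [0.0, 0.0])  # segment -> [revenue, cost]
    for c in customers:
        seg = (c["region"], c["age_band"])
        totals[seg][0] += c["revenue"]
        totals[seg][1] += c["mailing_cost"]
    # ROI = (revenue - cost) / cost, ranked best segment first
    roi = {seg: (rev - cost) / cost for seg, (rev, cost) in totals.items()}
    return sorted(roi.items(), key=lambda kv: kv[1], reverse=True)
```

Mailing only to the segments at the top of this ranking is the "profitable segments" strategy described above.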
Every medium and large organization has onerous responsibilities for producing
financial reports for internal (management) consumption. Publicly quoted companies or
public sector bodies also have to produce other, legally required, reports.
Management reporting
Management reporting typically concentrates more on the profit and loss account and cash
flow, and less on the balance sheet. It will probably be done more often, usually
monthly rather than annually or quarterly. There will be less detail but more analysis,
and more users will be interested in viewing and analyzing the results. The emphasis is
on faster rather than more accurate reporting, and there may be regular changes to the
reporting requirements. Users of OLAP-based systems consistently…
The new Microsoft OLAP Services product and the many new client tools and
applications being developed for it will certainly drive down ‘per seat’ prices for general-
purpose management reporting applications, so that it will be economically possible to
deploy good solutions to many more users.
EIS
EIS is one branch of management reporting. The term became popular in the mid-1980s,
when it stood for Executive Information System; some people also used the term ESS
(Executive Support System). Since then, the original concept has been discredited, as
the early systems were proprietary, expensive and hard to maintain.
Fig. 7.1: EIS
The basic philosophy of EIS was that “what gets reported gets managed,” so if
executives could have fast, easy access to a number of key performance indicators (KPIs)
and critical success factors (CSFs), they would be able to manage their organizations
better. But there is little evidence that this worked for the buyers, and it certainly did not
work for the software vendors who specialized in this field, most of which suffered from
a very poor financial performance.
Profitability analysis
This is an application which is growing in importance. Even highly profitable
organizations ought to know where the profits are coming from; less profitable
organizations have to know where to cut back.
One popular way to assign costs to the right products or services is to use activity
based costing. This is much more scientific than simply allocating overhead costs in
proportion to revenues or floor space. It attempts to measure resources that are consumed
by activities, in terms of cost drivers. Typically costs are grouped into cost pools which
are then applied to products or customers using cost drivers, which must be measured.
Some cost drivers may be clearly based on the volume of activities, others may not be so
obvious. They may, for example, be connected with the introduction of new products or
suppliers. Others may be connected with the complexity of the organization (the variety
of customers, products, suppliers, production facilities, markets etc). There are also
infrastructure-sustaining costs that cannot realistically be applied to activities. Even
ignoring these, it is likely that the costs of supplying the least profitable customers or
products exceed the revenues they generate. If these costs are known, the company can make
changes to prices or other factors to remedy the situation — possibly by withdrawing
from some markets, dropping some products or declining to bid for certain contracts.
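The mechanics of applying cost pools to products through measured cost drivers can be sketched in Python; the pool names, drivers and figures below are invented purely for illustration.

```python
def abc_allocate(cost_pools, driver_volumes):
    """Apply each cost pool to products in proportion to its measured
    cost driver volumes -- the core arithmetic of activity based costing.

    cost_pools:     pool name -> (total cost, driver name)
    driver_volumes: driver name -> {product: measured driver volume}
    """
    allocated = {}
    for pool_cost, driver in cost_pools.values():
        volumes = driver_volumes[driver]
        total_volume = sum(volumes.values())
        for product, vol in volumes.items():
            share = pool_cost * vol / total_volume
            allocated[product] = allocated.get(product, 0.0) + share
    return allocated

# Invented example: two cost pools, each driven by a different activity
pools = {"machine setup": (9000.0, "setups"), "order handling": (4000.0, "orders")}
volumes = {"setups": {"P1": 2, "P2": 1}, "orders": {"P1": 10, "P2": 10}}
costs = abc_allocate(pools, volumes)
```

Here product P1, which causes two thirds of the setups, absorbs two thirds of the setup pool, rather than an arbitrary share of overhead based on revenue or floor space.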
There are specialist ABC products on the market and these have many FASMI
characteristics. It is also possible to build ABC applications in OLAP tools, although the
application functionality may be less than could be achieved through the use of a good
specialist tool.
Quality analysis
Although quality improvement programs are less in vogue than they were in the
early 1990s, the need for consistent quality and reliability in goods and services is as
important as ever. The measures should be objective and customer rather than producer
focused. The systems are just as relevant in service organizations and the public sector.
Indeed, many public sector service organizations have specific service targets.
These systems are used not just to monitor an organization’s own output, but also
that of its suppliers. There may, for example, be service level agreements that affect
contract extensions and payments.
Quality systems can often involve multidimensional data if they monitor numeric
measures across different production facilities, products or services, time, locations and
customers. Many of the measures will be non-financial, but they may be just as important
as traditional financial measures in forming a balanced view of the organization. As with
financial measures, they may need analyzing over time and across the functions of the
organization; many organizations are committed to continuous improvement, which
requires that there be formal measures that are quantifiable and tracked over long periods;
OLAP tools provide an excellent way of doing this, and of spotting disturbing trends
before they become too serious.
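Spotting a disturbing trend before it becomes serious can be as simple as checking for consecutively worsening observations of a tracked measure. A small Python sketch, with invented monthly defect rates per plant and an arbitrary three-period window:

```python
def rising_trend(series, window=3):
    """Flag a disturbing trend: True when each of the last `window`
    observations is worse (higher) than the one before it."""
    if len(series) < window + 1:
        return False
    tail = series[-(window + 1):]
    return all(b > a for a, b in zip(tail, tail[1:]))

# Invented monthly defect rates (per thousand units) by production facility
defects = {
    "plant A": [1.2, 1.1, 1.3, 1.2, 1.1, 1.0],
    "plant B": [1.0, 1.1, 1.3, 1.6, 2.0, 2.5],
}
alerts = [plant for plant, series in defects.items() if rising_trend(series)]
```

In an OLAP tool the same check would be run across every combination of facility, product and time period, which is exactly where the multidimensional structure earns its keep.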
What prevents the convergence of OLAP and OLTP tools, given that both are
business intelligence tools with data retrieval and analysis as common ground? One
significant difference is the treatment of time: an OLTP tool logs transactions in real
time, because the business needs to monitor parameters such as inventory levels and
production processes continuously, whereas OLAP tools are concerned with historical
data, not only within a single business process but across the whole business
environment, taking in factors such as promotions, holidays and weather. Because of
these different natures, the service each must provide is significantly different. For
OLTP, being online and able to store large amounts of transaction data is critical to
success, whereas for an OLAP tool the capability to perform multidimensional analysis
on historical data is what matters. A first step towards convergence may have been taken
by Microsoft, which has integrated reporting services with its SQL Server and so taken
the lead in combining these functionalities in its tools.