
Spring 2010 (Jan-June)

MBA – SEM – III

Name: Prashant D. Devale
Contact Number: 9821602706
Email id: pdevale@gmail.com
Roll Number: 510932455
Learning Centre: Karrox Technologies Ltd - ANDHERI
Subject Code: MI0027
Subject: Business Intelligence and tools
Date of Submission at the Learning Centre: 30-4-2010

Master of Business Administration-MBA Semester III
MI0027

Business Intelligence and tools - 2 Credits

Assignment Set- 1

1.(i) What do you understand by Business Intelligence System? What are the
different steps in order to deliver the Business Value through a BI System?
Ans. Business Intelligence (BI) is a generic term used to describe leveraging an organization's internal and external data and information to make the best possible business decisions. The field of business intelligence is very diverse and comprises the tools and technologies used to access and analyze various types of business information. These tools gather and store the data and allow the user to view and analyze the information from a wide variety of dimensions, thereby assisting decision-makers in making better business decisions. Thus Business Intelligence (BI) systems and tools play a vital role in helping organizations make improved decisions in the current cut-throat competitive scenario.

In simple terms, Business Intelligence is an environment in which business users receive reliable, consistent, meaningful and timely information. This data enables the business users to conduct analyses that yield an overall understanding of how the business has been, how it is now and how it will be in the near future. BI tools also monitor the financial and operational health of the organization through the generation of various types of reports, alerts, alarms, key performance indicators and dashboards.

Business intelligence tools are a type of application software designed to help in making better business decisions. These tools aid in the analysis and presentation of data in a more meaningful way and so play a key role in the strategic planning process of an organization. They support business intelligence in areas such as market research and segmentation, customer profiling, customer support, profitability, and inventory and distribution analysis, to name a few.

Various types of BI systems, viz. Decision Support Systems, Executive Information Systems (EIS), multidimensional analysis software or OLAP (On-Line Analytical Processing) tools, and data mining tools, are discussed further. Whatever the type, the business intelligence capability of a system is to let its users slice and dice the information from their organization's numerous databases without having to wait for their IT departments to develop complex queries and elicit answers.

Although it is possible to build BI systems without the benefit of a data warehouse, in practice most such systems are an integral part of the user-facing end of the data warehouse. In fact, we can hardly think of building a data warehouse without BI systems. That is the reason the terms 'data warehousing' and 'business intelligence' are sometimes used interchangeably.

The manager of a BI system has to take care of the following steps in order to deliver the intended

business value:

Step 1: Ensuring a strong business partnership

Developing solid business sponsorship is the first step in starting a BI project. Your business sponsors (it is generally good to have more than one) will take a lead role in determining the purpose, content, and priorities of the system, and so the business sponsors are expected to have the following qualities:

1. Visionary – a sense for the value and potential of information, with clear, specific ideas as to how to apply it.

2. Resourceful – able to obtain the necessary resources and facilitate the organizational change that the BI system will bring about.

3. Reasonable – can temper the enthusiasm with the understanding that the BI system takes time and resources to mature into a major information system.

Step 2: Defining organizational-level business requirements

The long-term goal of a BI system is to build an organization-wide information infrastructure. This cannot be done unless the team developing the BI system understands the business requirements at an organizational level. The process of understanding the organizational-level business requirements includes the following steps:


1. Establishing the initial project scope

2. Interviewing the BI system stakeholders

3. Gathering the organizational-level business requirements

4. Preparing an overall requirements document

Step 3: Prioritizing the business requirements

The prioritization process is a planning meeting that involves the team developing the BI system, the business sponsors, and other key senior managers across the organization. A prioritization grid can be developed for the set of business processes identified in the previous step, plotting the feasibility of each business process against the business value that the process is likely to generate. The output of the prioritization process is a list of business processes in priority order, as illustrated in the sketch that follows.
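As an illustration only, the sketch below (Python) shows how a simple prioritization grid could be turned into a priority-ordered list; the process names and scores are hypothetical and not taken from the course material.

# Hypothetical prioritization grid: each business process is scored (1-10)
# on feasibility and on the business value it is likely to generate.
processes = [
    {"process": "Sales analysis",     "feasibility": 8, "business_value": 9},
    {"process": "Inventory tracking", "feasibility": 9, "business_value": 6},
    {"process": "Customer profiling", "feasibility": 5, "business_value": 8},
]

# Rank by combined score so the most valuable and most feasible processes come first.
prioritized = sorted(processes,
                     key=lambda p: p["feasibility"] + p["business_value"],
                     reverse=True)

for rank, p in enumerate(prioritized, start=1):
    print(rank, p["process"], p["feasibility"], p["business_value"])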

Step 4: Planning the Business Intelligence project

After gaining a complete understanding of the business priorities, the team developing the BI system revisits the project plan. The plan is now based on the priority of the business processes detailed in the previous step.

Step 5: Defining the Project-level business requirements

Based on the previous steps, now the BI System developing team defines and documents the

project-level business requirements. These requirements act as guidelines while developing the

BI system.

(ii) Discuss the characteristics of a Data warehouse and analyze how the

development of a Data warehouse helps you in managing various functions in

your organizations.
Ans. According to Bill Inmon, who is considered to be the father of data warehousing, the data in a data warehouse has the following characteristics:

Subject oriented

The first feature of a data warehouse is its orientation toward the major subjects of the organization instead of applications. The subjects are categorized in such a way that the subject-wise collection of information helps in decision-making. For example, the data in the data warehouse of an insurance company can be organized by customer ID, customer name, premium, payment period, etc. rather than by auto insurance, life insurance, fire insurance, etc.


Integrated

The data contained within the boundaries of the warehouse is integrated. This means that all inconsistencies regarding naming conventions and value representations need to be removed in a data warehouse. For example, one application of an organization might code gender as 'm' and 'f' while another application might code the same attribute as '0' and '1'. When the data is moved from the operational environment to the data warehouse environment, such differences must be reconciled; otherwise they will result in conflict.
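A minimal sketch of how such an inconsistency could be resolved during integration is shown below (Python). The column names and the mapping of '0'/'1' to 'M'/'F' are illustrative assumptions, not part of the text.

# Two operational systems encode gender differently ('m'/'f' vs. '0'/'1').
# Before loading into the warehouse, both are mapped to one agreed standard.
STANDARD_GENDER = {"m": "M", "f": "F", "0": "M", "1": "F"}  # assumed mapping

def standardize_gender(value):
    """Map a source-system gender code to the warehouse standard ('M'/'F')."""
    return STANDARD_GENDER.get(str(value).strip().lower(), "UNKNOWN")

source_a = [{"cust_id": 1, "gender": "m"}, {"cust_id": 2, "gender": "f"}]
source_b = [{"cust_id": 3, "gender": "0"}, {"cust_id": 4, "gender": "1"}]

integrated = [{**row, "gender": standardize_gender(row["gender"])}
              for row in source_a + source_b]
print(integrated)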

Time variant

The data stored in a data warehouse is not only the current data. It is time-series data, as the data warehouse is a place where data is accumulated periodically. This is in contrast to an operational system, where the data in the databases is accurate as of the moment of access.

Non-volatility of the data

The data in the data warehouse is non-volatile, which means the data is stored in a read-only format and does not change over a period of time. This is the reason the data in a data warehouse forms a single source for all decision support processing.

Keeping the above characteristics in view, ‘data warehouse‘ can be defined as a subject-oriented,

integrated, non-volatile, time-variant collection of data designed to support the decision-making

requirements of an organization.

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. Typically, a data warehouse contains historical data derived from transaction data, together with data from various other data sources of an organization. The data in a data warehouse has the characteristics of subject orientation, integration, non-volatility and time variance in order to support the decision-making requirements of an organization.

2.(i) What do you understand by Data warehouse Meta Data? What is the use of
Metadata? How can you manage Metadata?
Ans. In simple terms, 'metadata' refers to "data about data." It is the information that describes, or supplements, the main data. For example, the metadata of a digital camera image includes the settings used for the picture, such as exposure value or flash intensity. Here, metadata acts as additional information and is not critical to the functioning of the main data. In other cases, such as a Zip disk, metadata might provide the information regarding the write-protected status of the disk. In such a case, metadata is essential for the proper functioning of the main product. So the value of metadata depends on the context in which it is provided, and on the ways that contextual information can be used. When data is made available, the potential user (human or computer) must put the data into an existing model of knowledge, and may ask questions to do so. For example, in the case of an image, metadata provides answers to questions like "When was the image taken?" and "Who is in the image?" In sophisticated data systems, the metadata includes the contextual information surrounding the data and will also be very sophisticated, capable of answering many questions that help in understanding the data. To sum up, metadata can be defined as "the structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities."
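As a rough illustration (not from the text), the photograph example above could be represented as a small metadata record attached to the image file; the field names and values in this Python sketch are assumptions.

# Hypothetical metadata record for a digital photograph ("data about data"):
# it describes the image without being part of the pixel data itself.
photo_metadata = {
    "file_name": "holiday_001.jpg",
    "date_taken": "2009-12-25T10:30:00",     # answers "When was the image taken?"
    "camera_settings": {"exposure_value": 0.3, "flash": "off"},
    "people_in_image": ["Aditya Kaashyap"],  # answers "Who is in the image?"
}

# A user (or program) can query the metadata without opening the image at all.
print(photo_metadata["date_taken"], photo_metadata["people_in_image"])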

Use of Metadata

The applications of metadata are discussed below.

a. Metadata provides additional information to users of the data it describes; the information can be either descriptive or algorithmic.

b. Metadata speeds up and enriches searching for resources. Search queries using metadata save users from performing more complex filter operations manually. Also, web browsers, P2P applications and media management software automatically download and locally cache metadata to improve the speed at which files can be accessed and searched.


c. Metadata plays a vital role on the World Wide Web in finding useful information from the large amount of information available.

d. Metadata is an important part of electronic discovery. Application and file system metadata derived from electronic documents and files act as important evidence.

e. Some metadata is intended to enable variable content presentation. For example, if a picture has metadata that indicates its most important region, the user can narrow the picture to that region and thus obtain the details required.

f. Metadata can also be used to automate workflows. For example, if a software tool knows the content and structure of data, it can convert it automatically and pass it to another tool as input, so that users need not perform copy-and-paste operations.

g. Metadata helps to bridge the semantic gap by explaining how data items are related and how these relations can be evaluated automatically. For example, if a search engine understands that "Aditya Kaashyap" was an "Indian engineer", it can answer a search query on "Indian engineers" with a link to a web page about Aditya Kaashyap, even though the exact words "Indian engineers" never occur on that page. This approach (called knowledge representation) is of special interest to the semantic web and artificial intelligence.

Managing of the Metadata

To successfully develop and use metadata, you need to understand the following important issues

that should be treated with care:

a. You need to keep track of all the metadata created, even in the early phases of planning and designing. It is not economical to start attaching metadata once the production process has been completed.

b. Metadata must be adapted if the resource it describes changes. It should be merged when two resources are merged.

c. It can be useful to keep metadata even after the resource it describes has been removed.

d.Metadata can be stored either internally (in the same file as the data) or externally (in a separate file).

Internal storage allows transferring metadata together with the data it describes. Thus metadata is at

hand and can be easily manipulated. This method creates high redundancy and does not allow holding

metadata together. External storage allows bundling metadata, for example in a database, for more

efficient searching. There is no redundancy and metadata can be transferred simultaneously when

using streaming.
e.Storing the metadata in a human-readable format (such as XML) can be useful because users can

understand and edit it without specialized tools. But these formats are not optimized for storage

capacity. It may be useful to store metadata in a binary, non-human-readable format instead to speed up

transfer and save memory.

Although the majority of computer professionals see metadata as a chance for better interoperability, there are some demerits, as detailed below:

a. Creating metadata is expensive and time-consuming, and it can become very complicated.

b. Metadata is subjective and depends on context. Two persons may attach different metadata to the same resource because of their different points of view. Moreover, metadata can be misinterpreted due to its dependency on context.

c. There is no end to metadata. For example, when annotating a soccer match with metadata, one person may describe only the players and their actions, while others may also describe the advertisements in the background and the clothes the players wear. So even for a simple resource the amount of possible metadata can be gigantic.

d. It can be argued that there is no real need for metadata, as most of today's search engines can find text very efficiently.

(ii) What do you understand by ETL? What are the significances of ETL processes?
What are the ETL requirements and steps?
Ans. Most of the information contained in a data warehouse comes from the operational systems, but we all know that the operational systems cannot be used directly to provide strategic information. So you need to carefully understand what constitutes the difference between the data in the source operational systems and the information in the data warehouse. It is the ETL functions that reshape the relevant data from the source systems into useful information to be stored in the data warehouse. There would be no strategic information in a data warehouse in the absence of these functions.

Significance of ETL Processes

The ETL functions act as the back-end processes that cover the extraction of the data from the source systems. They also include all the functions and procedures for changing the source data into the exact formats and structures appropriate for storage in the data warehouse database. After the transformation of the data, they include all processes that physically move the data into the data warehouse repository. After capturing the information, you cannot simply dump the data into the data warehouse. You have to carefully subject the extracted data to all manner of transformations so that the data will be fit to be converted into information.

Let us try to understand the significance of the ETL functions by taking an example. Suppose you want to analyze and compare sales by store, product and month, but the sales data is spread across various applications of your organization. You therefore have to bring the entire sales details into the data warehouse database. You can do this by providing the sales and price in a fact table, the products in a product dimension table, the stores in a store dimension table and the months in a time dimension table. To do this, you need to extract the data from the respective source systems, reconcile the variations in data representation among the source systems, transform the entire sales details, and load the sales into the fact and dimension tables. The execution of ETL functions is challenging because of the nature of the source systems.
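A minimal, hedged sketch of the sales example above is given below (Python). The source records, field names and code mapping are illustrative assumptions; a real ETL job would read from the actual source systems and load into the warehouse database.

# Extract: sales records pulled from two (simulated) source applications.
source_a = [{"store": "S01", "prod": "1", "month": "2010-01", "sales": 1200.0}]
source_b = [{"store": "S02", "prod": "A", "month": "2010-01", "sales": 800.0}]

# Transform: reconcile the different product codings used by the two systems.
PRODUCT_MAP = {"1": "P100", "A": "P100"}   # assumed standard product key

def transform(row):
    return {**row, "prod": PRODUCT_MAP.get(row["prod"], row["prod"])}

# Load: append the cleaned rows to the sales fact table; the product, store
# and time dimension tables would be maintained in the same way.
sales_fact = []
for row in map(transform, source_a + source_b):
    sales_fact.append(row)

print(sales_fact)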

Also, the amount of time spent on performing the ETL functions can be as much as 50-70% of the total effort required to build a data warehouse. To extract the data, you have to know the time window during each day in which to extract data from a specific source system without impacting the usage of that system. You also need to determine the mechanism for capturing the changes in the data in each of the relevant systems. Apart from the ETL functions, the building of a data warehouse includes functions like data integration, data summarization and metadata updating. Figure 6.1 details the processes involved in executing the ETL functions while building a data warehouse.


ETL Requirements and Steps

Ideally, you are required to undergo the following steps provided in

3.(i) Describe briefly the Data Transformation process. What are the major types of
transformations? Describe them briefly.
Ans. You need to perform various types of transformation tasks before moving the extracted data from the source systems into the data warehouse. The data has to be transformed to agreed standards, since it comes from various source systems, and you also need to ensure that the combined data does not violate any business rules.

Irrespective of the complexity of the source systems, and regardless of the extent of your data warehouse, a number of basic transformation tasks have to be carried out.

Major Transformation Types

By undertaking a combination of these basic tasks, you can perform the following transformation functions:

Format Revisions

Format revisions include changes to the data types and lengths of individual fields. For instance, product package types in your source systems may be indicated by codes and names in which the fields are numeric and text data types, and the lengths of the package-type fields might vary from one source system to another. Using format revisions, you can standardize the lengths and change the data type to text in order to provide values meaningful to the users.
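A small sketch of a format revision is shown below (Python); the package-type code table and the target field length are assumptions for illustration.

# One source system stores package type as a numeric code, another as text.
# Format revision: convert both to a text field of a fixed, agreed length.
PACKAGE_NAMES = {1: "BOX", 2: "CARTON", 3: "PALLET"}   # assumed code table

def revise_package_type(value, width=10):
    name = PACKAGE_NAMES.get(value, str(value))
    return str(name).upper().ljust(width)[:width]      # standard text, fixed length

print(repr(revise_package_type(2)))         # numeric code from system A
print(repr(revise_package_type("pallet")))  # text value from system B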

Decoding of Fields

This type of transformation arises when you deal with multiple source systems, where you are bound to have the same data items described by a plethora of field values. For instance, two products manufactured by an organization might be coded as 1 and 2 in one source system and as A and B in another system. In such situations, you need to decode and standardize the codes before loading the data into the data warehouse; otherwise there would be a conflict in the data analysis.
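The product-code example could be handled roughly as below (Python); the mapping table and the standard codes are assumptions.

# System 1 codes the two products as 1 and 2, system 2 codes them as A and B.
# Decode both into one standard code before loading the warehouse.
DECODE_TABLE = {
    ("system1", "1"): "PROD-ALPHA",
    ("system1", "2"): "PROD-BETA",
    ("system2", "A"): "PROD-ALPHA",
    ("system2", "B"): "PROD-BETA",
}

def decode_product(source_system, raw_code):
    return DECODE_TABLE.get((source_system, str(raw_code)), "UNMAPPED")

print(decode_product("system1", 1))    # PROD-ALPHA
print(decode_product("system2", "B"))  # PROD-BETA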

Calculated and Derived values

You can maintain both calculated and derived data values in a typical data warehouse. For instance, after extracting data such as sales volume, sales value and operating cost estimates from the sales system, you can keep 'profit margin' (calculated as the difference between total sales and total cost) as a calculated value alongside the sales and cost amounts. Similarly, you may use average daily balances and operating ratios as derived fields.
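A hedged sketch of adding a calculated field during transformation (Python; the record layout and field names are assumptions):

# Calculated value: profit margin = total sales - total cost,
# stored alongside the extracted sales and cost amounts.
extracted = [
    {"prod": "P100", "sales_value": 1500.0, "total_cost": 1100.0},
    {"prod": "P200", "sales_value": 900.0,  "total_cost": 950.0},
]

for row in extracted:
    row["profit_margin"] = row["sales_value"] - row["total_cost"]

print(extracted)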

Splitting of Single Fields

You need to split larger single fields for improved understanding and better analysis. For instance, traditional legacy systems store the name and address of a customer in a single large text field. Similarly, some systems store city, state, and zip code data together in a single field. But these components need to be stored individually, both to improve operational performance by indexing on individual components and to allow analysis using individual components such as city, state, and zip code.
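As a rough sketch (Python), a combined city/state/zip field could be split into individual components; the comma-separated field layout assumed here is purely illustrative.

# Legacy field holding city, state and zip code together, comma separated
# (the delimiter is an assumption for this sketch).
def split_city_state_zip(combined):
    city, state, zip_code = [part.strip() for part in combined.split(",")]
    return {"city": city, "state": state, "zip": zip_code}

print(split_city_state_zip("Mumbai, Maharashtra, 400053"))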

Merging of Information

This type of transformation deals with merging information available in various source systems into a single entity. For instance, the product code and description may come from one data source, while the relevant package types and cost data may come from several other source systems. Here, merging of information denotes combining the product code, description, package types, and cost into a single entity.
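A minimal sketch of merging product information from several sources into one entity (Python; the source records are assumptions):

# Product code and description from one source; package type and cost from others.
codes = {"P100": {"description": "Steel bolt"}}
packages = {"P100": {"package_type": "BOX"}}
costs = {"P100": {"cost": 2.75}}

# Merge into a single product entity keyed by product code.
merged = {}
for prod in codes:
    merged[prod] = {**codes[prod], **packages.get(prod, {}), **costs.get(prod, {})}

print(merged)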

Summing Up

In this type of transformation, summaries are created and then loaded into the data warehouse instead of loading the most granular level of data. For instance, a credit card company need not store each and every single transaction on each credit card in the data warehouse to analyze sales patterns. Instead, the data can be summarized to the extent possible and the summary data stored in place of the most granular data.
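As an illustration only, transaction-level data could be summarized before loading, for example as below (Python; the sample transactions and the card-plus-month summary level are assumptions):

from collections import defaultdict

# Individual credit-card transactions (granular data, not loaded as-is).
transactions = [
    {"card": "C1", "month": "2010-01", "amount": 120.0},
    {"card": "C1", "month": "2010-01", "amount": 80.0},
    {"card": "C2", "month": "2010-01", "amount": 50.0},
]

# Summarize to card + month level and load only the summary rows.
summary = defaultdict(float)
for t in transactions:
    summary[(t["card"], t["month"])] += t["amount"]

print(dict(summary))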

Character Set Conversion

In this type of data transformation, character sets are converted into an agreed standard character set for textual data in the data warehouse. For instance, the source data will be in EBCDIC (Extended Binary Coded Decimal Interchange Code) characters if you have mainframe legacy systems as source systems. So you need to convert from the mainframe EBCDIC format to the ASCII (American Standard Code for Information Interchange) format if a PC-based architecture is the choice for your data warehouse.
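Python's standard codecs include an EBCDIC code page (cp500), so a rough sketch of such a conversion could look like the following; the sample record is an assumption.

# Simulate a record arriving from a mainframe in EBCDIC (code page cp500),
# then convert it to ASCII for the PC-based warehouse environment.
ascii_text = "CUSTOMER 00123"
ebcdic_bytes = ascii_text.encode("cp500")     # stand-in for mainframe source data

converted = ebcdic_bytes.decode("cp500").encode("ascii").decode("ascii")
print(ebcdic_bytes)    # raw EBCDIC bytes
print(converted)       # same text in ASCII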

Conversion of Units of Measurements

Use of standard units of measurement is one of the prerequisites in building a data warehouse. If your company has overseas operations, you may have to convert the metrics accordingly so that the numbers are all in one standard unit of measurement.

Date/time conversion is an important example. The date October 9, 2006 is written as 10/09/2006 in the U.S. format and as 09/10/2006 in the British format. This can be standardized by writing it as 09 Oct 2006.
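A small sketch of standardizing the date formats mentioned above (Python, using the standard datetime module; the choice of '09 Oct 2006' as the target format follows the example in the text):

from datetime import datetime

def standardize_date(raw, source_format):
    """Convert a source date string to the agreed '09 Oct 2006' style."""
    return datetime.strptime(raw, source_format).strftime("%d %b %Y")

print(standardize_date("10/09/2006", "%m/%d/%Y"))  # U.S. format    -> 09 Oct 2006
print(standardize_date("09/10/2006", "%d/%m/%Y"))  # British format -> 09 Oct 2006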

Key Restructuring

You have to come up with keys for the fact and dimension tables of the data warehouse based on the keys in the extracted records, so you look at the primary keys of the extracted records while extracting data from the input sources. For instance, suppose the product code in an organization is structured to have an inherent meaning (the first letter describes the location code, the second letter describes the machine code, and so on) and you use this product code as the primary key. If the product is later moved to another warehouse, the warehouse part of the product key will have to be changed. Therefore, avoid keys with built-in meanings while choosing keys for your data warehouse database tables, and transform such keys into generic keys generated by the system itself.
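A hedged sketch of replacing a meaningful production key with a system-generated surrogate key (Python; the key layout and sample codes are assumptions):

from itertools import count

# Production key with built-in meaning: first character = location, rest = machine.
# In the warehouse, replace it with a generic surrogate key and keep the
# original code only as an ordinary attribute.
surrogate_seq = count(start=1)
key_map = {}   # production key -> surrogate key

def restructure_key(production_key):
    if production_key not in key_map:
        key_map[production_key] = next(surrogate_seq)
    return key_map[production_key]

for prod_code in ["A123", "B777", "A123"]:
    print(prod_code, "->", restructure_key(prod_code))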

Deduplication

Some companies may maintain several records for a single customer, and these additional records result in duplicates. Therefore, it is suggested to keep a single record for one customer and link all the duplicates in the source systems to this single record in your data warehouse. This process is called deduplication.
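A minimal deduplication sketch (Python): matching on customer name alone is a simplifying assumption here, and real matching rules are usually much richer.

# Several source records may describe the same customer. Keep one master
# record per customer and remember which source records map to it.
source_records = [
    {"src_id": "A-01", "name": "R. Sharma", "city": "Mumbai"},
    {"src_id": "B-17", "name": "R. Sharma", "city": "Mumbai"},
    {"src_id": "A-02", "name": "K. Iyer",   "city": "Chennai"},
]

masters, links = {}, {}
for rec in source_records:
    key = rec["name"].lower()          # simplistic matching rule (assumption)
    masters.setdefault(key, rec)       # first record seen becomes the master
    links[rec["src_id"]] = masters[key]["src_id"]

print(links)   # every source id linked to its single master record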
(ii) What do you understand by EIS? What are the significances of EIS?
Briefly describe the benefits of EIS.
Ans. Definition of an EIS
In simple terms, an EIS can be defined as a computer-based system intended to facilitate and support the information and decision-making needs of the senior executives of an enterprise by providing easy access to both internal and external information relevant to meeting the strategic goals of the organization. These systems act as organization-wide decision support systems that help top-level executives analyze, compare, and highlight the trends and patterns of important variables. They also emphasize graphical displays and easy-to-use user interfaces, and offer strong reporting capabilities.

Significance of EIS

An EIS provides summarized or detailed strategic information at the convenience of the senior executives of an organization. It performs these functions by constantly monitoring internal and external events and trends. For instance, an executive can use an EIS to view sales performance categorized by product, region, month, etc. Similarly, the executive can monitor the sales performance of the organization's competitors. Based on the snapshot provided by the EIS, the executive can drill down into the organization's data warehouse to display a greater level of detail and to explore current and past data patterns and trends. This process can be continued until the executive reaches the level of a single transaction; thus the EIS provides the executive with the information that explains a variance and helps in deciding a course of action.

The tools offered by EIS are programmed to provide canned reports or briefing books to top-level executives.

Today these tools allow ad-hoc querying against a multi-dimensional database, and most offer analytical

applications along functional lines such as sales or financial analysis. But an organizational EIS cannot

become a substitute for other forms of information technologies and computer-based systems viz.,

Management Information Systems (MIS), Transaction Processing Systems (TPS), and Decision Support

Systems (DSS).

Today, an EIS is applied not only within typical corporate hierarchies, but also at personal computers on a local area network. These systems now cross computer hardware platforms and integrate information stored on mainframes, personal computer systems, and minicomputers. As some client service companies adopt the latest enterprise information systems, executives can use their personal computers to access the company's data and decide which data are relevant for their decision making. This arrangement enables all users to customize their access to the appropriate company data and provides relevant information to both upper and lower levels in the company.

Benefits of an EIS

The advantages that an EIS brings to the organization are:

a. Provides tools to select, extract, filter, and track the critical information of the organization in an organized manner

b. Enables top-level executives to use the system with ease (extensive computer experience is not required)

c. Provides timely delivery of an organization-wide summary of information, highlighting major deviations wherever they arise

d. Provides a wide range of reports, including status reports, trend analyses, drill-down investigations and ad hoc queries

e. Presents the information in graphical, tabular, and/or text formats

The organizational EIS is not a substitute for other information technologies and computer-based systems.

The other decision support systems are still vital in bringing relevant information to the various levels of

a modern organization. The EIS feeds off the various information systems within an organization for its

internal information needs and then attaches itself to the external sources as and when necessary to

provide a macro view of business.

However, executives face the following limitations with the Executive Information Systems:

a. Cost of establishing an EIS is relatively high and so may not be economically viable for small companies

b. Functions are limited and so the systems may not perform complex calculations

c. Depends on the other information technologies in the organization to gather the organization’s internal data

d. Difficult to keep the data current, as the system focuses on historical data


Master of Business Administration-MBA Semester III
MI0027

Business Intelligence and tools - 2 Credits

Assignment Set- 2

1. Describe Data Quality Types and discuss the concepts of TQDM.


Ans. One of the main reasons for the failure of a data warehouse deployment is the poor quality of the data
loaded into a warehouse. So the managers need to be careful and take up the precautions required to

ensure that the quality of data loaded into the warehouse is appropriate.

Data Quality Types

There are two significant dimensions in understanding the quality of data: intrinsic quality and realistic quality. Here, 'intrinsic data quality' is the correctness or accuracy of the data, while 'realistic data quality' is the value that the correct data has in supporting the work of the business or organization.

To state it simply, 'intrinsic data quality' is the accuracy of the data. It is the degree to which data accurately reflects the real-world object that the data represents. If all the facts that an organization needs to know about an entity are accurate, then that data has intrinsic quality.

Data that does not enable the organization to accomplish its mission has no quality, no matter how accurate it is. This is where 'realistic data quality' comes into the picture. Realistic data quality is the degree of utility and value the data has in supporting the organizational processes that accomplish the organizational objectives. Fundamentally, realistic data quality is the degree of customer satisfaction that knowledge workers derive from the use of the data.

Concept of TQDM

Many business intelligence projects do not deliver to their full potential for one reason: people tend to see data quality as a one-time undertaking carried out as part of user acceptance testing (UAT). But it is very important that data quality management be undertaken as a continuous improvement process. You have to use an iterative approach, as detailed below, to achieve data quality:

Step 1: To establish Data Quality Management environment

A commitment to the Data Quality Management process can be made by establishing the data quality management environment among information system managers and by establishing conditions that encourage coordination between functional and information system development professionals. Functional users of legacy information systems know the data quality problems of the existing systems, but hardly know how to improve the quality of the existing data systematically. Information system developers, on the other hand, know how to identify data quality problems, but hardly know how to change the functional requirements that drive the systematic improvement of the data. Given the existing barriers to communication, establishing the data quality environment requires the participation of both functional users and information system administrators.

Step 2: To draft the Project scope

For each data quality analysis project selected, you may have to draft an initial plan that addresses the following items:

- Task Summary
- Task Description
- Project Approach
- Schedule
- Resources

Step 3: To implement the Data Quality Projects

A data quality project consists of four activities;

a. Define

b. Measure

c. Analyze

d. Improve

The data quality project manager performs these activities with input from the functional users of the

data, system developers, and database administrators of the legacy and target database systems.

Step 4: To evaluate Data Quality Management Methods


The objective of this step is to evaluate and assess the progress made in implementing data quality initiatives.

All stakeholders in the Data Quality Management process (functional users, program managers, developers,

and the Office of Data Management) are required to review the progress to determine whether data quality

projects have helped to achieve goals and benefits.

2. Discuss about the Visual Warehouse in details.


Ans. Visual Warehouse is an integrated product for building and maintaining a data warehouse or data mart in a LAN environment. Visual Warehouse integrates many of the business intelligence component functions into a single product, and it can be used to automate the process of bringing data together from heterogeneous sources into a central, integrated, information-providing environment. It does not simply create a data warehouse or an information database; it provides the processes to define, build, manage, monitor and maintain an environment that provides information. Visual Warehouse can be managed either centrally or from a workgroup environment. Therefore, business groups can meet their own information needs without burdening information systems resources, and can enjoy the autonomy of their own data mart without compromising overall data integrity and security in the enterprise.

Following are some of the important features of Visual Warehouse:

a. Visual Warehouse has the ability to extract and transform data from a wide range of heterogeneous data sources (both internal and external sources of an enterprise), such as the DB2 family, Microsoft SQL Server, Oracle, Sybase, Informix, and flat files (for example, from spreadsheets). On the basis of the metadata defined by the administrative component of Visual Warehouse, the data from any of these sources can be extracted and transformed. The extraction process, which supports full refreshing of data, can run on demand or on an automated schedule.

b. The transformed data can be placed in a data warehouse built on any of the DB2 UDB platforms (including DB2 for Windows NT, DB2 for AIX, DB2 for HP-UX, DB2 for Sun Solaris, DB2 for SCO, DB2 for SINIX, DB2 for OS/2, DB2 for OS/400, and DB2 for OS/390) or on flat files. Visual Warehouse provides the flexibility and scalability to populate any combination of the supported databases. Visual Warehouse also supports Oracle, Sybase, Informix, and Microsoft SQL Server using IBM DataJoiner.

c. Once the data is in the target data warehouse, it can be accessed by a variety of end-user query tools. These tools can be from IBM, such as Lotus Application or QMF for Windows, or from any other vendor whose products comply with the DB2 Client Application Enabler (CAE) or the Open Database Connectivity (ODBC) interface, such as Business Objects and Cognos Impromptu. The data can also be browsed using any of the popular web browsers with additional web-infrastructure components.


3. What do you understand by OLAP? What are the guidelines for an OLAP
system? Briefly discuss the concept of Data Mining.
Ans. OLAP is a category of software technology that enables managers to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensions of the enterprise as understood by the user. Thus the key elements of an OLAP system are speed, consistency, interactive access and multiple dimensional views. In simple terms, OLAP is a technical term for multi-dimensional analysis.
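As a rough illustration of "slice and dice" (not from the text), a multidimensional view can be sketched with the pandas library, assuming a small, made-up sales data set:

import pandas as pd

# Hypothetical sales facts with three dimensions: store, product and month.
sales = pd.DataFrame([
    {"store": "S01", "product": "P100", "month": "2010-01", "amount": 1200},
    {"store": "S01", "product": "P200", "month": "2010-01", "amount": 700},
    {"store": "S02", "product": "P100", "month": "2010-02", "amount": 950},
])

# "Dice": view sales by store and product, with months across the columns.
cube = sales.pivot_table(index=["store", "product"], columns="month",
                         values="amount", aggfunc="sum", fill_value=0)
print(cube)

# "Slice": fix one dimension (month = 2010-01) and look at the rest.
print(sales[sales["month"] == "2010-01"])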

The guidelines for an OLAP system are as follows:


a. Multidimensional Conceptual View: The OLAP system has to provide a multidimensional data model that is analytical and easy to use. It has to support 'slice-and-dice' operations and is usually required in financial modeling.

b. Transparency: These systems need to be part of an open system that supports heterogeneous data sources.

Also, the end-user need not necessarily be concerned about the details of data access or conversions.

c. Accessibility: The OLAP system should present the user with a single logical schema of the data. It has to map

its own logical schema to the heterogeneous physical data stores and perform any necessary transformations.

d. Consistent Reporting Performance: The users of the system should not experience any significant degradation

in reporting performance as the number of dimensions or the size of the database increases. Users need to

perceive consistent run time, response time, or machine utilization every time a given query is run.

e.Client/Server Architecture: The system has to have conformance to the principles of client/server architecture

for optimum performance, flexibility, adaptability, and interoperability. Also, the server component needs to be

sufficiently intelligent to enable various clients to be attached with minimum effort.

f. Generic Dimensionality: The system has to ensure that every data dimension is equivalent in both structure

and operational capabilities. We should be able to apply the function of one dimension to another too.

g. Dynamic Sparse Matrix Handling: This guideline is related to the idea of nulls in relational databases and to
the notion of compressing large files, and a sparse matrix is one in which not every cell contains data. So the

OLAP systems should accommodate varying storage and data-handling options.

h. Multi-user Support: Similar to EIS systems, the OLAP systems need to support multiple concurrent users,

including their individual views and/or slices of a common database.

i. Unrestricted Cross-dimensional operations: The OLAP system should have the ability to recognize

dimensional hierarchies and automatically perform roll-up and drill-down operations within a dimension or across

dimensions.

j. Intuitive data manipulation: The system should enable consolidation path reorientation (pivoting), drill-down

and roll-up, and other manipulations to be accomplished intuitively and directly via point-and-click and

drag-and-drop actions.

k. Flexible Reporting: The system should enable its users to arrange columns, rows, and cells in a manner that facilitates easy manipulation, analysis, and synthesis of information.

l. Unlimited Dimensions and Aggregation Levels: The system is expected to accommodate at least fifteen

(preferably twenty) data dimensions within a common analytical model.

Later in 1995, Codd included the following six requirements in addition to the above twelve basic guidelines:

a. Drill-through to Detail level: The system has to allow a smooth transition from the multidimensional,

pre-aggregated database to the detail record level of the source data warehouse repository.

b.Treatment of Non-normalized Data: The system should prohibit calculations made within it from getting

affected by the external data serving as the source.


c. Storing OLAP Results: The OLAP system should not deploy write-capable OLAP tools on top of transactional systems.

d. Missing Values: The system should be able to ignore missing values, irrespective of their source.

e. Incremental Database Refresh: The system has to provide for incremental refreshes of the extracted and aggregated OLAP data.

f. SQL Interface: The OLAP system should have the ability to integrate into the existing enterprise environment.

Concept of Data Mining

By its simplest definition, data mining (DM) is the set of activities used to find new, hidden, or unexpected

patterns in data. It is the process of analyzing data from different perspectives and summarizing it into

useful information. Technically, the data mining process finds the correlations and patterns existing among

several fields in a large relational database.

In the past, decision support activities were based on the concept of verification. In this sense, a relational

database could be queried to provide dynamic answers to well-formed questions. The key issue in verification

is that it requires a great deal of prior knowledge on the part of the decision maker in order to verify a

suspected relationship through the query. In the 1990s, data warehouses with query and report tools assisted

the users in retrieving the types of decision support information they needed. Later, OLAP tools came into existence to enable more sophisticated analysis.

Till this point, the approach for obtaining the information was mainly driven by the users. But the sheer

volume of data renders it impossible for anyone to use analysis and query tools to discern useful patterns.

For instance, in marketing research analysis, it is practically impossible to go through all the possible

associations and gain insights by querying and drilling down into the data warehouse. You might really

need a technology that can learn from past associations and results, and predict future behavior of

customers.

To sustain the cut-throat competition, it is really useful to have a tool that accomplishes the discovery of knowledge by itself. Thus you require a data-driven approach rather than a user-driven one.

Using the information stored within a data warehouse, the data mining techniques can provide solutions to

the following questions:

1. Which types of products need to be promoted to a specific individual customer?

2. Which scrips/securities are going to be more profitable during the next trading session?

3. What is the probability that an individual customer will respond to a particular promotion?

4. What is the likelihood that an individual customer will default or pay back on schedule?

These questions can be answered easily if the information hidden among the terabytes of data in your

databases can be located and utilized fully.
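As an illustration only, the third question above (the probability that a customer responds to a promotion) could be approached with a simple classifier. The sketch below uses scikit-learn with made-up training data and feature names; it is not part of the course material.

from sklearn.linear_model import LogisticRegression

# Made-up training data: [age, past_purchases] and whether the customer
# responded to a previous promotion (1 = responded, 0 = did not).
X = [[25, 2], [40, 10], [35, 5], [50, 1], [29, 7], [61, 0]]
y = [0, 1, 1, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# Predicted probability that a new customer (age 33, 6 past purchases) responds.
prob_respond = model.predict_proba([[33, 6]])[0][1]
print(round(prob_respond, 2))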

Another important DM technique is knowledge discovery in databases (KDD). Using a combination of techniques, including statistical analysis, multidimensional analysis, intelligent agents, and data visualization, KDD can discover highly useful, informative patterns within the data that can be used to develop predictive models of behavior.
