Karrox Technologies LTD-: Andheri

Spring 2010 (Jan-June)
MBA – SEM – III
Name :Prashant D. Devale

Contact Number :9821602706
Email id :pdevale@gmail.com
Roll Number : 510932455
Learning Centre : Karrox Technologies Ltd- ANDHERI
Subject Code :MI0027
Subject : Business Intelligence and tools
Date of Submission at the Learning Centre: 30-4-

2010
Master of Business Administration-MBA Semester III
MI0027
Business Intelligence and tools - 2 Credits
Assignment Set- 1
1.(i) What do you understand by Business Intelligence System? What are the
different steps in order to deliver the Business Value through a BI System?
Ans.Business Intelligence (BI) is a generic term used to describe leveraging the organizational internal and
external data, information for making the best possible business decisions. The field of Business
intelligence is very diverse and comprises the tools and technologies used to access and analyze
various types of business information. These tools gather and store the data and allow the user to view
and analyze the information from a wide variety of dimensions and thereby assist the decision-makers
make better business decisions. Thus the Business Intelligence (BI) systems and tools play a vital
role as far as organizations are concerned in making improved decisions in the current cut throat
competitive scenario.
In simple terms, Business Intelligence is an environment in which business users receive reliable,
consistent, meaningful and timely information. This data enables the business users conduct analyses
that yield overall understanding of how the business has been, how it is now and how it will be in the
near future. Also, the BI tools monitor the financial and operational health of the organization through
generation of various types of reports, alerts, alarms, key performance indicators and dashboards.
Business intelligence tools are a type of application software designed to help in making better business
decisions. These tools aid in the analysis and presentation of data in a more meaningful way and so play
a key role in the strategic planning process of an organization. They illustrate business intelligence in
the areas of market research and segmentation, customer profiling, customer support, profitability, and
inventory and distribution analysis to name a few.
Various types of BI systems viz. Decision Support Systems, Executive Information Systems (EIS),
Multidimensional Analysis software or OLAP (On-Line Analytical Processing) tools, data mining tools
are discussed further. Whatever is the type, the Business Intelligence capabilities of the system is to
let its users slice and dice the information from their organization’s numerous databases without having
to wait for their IT departments to develop complex queries and elicit answers.
Although it is possible to build BI systems without the benefit of a data warehouse, most of the systems
are an integral part of the user-facing end of the data warehouse in practice. In fact, we can never think
of building a data warehouse without BI Systems. That is the reason; sometimes, the words ‘data
warehousing’ and ‘business intelligence’ are being used interchangeably.
The manager of a BI system has to take care of the following steps in order to deliver the intended
business value:
Step 1: Ensuing strong business partnership
Developing a solid business sponsorship is the first step to start a BI project. Your business sponsors
(it is generally good to have more than one) will take a lead role in determining the purpose, content, and
priorities of the system and so the business sponsors are expected to have the following skills;
1.Visionary – a sense for the value and potential of information with clear, specific ideas as to how to apply it.
2.Resourceful – able to obtain necessary resources and facilitate the organizational change that the BI system
will bring about.
3.Reasonable – can temper the enthusiasm with the understanding that the BI system takes time and resources
to come out as a major information system.
Step 2: Defining organizational-level business requirements
The long-term goal of a BI system is to build an organizational-wide information infrastructure.
Delivery of Business Value through a BI System
This cannot be done unless the BI system developing team understands business requirements at an
organizational level. Thus the process of understanding the organizational-level business requirements
includes the following steps:

1.Establishing the initial Project Scope
2.Interviewing the BI System stakeholders
3.Gathering the organizational level business requirements
4.Preparation of an overall Requirements document
Step 3: Prioritizing the business requirements
The prioritization process is a planning meeting that involves the BI system developing team, the
Business sponsors, and other key senior managers across the organization. A Prioritization Grid can
be developed for the set of business processes identified in the previous step against the feasibility
of a business process and the business value that the processes likely generate. Thus the output of
a prioritization process is a list of business processes in the priority order.
Step 4: Planning the Business Intelligence project
After getting the complete understanding about the business priorities, the BI System developing team
revisits the Project plan. Now the plan is made based on the priority of the business processes
detailed in the previous step.
Step 5: Defining the Project-level business requirements
Based on the previous steps, now the BI System developing team defines and documents the
project-level business requirements. These requirements act as guidelines while developing the
BI system.
(ii) Discuss the characteristics of a Data warehouse and analyze how the
development of a Data warehouse helps you in managing various functions in
your organizations.
Ans. According to Bill Inmon, who is considered to be the Father of Data warehousing, the data in a
Data Warehouse consists of the following characteristics:
Subject oriented
The first feature of DW is its orientation toward the major subjects of the organization instead of
applications. The subjects are categorized in such a way that the subject-wise collection of
information helps in decision-making. For example, the data in the data warehouse of an insurance
company can be organized as customer ID, customer name, premium, payment period, etc. rather
auto insurance, life insurance, fire insurance, etc.

Integrated
The data contained within the boundaries of the warehouse are integrated. This means that all
inconsistencies regarding naming convention and value representations need to be removed in a
data warehouse. For example, one of the applications of an organization might code gender as
‘m’ and ‘f’ and the other application might code the same functionality as ‘0′ and ‘1′. When the data
is moved from the operational environment to the data warehouse environment, this will result in
conflict.
Time variant
The data stored in a data warehouse is not the current data. The data is a time series data as the data
warehouse is a place where the data is accumulated periodically. This is in contrast to the data in an
operational system where the data in the databases are accurate as of the moment of access.
Non-volatility of the data
The data in the data warehouse is non-volatile which means the data is stored in a read-only format and
it does not change over a period of time. This is the reason the data in a data warehouse forms as a
single source for all decision system support processing.
Keeping the above characteristics in view, ‘data warehouse‘ can be defined as a subject-oriented,
integrated, non-volatile, time-variant collection of data designed to support the decision-making
requirements of an organization.
A data warehouse is a relational database that is designed for query and analysis rather than for
transaction processing. Typically, a data warehouse contains historical data derived from
transaction
data including the data from various other data sources of an organization. The data in a data
warehouse has the following characteristics; subject-orientation, integration, non-volatility, and time-
variance in order to support the decision-making requirements of an organization.
2.(i) What do you understand by Data warehouse Meta Data? What is the use of
Metadata? How can you manage Metadata?
Ans. In simple terms, ‘metadata’ refers to “data about data.” It is the information that describes, or
Supplements the main data. For example, metadata of a digital camera includes the settings used for
the picture, such as exposure value or flash intensity. Here, metadata acts as an additional information,
and is not critical to the functions of the main data. In other cases, such as a Zip disk, metadata might
provide the information regarding the write-protected status of the disk. In such a case, metadata is
essential for proper functioning of the main product. So the value of metadata depends on the context
that it is provided, and the ways that contextual information can be used. When data is made available,
the potential user (human or computer) must put the data into an existing model of knowledge, and may
ask questions to do so. For example, in the case of an image, metadata provides answers to many of
the questions like “When was the image taken?” and “Who are in the image?” In sophisticated data
systems, the metadata includes the contextual information surrounding the data and will also be very
sophisticated, capable of answering many questions that help understand the data. To sum up,
metadata can be defined as “the structured, encoded data that describe characteristics of
information-bearing entities to aid in the identification, discovery, assessment, and management of the
described entities.”
Use of Metadata
The applications of metadata are discussed below.
a.Metadata provides additional information to users of the data it describes and the information can
either be descriptive or algorithmic
b. Metadata speeds up and enriches searching for resources. Search queries using metadata saves users
from performing more complex filter operations manually. Also, web browsers, P2P applications and
media management software automatically downloads and locally caches metadata to improve the speed
at which files can be accessed and searched.

c.Metadata plays a vital role on the World Wide Web to find useful information from the large amount of
information available.
d.Metadata is an important part of electronic discovery. Application and file system metadata derived from
electronic documents and files acts as important evidence.
e. Some metadata is intended to enable variable content presentation. For example, if a picture has metadata
that indicates the most important region, the user can narrow the picture to that region and thus obtain the
details required.
f. Metadata can also be used to automate workflows. For example, if a software tool knows content and
structure of data, it can convert it automatically and pass it to another tool as input so that users need not
perform copy-and-paste operations required.
g. Metadata helps to bridge the semantic gap by explaining how a computer data items are related and how
these relations can be evaluated automatically. For example, if a search engine understands that
“Aditya Kaashyap” was a “Indian Engineer”, it can answer a search query on “Indian Engineers” with
a link to a web page about Aditya Kaashyap, although the exact words “Indian Engineers” never occur
on that page. This approach (called, knowledge representation) is of special interest to the semantic web
and artificial intelligence.
Managing of the Metadata
To successfully develop and use metadata, you need to understand the following important issues
that should be treated with care:
a. You need to keep track of the entire metadata created even in the early phases of planning and designing.
It is not economical to start attaching metadata once the production process has been completed.
b. Metadata must adapt if the resource it describes changes. It should be merged when two resources are
merged.
c. It can be useful to keep metadata even after the resource it describes has been removed.
d.Metadata can be stored either internally (in the same file as the data) or externally (in a separate file).
Internal storage allows transferring metadata together with the data it describes. Thus metadata is at
hand and can be easily manipulated. This method creates high redundancy and does not allow holding
metadata together. External storage allows bundling metadata, for example in a database, for more
efficient searching. There is no redundancy and metadata can be transferred simultaneously when
using streaming.
e.Storing the metadata in a human-readable format (such as XML) can be useful because users can
understand and edit it without specialized tools. But these formats are not optimized for storage
capacity. It may be useful to store metadata in a binary, non-human-readable format instead to speed up
transfer and save memory.
Although the majority of the computer professionals see metadata as a chance for better interoperability,
there are some demerits as detailed below:
a. Metadata is expensive and time-consuming. Also, it is very much complicated.
b. Metadata is subjective and depends on context. Two persons will attach different metadata to the same
resource due to their different points of view. Moreover, metadata can be misinterpreted due to its
dependency on context.
c.There is no end to metadata. For example, when annotating a soccer match with metadata, one can
describe only the players and their actions. Others can also describe the advertisements in the
background and the clothes the players wear. So even for a simple resource the amount of possible
metadata can be gigantic.
d. There is no real need for metadata as most of today’s search engines allow finding text very efficiently.
(ii) What do you understand by ETL? What are the significances of ETL processes?
What are the ETL requirements and steps?
Ans. Mostly the information contained in a data warehouse comes from the operational systems. But we all
know that the operational systems could not be used to provide the strategic information. So you need
to carefully understand what constitutes the difference between the data in the source operational systems
and the information in the data warehouse. It is all ETL functions that reshape the relevant data from the
source systems into useful information to be stored in the data warehouse. There would be no strategic
information in a data warehouse in the absence of these functions.
Significance of ETL Processes
The ETL functions act as the back-end processes that cover the extraction of the data from the source
systems. Also, they include all the functions and procedures for changing the source data into the exact
formats and structures appropriate for storage in the data warehouse database. After the transformation
of the data, the processes include all processes that physically move the data into the data warehouse
repository. After capturing the information, you cannot dump the data into the data warehouse. You have
to carefully subject the extracted data to all manner of transformations so that the data will be fit to
convert into information.
Let us try to understand the significance of the ETL function by taking an example. For instance, you
want to analyze and compare sales according to stores, product and month. But the sales data is
available in various applications of your organization. Therefore, you have to have the entire sales
details in the data warehouse database. You can do this by providing the sales and price in a fact table,
the products in a product dimension table, the stores in a stores dimension table and months in a
time dimension table. To do this, you need to extract the data from the respective source systems,
reconcile the variations in the data representations among the source systems, transform the entire sales
details, and load the sales into fact and dimension table. Thus the execution of ETL functions is
challenging because of the nature of the source systems.
Also, the amount of time to be spent on performing the ETL functions is as much as 50-70% of the total
effort to be put for building a data warehouse. To extract the data, you have to know the time window
during each day to extract the data from a specific source system without impacting the usage of that
system. Also, you need to determine the mechanism for capturing the changes in the data in each of
the relevant systems. Apart from the ETL functions, the building of a data warehouse includes the
functions like data integration, data summarization and metadata updating. Figure 6.1 details the
processes involved to execute the ETL functions in building a data warehouse.

ETL Requirements and Steps
Ideally, you are required to undergo the following steps provided in
3.(i) Describe briefly the Data Transformation process. What are the major types of
transformations? Describe them briefly.
Ans. You need to perform various types of transformation tasks before moving the extracted data from the
source systems into the data warehouse. The transformation of the data is to be done as per the
standards as the data comes from various source systems and you also need to ensure that the
combined data does not violate the business rules.
Irrespective of the complexity of the source systems,and regardless of the extent of your data warehouse
Major Transformation Types
By undertaking a combination of the basic tasks discussed above, you can do the following transformation
functions:
Format Revisions
Format revisions include changes to the data types and lengths of individual fields. For instance,
product package types in your source systems may be indicated by codes and names in which the fields
are numeric and text data types. Also, the lengths of package types might vary from one source
system to another. Therefore, you can standardize and change the data type to text in order to
provide values meaningful to the users using format revisions.
Decoding of Fields
This type of transformation deals with multiple source systems and you are bound to have same data
items described by a plethora of field values. For instance, the coding for two products manufactured
by an organization might have been done as 1 and 2 in one source system and is done as A and B
in another system. In such situations, you need to decode the codes and standardize the code before
loading the data into a data warehouse; otherwise there would be a conflict in the data analysis.
Calculated and Derived values
You can maintain both calculated and derived types of data values in a typical data warehouse.
For instance, you can keep ‘profit margin’ (this can be calculated as the difference between the total
sales and total cost) as a calculated value along with sales and cost amounts after extracting the data
from the sales system viz., sales volume, sales value, operating cost estimates. Similarly, you may use
average daily balances and operating ratios as derived fields.
Splitting of Single Fields
You need to split the larger single files for improved understanding and making better analysis.
For instance, the traditional legacy systems store name and address of customers in a large text files.
Similarly, some systems store city, state, and zip code data together in a single file.But these components
need to be stored individually to improve the operation performance by indexing on individual
components and to perform analysis by using individual components such as city, state, and zip code.
Merging of Information
This type of transformation deals with merging of information available in various source systems into
a single entity. For instance, the product code and description may come from one data source and
the relevant package types, the cost data may come from several other source systems. Here, merging of
information denotes combining the product code, description, package types, and cost into a single entity.
Summing Up
In this type of transformation, the summaries are created and then loaded in the data warehouse instead
of loading the most granular level of data. For instance, a credit card company need not store each and
every single transaction on each credit card in the data warehouse to analyze sales patterns. Instead, the
data can be summarized to the extent possible and store the summary data instead of the most granular
data.
Character Set Conversion
In this type of data transformation, the character sets are converted into an agreed standard character
set for textual data in the data warehouse. For instance, the source data will be in EBCDIC
(Extended Binary Coded Decimal Interchange Code) characters if you have mainframe legacy systems
as source systems. So you need to convert from mainframe EBCDIC format to the ASCII
(American Standard Code for Information Interchange), format if PC-based architecture is the choice of
your data warehouse.
Conversion of Units of Measurements
Use of standard unit of measurement is one of the prerequisites in building a data warehouse. If your
company has overseas operations, you may have to convert the metrics accordingly so that the numbers
may all be in one standard unit of measurement.
Here, the date/time conversion is an important measurement. For example, the date of October 9, 2006
is written as 10/09/2006 in the U.S format and as 09/10/2006 in the British format. This can be standardized
by writing it as 09 Oct 2006.
Key Restructuring
You have to come up with keys for the fact and dimension tables for a data warehouse to be built based on
the keys in the extracted records. So you look at the primary keys of the extracted records while
extracting data from the input sources. For instance, the product code in an organization is structured
to have an inherent meaning (like first letter describes the location code, second letter describes the
machine code, etc.) and you use this product code as the primary key and move the data into another
warehouse. Then the warehouse part of the product key will have to be changed before moving the data.
Therefore, avoid the keys with built-in meanings while choosing keys for your data warehouse
database tables and transform such keys into generic keys (that are generated by the system itself).
Reduplication
Some companies may maintain several records for a single customer and so duplicates are the result of
the additional records. Therefore, it is suggested to keep a single record for one customer and link all the
duplicates in the source systems to this single record in your data warehouse. This process is called
reduplication.
(ii) What do you understand by EIS? What are the significances of EIS?
Briefly describe the benefits of EIS.
Ans. Definition of an EIS
In simple terms, an EIS can be defined as a computer-based system intended to facilitate and support
the information and decision making needs of senior executives of an enterprise by providing easy
access to both internal and external information relevant to meeting the strategic goals of the organization.
These systems act as organizational-wide Decision Support Systems to help top-level executives analyze,
compare, and highlight the trends and patterns of the important variables. Also, these systems emphasize
on graphical displays, easy-to-use user interfaces and offer strong reporting capabilities.
Significance of EIS
An EIS provides the summarized or detailed data of the strategic information at the convenience of the senior
executives of an organization. An EIS performs all these functions by constantly monitoring the internal and
external events and trends. For instance, an executive can use EIS to view the sales functioning categorized
by product, region, month, etc. Similarly, the executive can also monitor the sales performance of the
organization’s competitors. Based on the snapshot provided by the EIS, the executive can drill down into
the organization’s data warehouse to display greater level of details and to explore the current and past data
patterns and trends. This process can be continued till the executive reaches a single transaction level and
thus EIS provides the executive with the information that explains the variance and helps in deciding a course
of action.
The tools offered by EIS are programmed to provide canned reports or briefing books to top-level executives.
Today these tools allow ad-hoc querying against a multi-dimensional database, and most offer analytical
applications along functional lines such as sales or financial analysis. But an organizational EIS cannot
become a substitute for other forms of information technologies and computer-based systems viz.,
Management Information Systems (MIS), Transaction Processing Systems (TPS), and Decision Support
Systems (DSS).
Today, the application of an EIS is not only in typical corporate hierarchies, but also at personal computers
on a local area network. These systems now cross computer hardware platforms and integrate information
stored on mainframes, personal computer systems, and minicomputers. As some client service companies
adopt the latest enterprise information systems, executives can use their personal computers to get access
to the company’s data and decide which data are relevant for their decision making. This arrangement enables
all users to customize their access to the proper company’s data and provide relevant information to both
upper and lower levels in companies.
Benefits of an EIS
The advantages that an EIS brings to the organization are:
a. Provides tools to select, extract, filter, and track the critical information of organization in an organized manner
b. Enables the top-level executives to use the system with ease (extensive computer experience is not required)
c. Provides timely delivery of the organization-wide summary of information highlighting the major deviations of the
information wherever they arise
e. Provides a wide range of reports including the status reports, trend analyses, drill down investigation, ad hoc
queries.
f. Presents the information in graphical, tabular, and/or text formats
The organizational EIS is not a substitute for other information technologies and computer-based systems.
The other decision support systems are still vital in bringing relevant information to the various levels of
a modern organization. The EIS feeds off the various information systems within an organization for its
internal information needs and then attaches itself to the external sources as and when necessary to
provide a macro view of business.
However, executives face the following limitations with the Executive Information Systems:
a. Cost of establishing an EIS is relatively high and so may not be economically viable for small companies
b. Functions are limited and so the systems may not perform complex calculations
c. Depends on the other information technologies in the organization to gather the organization’s internal data
d. Difficult to keep the current data as it focuses on historical data

Master of Business Administration-MBA Semester III
MI0027
Business Intelligence and tools - 2 Credits
Assignment Set- 2
1. Describe Data Quality Types and discuss the concepts of TQDM.

Ans. One of the main reasons for the failure of a data warehouse deployment is the poor quality of the data
loaded into a warehouse. So the managers need to be careful and take up the precautions required to
ensure that the quality of data loaded into the warehouse is appropriate.
Data Quality Types
There are two significant dimensions in understanding the quality of the data; intrinsic quality and realistic
quality. Here, the ‘intrinsic data quality’ is the correctness or accuracy of data and ‘realistic data quality’
is the value that the correct data has in supporting the work of the business or organization
To state simply, the ‘intrinsic data quality’ is the accuracy of the data. It is the degree to which data
accurately reflects the real-world object that the data represents. If all facts that an organization needs
to know about an entity are accurate, then that data has intrinsic quality.
Data that does not enable the organization to accomplish its mission has no quality, no matter how
accurate it is. Thus ‘realistic data quality’ comes into the picture. Realistic data quality is the degree of
utility and value the data has to support the organizational processes to accomplish the organizational
objectives. Fundamentally, realistic data quality is the degree of customer satisfaction that the knowledge
workers derive out of the use of the data.
Concept of TQDM
Many of the business intelligence projects do not deliver to full potential because of one reason that people
tend to see data quality as a one-time undertaking as a part of user acceptance testing (UAT). But it is very
important that data quality management is to be undertaken as a continuous improvement process. You
have to use an iterative approach as detailed below to achieve the data quality:
Step 1: To establish Data Quality Management environment
Undertaking a commitment to the Data Quality Management process can be accomplished by establishing
the data quality management environment between information system managers and establishing the
conditions to encourage coordination between functional and information system development professionals.
Functional users of legacy information systems know data quality problems of the existing systems but hardly
know how to improve the quality of the existing data systematically. But the Information system developers
know how to identify data quality problems, but hardly know how to change the functional requirements
that drive the systematic improvement of data. Given the existing barriers to communication, establishing
the data quality environment requires participation of both functional users and information system
administrators.
Step 2: To draft the Project scope
For each data quality analysis project selected, you may have to draft an initial plan that addresses the following items:
 Task Summary
 Task Description
 Project Approach
 Schedule
 Resources
Step 3: To implement the Data Quality Projects
A data quality project consists of four activities;
a. Define
b. Measure
c. Analyze
d. Improve
The data quality project manager performs these activities with input from the functional users of the
data, system developers, and database administrators of the legacy and target database systems.
Step 4: To evaluate Data Quality Management Methods

The objective of this step is to evaluate and assess the progress made in implementing data quality initiatives.
All stakeholders in the Data Quality Management process (functional users, program managers, developers,
and the Office of Data Management) are required to review the progress to determine whether data quality
projects have helped to achieve goals and benefits.
2. Discuss about the Visual Warehouse in details.

Ans. Visual Warehouse is an integrated product for building and maintaining a data warehouse or data mart
in a LAN environment. The visual warehouse integrates many of the business intelligence component
functions into a single product and it can be used to automate the process of bringing the data together
from heterogeneous sources into a central, integrated, information providing environment. It does not simply
create a data warehouse or an information database; but provides the processes to define, build, manage,
monitor and maintain an environment which provides information. The visual warehouse can be managed either
centrally or from the workgroup environment. Therefore, business groups can meet their own information
needs without burdening information systems resources, and can enjoy the autonomy of their own data mart
without compromising overall data integrity and security in the enterprise.
Following are some of the important features of visual warehouses:
a. Visual Warehouse has the ability to extract and transform data from a wide range of heterogeneous data sources
(both internal and external sources of an enterprise); such as the DB2 family, Microsoft SQL Server, Oracle,
Sybase, Informix, and flat files (for example, from spreadsheets). On the basis of the metadata defined by the
administrative component of visual warehouse, the data from any of these sources can be extracted and
transformed. Also, the extraction process, which supports full refreshing of data, can run on demand or on an
automated scheduled basis.
b.The transformed data can be placed in a data warehouse built on any of the DB2 UDB platforms (including DB2
for Windows NT, DB2 for AIX, DB2 for HP-UX, DB2 for Sun Solaris, DB2 for SCO, DB2 for SINIX, DB2 for OS/2,
DB2 for OS/400, and DB2 for OS/390) or on flat files. The visual warehouse provides the flexibility and scalability
to populate any combination of the supported databases. Also, visual warehouse supports Oracle, Sybase,
Informix, and Microsoft SQL Server using IBM DataJoiner.
c. Once the data is in the target data warehouse, the data can be accessible by a variety of end user query tools.
These tools can be from IBM, such as Lotus Application, or QMF for Windows, or from any other vendors whose
products comply with the DB2 Client Application Enabler (CAE) or the Open Database Connectivity (ODBC)
interface, such as Business Objects, and Cognos Impromptu. The data can also be browsed using any of the
popular web browsers with additional web-infrastructure components.

3. What do you understand by OLAP? What are the guidelines for an OLAP
system? Briefly discuss the concept of Data Mining.
Ans.OLAP is a category of software technology that enables the managers to gain insight into
data
through fast, consistent, interactive access in a wide variety of possible views of information
that has been transformed from raw data to reflect the real dimensions of the enterprise as
understood by the user. Thus the key elements of an OLAP system are speed, consistency,
interactive access and multiple dimensional views. To understand in simple terms, OLAP is
a
technical term for multi-dimensional analysis.
The guidelines for an OLAP system are as follows:

a. Multidimensional Conceptual View: The OLAP system has to provide a multidimensional data model that is
analytical and easy to use. It has to support ’slice-and-dice’ operations and is usually required in financial
modeling.
b. Transparency: These systems need to be part of an open system that supports heterogeneous data sources.
Also, the end-user need not necessarily be concerned about the details of data access or conversions.
c. Accessibility: The OLAP system should present the user with a single logical schema of the data. It has to map
its own logical schema to the heterogeneous physical data stores and perform any necessary transformations.
d. Consistent Reporting Performance: The users of the system should not experience any significant degradation
in reporting performance as the number of dimensions or the size of the database increases. Users need to
perceive consistent run time, response time, or machine utilization every time a given query is run.
e.Client/Server Architecture: The system has to have conformance to the principles of client/server architecture
for optimum performance, flexibility, adaptability, and interoperability. Also, the server component needs to be
sufficiently intelligent to enable various clients to be attached with minimum effort.
f. Generic Dimensionality: The system has to ensure that every data dimension is equivalent in both structure
and operational capabilities. We should be able to apply the function of one dimension to another too.
g. Dynamic Sparse Matrix Handling: This guideline is related to the idea of nulls in relational databases and to
the notion of compressing large files, and a sparse matrix is one in which not every cell contains data. So the
OLAP systems should accommodate varying storage and data-handling options.
h. Multi-user Support: Similar to EIS systems, the OLAP systems need to support multiple concurrent users,
including their individual views and/or slices of a common database.
i. Unrestricted Cross-dimensional operations: The OLAP system should have the ability to recognize
dimensional hierarchies and automatically perform roll-up and drill-down operations within a dimension or across
dimensions.
j. Intuitive data manipulation: The system should enable consolidation path reorientation (pivoting), drill-down
and roll-up, and other manipulations to be accomplished intuitively and directly via point-and-click and
drag-and-drop actions.
k.Flexible Reporting: The system should enable its users arrange columns, rows, and cells in a manner that
facilitates easy manipulation, analysis, and synthesis of information.
l. Unlimited Dimensions and Aggregation Levels: The system is expected to accommodate at least fifteen
(preferably twenty) data dimensions within a common analytical model.
Later in 1995, Codd included the following six requirements in addition to the above twelve basic guidelines:
a. Drill-through to Detail level: The system has to allow a smooth transition from the multidimensional,
pre-aggregated database to the detail record level of the source data warehouse repository.
b.Treatment of Non-normalized Data: The system should prohibit calculations made within it from getting
affected by the external data serving as the source.

C.Storing OLAP Results: The OLAP system should not deploy write-capable OLAP tools on top of transactional
systems.
Missing Values: The system should be able to ignore the missing values, irrespective of their source.
Incremental Database Refresh: The system has to provide for incremental refreshes of the extracted and
aggregated OLAP data.
SQL Interface: The OLAP system should have the ability to get integrated into the existing enterprise
environment.
Concept of Data Mining
By its simplest definition, data mining (DM) is the set of activities used to find new, hidden, or unexpected
patterns in data. It is the process of analyzing data from different perspectives and summarizing it into
useful information. Technically, the data mining process finds the correlations and patterns existing among
several fields in a large relational database.
In the past, decision support activities were based on the concept of verification. In this sense, a relational
database could be queried to provide dynamic answers to well-formed questions. The key issue in verification
is that it requires a great deal of prior knowledge on the part of the decision maker in order to verify a
suspected relationship through the query. In the 1990s, data warehouses with query and report tools assisted
the users in retrieving the types of decision support information they needed. Later OLAP tools came in to
existence for better sophisticated analysis.
Till this point, the approach for obtaining the information was mainly driven by the users. But the sheer
volume of data renders it impossible for anyone to use analysis and query tools to discern useful patterns.
For instance, in marketing research analysis, it is practically impossible to go through all the possible
associations and gain insights by querying and drilling down into the data warehouse. You might really
need a technology that can learn from past associations and results, and predict future behavior of
customers.
It is really good to have a tool that will accomplish the discovery of knowledge by itself to sustain
the cut-throat competition. Thus you require a data-driven approach rather than a user-driven one.
Using the information stored within a data warehouse, the data mining techniques can provide solutions to
the following questions:
1. Which type of products needs to be promoted for a specific individual customer?
2. Which scrips/securities are going to be more profitable during next trading session?
3. What is the probability for an individual customer to respond to a particular promotion?
4. What is the likelihood that an individual customer will default or payback on schedule?
These questions can be answered easily if the information hidden among the terabytes of data in your
databases can be located and utilized fully.
Another important DM technique is knowledge data discovery (KDD). Using a combination of techniques,
including statistical analysis, multidimensional analysis, intelligent agents, and data visualization, KDD
can discover highly useful informative patterns within the data that can be used to develop predictive models
of behavior.

Karrox Technologies LTD-: Andheri

Hochgeladen von

Dokumentinformationen

Originalbeschreibung:

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Karrox Technologies LTD-: Andheri

Hochgeladen von

Copyright:

Verfügbare Formate

Spring 2010 (Jan-June)

MBA – SEM – III

Name :Prashant D. Devale

Date of Submission at the Learning Centre: 30-4-

Business Intelligence and tools - 2 Credits

inventory and distribution analysis to name a few.

warehousing’ and ‘business intelligence’ are being used interchangeably.

Step 1: Ensuing strong business partnership

will bring about.

to come out as a major information system.

Step 2: Defining organizational-level business requirements

The long-term goal of a BI system is to build an organizational-wide information infrastructure.

Delivery of Business Value through a BI System

includes the following steps:

2.Interviewing the BI System stakeholders

3.Gathering the organizational level business requirements

4.Preparation of an overall Requirements document

Step 3: Prioritizing the business requirements

a prioritization process is a list of business processes in the priority order.

Step 4: Planning the Business Intelligence project

detailed in the previous step.

Step 5: Defining the Project-level business requirements

development of a Data warehouse helps you in managing various functions in

auto insurance, life insurance, fire insurance, etc.

inconsistencies regarding naming convention and value representations need to be removed in a

Non-volatility of the data

single source for all decision system support processing.

integrated, non-volatile, time-variant collection of data designed to support the decision-making

variance in order to support the decision-making requirements of an organization.

The applications of metadata are discussed below.

either be descriptive or algorithmic

at which files can be accessed and searched.

electronic documents and files acts as important evidence.

perform copy-and-paste operations required.

Managing of the Metadata

that should be treated with care:

transfer and save memory.

there are some demerits as detailed below:

a. Metadata is expensive and time-consuming. Also, it is very much complicated.

metadata can be gigantic.

information in a data warehouse in the absence of these functions.

Significance of ETL Processes

convert into information.

challenging because of the nature of the source systems.

processes involved to execute the ETL functions in building a data warehouse.

Ideally, you are required to undergo the following steps provided in

combined data does not violate the business rules.

Major Transformation Types

provide values meaningful to the users using format revisions.

Calculated and Derived values

average daily balances and operating ratios as derived fields.

Splitting of Single Fields

need to be stored individually to improve the operation performance by indexing on individual

Character Set Conversion

your data warehouse.

Conversion of Units of Measurements

may all be in one standard unit of measurement.

by writing it as 09 Oct 2006.

The advantages that an EIS brings to the organization are:

information wherever they arise

f. Presents the information in graphical, tabular, and/or text formats

provide a macro view of business.