
IT FOR MANAGERS

An assignment on Data Warehousing, Metadata, Data Mining, and their uses.
PGDM – IB (2009-11)
(Term – II)

IT For Managers Page 1


Submitted to: Prof. Rupesh Kumar Sinha
Submitted by: Group no. 5
Ankita (05), Deepak (17), Manas (29), Ranjan (41), Sourav (53)

Contents
History of Data Warehousing.................................................................................3
Key developments in early years of data warehousing were:............................4
What is Data Warehousing?...................................................................................4
What is Metadata?.................................................................................................5
What is Data Mining?.............................................................................................7
How does data mining work?..............................................................................8
How are these useful?.........................................................................................10
How is data warehousing useful?.....................................................10
How is metadata useful?..................................................10
How is data mining useful?..................................................11



History of Data Warehousing
Data warehouse is a repository of an organization's electronically
stored data. Data warehouses are designed to facilitate reporting and
analysis. A data warehouse houses a standardized, consistent, clean and
integrated form of data sourced from various operational systems in use
in the organization, structured in a way to specifically address the
reporting and analytic requirements.
Since the early 1990s, data warehouses have been at the forefront
of information technology applications as a way for organizations to
effectively use digital information for business planning and decision
making. As information professionals, we no doubt will encounter the data
warehouse phenomenon if we have not already been exposed to it in our
work. Hence, an understanding of data warehouse system architecture is
or will be important in our roles and responsibilities in information
management.
In 1988, IBM researchers Barry Devlin and Paul Murphy developed the
"business data warehouse." In essence, the data warehousing concept
was intended to provide an architectural model for the flow of data from
operational systems to decision support environments. The concept
attempted to address the various problems associated with this flow -
mainly, the high costs associated with it. In the absence of a data
warehousing architecture, an enormous amount of redundancy was
required to support multiple decision support environments. In larger
corporations it was typical for multiple decision support environments to
operate independently.



As technology improved (lower cost for more performance) and user
requirements increased (faster data load cycle times and more features),
data warehouses have evolved through several fundamental stages:
• Offline Operational Databases - Data warehouses in this initial stage
are developed by simply copying the database of an operational
system to an off-line server, where the processing load of reporting
does not impact the operational system's performance.
• Offline Data Warehouse - Data warehouses in this stage of evolution
are updated on a regular time cycle (usually daily, weekly or
monthly) from the operational systems and the data is stored in an
integrated reporting-oriented data structure.
• Real Time Data Warehouse - Data warehouses at this stage are
updated on a transaction or event basis, every time an operational
system performs a transaction (e.g. an order or a delivery or a
booking etc.)
• Integrated Data Warehouse - Data warehouses at this stage are used
to generate activity or transactions that are passed back into the
operational systems for use in the daily activity of the organization.

Key developments in the early years of data warehousing were:
• 1960s - General Mills and Dartmouth College, in a joint research
project, develop the terms dimensions and facts.
• 1970s - ACNielsen and IRI provide dimensional data marts for
retail sales.
• 1983 - Teradata introduces a database management system
specifically designed for decision support.
• 1988 - Barry Devlin and Paul Murphy publish the article "An
architecture for a business and information system" in IBM
Systems Journal, where they introduce the term "business data
warehouse".
• 1990 - Red Brick Systems introduces Red Brick Warehouse, a
database management system specifically for data warehousing.
• 1991 - Prism Solutions introduces Prism Warehouse Manager,
software for developing a data warehouse.
• 1991 - Bill Inmon publishes the book Building the Data
Warehouse.



• 1995 - The Data Warehousing Institute, a for-profit organization
that promotes data warehousing, is founded.
• 1996 - Ralph Kimball publishes the book The Data Warehouse
Toolkit.
• 1997 - Oracle 8, with support for star queries, is released.

What is Data Warehousing?
Data warehousing is commonly used by companies to
analyze trends over time. In other words, companies may very well use
data warehousing to view day-to-day operations, but its primary function
is facilitating strategic planning resulting from long-term data overviews.
From such overviews, business models, forecasts, and other reports and
projections can be made. Because the data stored in data warehouses is
intended to support this kind of overview-style reporting, it is
typically read-only for its users: analysts work with it through
queries, and changes arrive through the scheduled load process rather
than through ad hoc edits.
This is not to say that data warehousing involves data that is never
updated. On the contrary, the data stored in data warehouses is updated
all the time. It's the reporting and the analysis that take more of a long-
term view.
Data warehousing is not the be-all and end-all for storing all of a
company's data. Rather, data warehousing is used to house the necessary
data for specific analysis. More comprehensive data storage requires
different capacities that are more static and less easily manipulated than
those used for data warehousing.
Data warehousing is typically used by larger companies analyzing
larger sets of data for enterprise purposes. Smaller companies wishing to
analyze just one subject, for example, usually access data marts, which
are much more specific and targeted in their storage and reporting. Data
warehousing often includes smaller amounts of data grouped into data
marts. In this way, a larger company might have at its disposal both data
warehousing and data marts, allowing users to choose the source and
functionality depending on current needs.
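The overview-style, read-only analysis described above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical sales records; the record layout and the name monthly_totals are invented for this example, not taken from any particular warehouse product.

```python
from collections import defaultdict

# Hypothetical fact records as they might sit in a warehouse:
# (date, store, amount) rows already cleaned and integrated.
sales = [
    ("2009-01-15", "Store A", 120.0),
    ("2009-01-20", "Store B", 80.0),
    ("2009-02-03", "Store A", 150.0),
    ("2009-02-28", "Store B", 95.0),
]

def monthly_totals(rows):
    """Aggregate sales by month: the kind of long-term, read-only
    trend query a data warehouse is structured to answer."""
    totals = defaultdict(float)
    for date, _store, amount in rows:
        month = date[:7]          # "YYYY-MM"
        totals[month] += amount
    return dict(totals)

print(monthly_totals(sales))
# {'2009-01': 200.0, '2009-02': 245.0}
```

Note that the query only reads the rows; nothing in the analysis updates the stored data, which mirrors the read-only character of warehouse reporting.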



[Figure: diagrammatic representation of a data warehouse]

What is Metadata?
Metadata is structured data that describes the characteristics of a
resource. It shares many characteristics with the cataloguing that
takes place in libraries, museums and archives. The prefix "meta"
derives from the Greek, denoting something of a higher order or more
fundamental kind. A metadata record consists of a number of pre-defined
elements representing specific attributes of a resource, and each element
can have one or more values. Below is an example of a simple metadata
record:

Element name    Value
Title           Student Web catalogue
Creator         Ranjan
Publisher       IFIM Business School
Identifier      http://www.ifimo911.weebly.com
Format          Text/html
Relation        Class Web site
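The record above maps each pre-defined element to a value, so it can be represented directly as a simple mapping. The sketch below is illustrative only; the helper name describe is invented for this example.

```python
# The metadata record from the table above: each pre-defined element
# name maps to a value (elements may in general hold several values).
record = {
    "Title": "Student Web catalogue",
    "Creator": "Ranjan",
    "Publisher": "IFIM Business School",
    "Identifier": "http://www.ifimo911.weebly.com",
    "Format": "Text/html",
    "Relation": "Class Web site",
}

def describe(rec):
    """Render the record one element per line, in 'Element: value' form."""
    return "\n".join(f"{name}: {value}" for name, value in rec.items())

print(describe(record))
```

Because the elements are pre-defined, any program that knows the schema can pick out, say, the Creator of every resource without parsing free text.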

Each metadata schema will usually have the following characteristics:
• A limited number of elements
• The name of each element
• The meaning of each element
Typically, the semantics are descriptive of the contents, location,
physical attributes, type (e.g. text or image, map or model) and form (e.g.
print copy, electronic file). Key metadata elements supporting access to
published documents include the originator of a work, its title, when and
where it was published and the subject areas it covers. Where the
information is issued in analog form, such as print material, additional
metadata is provided to assist in the location of the information, e.g. call
numbers used in libraries. The resource community may also define some
logical grouping of the elements or leave it to the encoding scheme. For
example, Dublin Core may provide the core to which extensions may be
added.
Some of the most popular metadata schemas include:
• Dublin Core
• AACR2 (Anglo-American Cataloguing Rules)
• GILS (Government Information Locator Service)
• EAD (Encoded Archival Description)
• IMS (IMS Global Learning Consortium)
• AGLS (Australian Government Locator Service)
While the syntax is not strictly part of the metadata schema, the
data will be unusable unless the encoding scheme reflects the
semantics of the metadata schema. The encoding allows the metadata to
be processed by a computer program. Important schemes include:
• HTML (HyperText Markup Language)
• SGML (Standard Generalised Markup Language)
• XML (eXtensible Markup Language)
• RDF (Resource Description Framework)
• MARC (Machine-Readable Cataloguing)
• MIME (Multipurpose Internet Mail Extensions)
Metadata may be deployed in a number of ways:
• Embedded in the Web page by the creator or their agent, using META
tags in the HTML coding of the page
• As a separate HTML document linked to the resource it describes
• In a database linked to the resource; the records may either have
been created directly within the database or extracted from another
source, such as Web pages
The simplest method is for Web page creators to add the metadata
as part of creating the page. Creating metadata directly in a database
and linking it to the resource is growing in popularity as an activity
independent of the creation of the resources themselves. Increasingly,
metadata is being created by an agent or third party, particularly to
develop subject-based gateways.

What is Data Mining?
Generally, data mining (sometimes called data or knowledge
discovery) is the process of analyzing data from different perspectives
and summarizing it into useful information - information that can be used
to increase revenue, cut costs, or both. Data mining software is one of a
number of analytical tools for analyzing data. It allows users to analyze
data from many different dimensions or angles, categorize it, and
summarize the relationships identified. Technically, data mining is the
process of finding correlations or patterns among dozens of fields in large
relational databases.
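"Finding correlations among fields" can be made concrete with the standard Pearson coefficient, computed here in plain Python. The two fields and their values are hypothetical, invented purely to illustrate the calculation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length fields:
    the basic correlation measure mining tools compute at scale."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical fields from a sales table: advertising spend vs. units sold.
spend = [10, 20, 30, 40]
units = [12, 24, 31, 45]
print(round(pearson(spend, units), 3))   # close to 1.0: strong positive link
```

A real mining tool repeats this kind of computation across dozens of field pairs and flags the strong relationships for the analyst.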
Data mining is primarily used today by companies with a strong
consumer focus - retail, financial, communication, and marketing
organizations. It enables these companies to determine relationships
among "internal" factors such as price, product positioning, or staff skills,
and "external" factors such as economic indicators, competition, and
customer demographics. And, it enables them to determine the impact on
sales, customer satisfaction, and corporate profits. Finally, it enables them
to "drill down" into summary information to view detail transactional data.



With data mining, a retailer could use point-of-sale records of
customer purchases to send targeted promotions based on an individual's
purchase history. By mining demographic data from comment or warranty
cards, the retailer could develop products and promotions to appeal to
specific customer segments.
For example, Blockbuster Entertainment mines its video rental
history database to recommend rentals to individual customers. American
Express can suggest products to its cardholders based on analysis of their
monthly expenditures.
Wal-Mart is pioneering massive data mining to transform its supplier
relationships. Wal-Mart captures point-of-sale transactions from over
2,900 stores in 6 countries and continuously transmits this data to its
massive 7.5 terabyte Teradata data warehouse. Wal-Mart allows more
than 3,500 suppliers to access data on their products and perform data
analyses. These suppliers use this data to identify customer buying
patterns at the store display level. They use this information to manage
local store inventory and identify new merchandising opportunities. In
1995, Wal-Mart computers processed over 1 million complex data queries.
The National Basketball Association (NBA) is exploring a data mining
application that can be used in conjunction with image recordings of
basketball games. The Advanced Scout software analyzes the movements
of players to help coaches orchestrate plays and strategies. For example,
an analysis of the play-by-play sheet of the game played between the
New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals
that when Mark Price played the Guard position, John Williams attempted
four jump shots and made each one! Advanced Scout not only finds this
pattern, but explains that it is interesting because it differs considerably
from the average shooting percentage of 49.30% for the Cavaliers during
that game.
By using the NBA universal clock, a coach can automatically bring
up the video clips showing each of the jump shots attempted by Williams
with Price on the floor, without needing to comb through hours of video
footage. Those clips show a very successful pick-and-roll play in which
Price draws the Knicks defence and then finds Williams for an open jump
shot.
How does data mining work?
While large-scale information technology has been evolving
separate transaction and analytical systems, data mining provides the link
between the two. Data mining software analyzes relationships and
patterns in stored transaction data based on open-ended user queries.
Several types of analytical software are available: statistical, machine



learning, and neural networks. Generally, any of four types of
relationships are sought:
• Classes: Stored data is used to locate data in predetermined
groups. For example, a restaurant chain could mine customer
purchase data to determine when customers visit and what they
typically order. This information could be used to increase traffic by
having daily specials.
• Clusters: Data items are grouped according to logical relationships
or consumer preferences. For example, data can be mined to
identify market segments or consumer affinities.
• Associations: Data can be mined to identify associations. The beer-
diaper example is an example of associative mining.
• Sequential patterns: Data is mined to anticipate behaviour
patterns and trends. For example, an outdoor equipment retailer
could predict the likelihood of a backpack being purchased based on
a consumer's purchase of sleeping bags and hiking shoes.
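The association type above, including the beer-diaper example, boils down to counting how often item pairs appear together in transactions. The sketch below uses hypothetical baskets and a simple co-occurrence count (real tools use algorithms such as Apriori on far larger data).

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
]

def pair_support(transactions):
    """Count how often each item pair occurs together across baskets:
    the simplest form of association mining."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

support = pair_support(baskets)
print(support[("beer", "diapers")])   # 2: beer and diapers co-occur twice
```

Pairs with unusually high support relative to their items' individual frequencies are the associations worth acting on, e.g. with shelf placement or joint promotions.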
Data mining consists of five major elements:
• Extract, transform, and load transaction data onto the data
warehouse system.
• Store and manage the data in a multidimensional database system.
• Provide data access to business analysts and information
technology professionals.
• Analyze the data by application software.
• Present the data in a useful format, such as a graph or table.
Different levels of analysis are available:
• Artificial neural networks: Non-linear predictive models that
learn through training and resemble biological neural networks in
structure.
• Genetic algorithms: Optimization techniques that use processes
such as genetic combination, mutation, and natural selection in a
design based on the concepts of natural evolution.
• Decision trees: Tree-shaped structures that represent sets of
decisions. These decisions generate rules for the classification of a
dataset. Specific decision tree methods include Classification and
Regression Trees (CART) and Chi Square Automatic Interaction
Detection (CHAID). CART and CHAID are decision tree techniques
used for classification of a dataset. They provide a set of rules that
you can apply to a new (unclassified) dataset to predict which
records will have a given outcome. CART segments a dataset by
creating 2-way splits while CHAID segments using chi square tests



to create multi-way splits. CART typically requires less data
preparation than CHAID.
• Nearest neighbour method: A technique that classifies each
record in a dataset based on a combination of the classes of
the k record(s) most similar to it in a historical dataset (where k ≥ 1).
Sometimes called the k-nearest neighbour technique.
• Rule induction: The extraction of useful if-then rules from data
based on statistical significance.
• Data visualization: The visual interpretation of complex
relationships in multidimensional data. Graphics tools are used to
illustrate data relationships.
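Of the analysis levels above, the nearest neighbour method is compact enough to sketch in full. The historical records, their two-number features, and the labels "low"/"high" are all hypothetical, chosen only to show the majority-vote mechanism.

```python
from collections import Counter

def knn_classify(point, labelled_points, k=3):
    """Classify `point` by majority vote among the k most similar
    records in a historical dataset (squared Euclidean distance
    on 2-D feature pairs)."""
    def dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    nearest = sorted(labelled_points, key=lambda lp: dist(point, lp[0]))[:k]
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical historical data: (feature pair, class label).
history = [
    ((1, 1), "low"), ((1, 2), "low"),
    ((8, 9), "high"), ((9, 8), "high"), ((9, 9), "high"),
]
print(knn_classify((8, 8), history, k=3))   # "high"
```

The new record inherits the label of the neighbourhood it falls into, which is why the method needs no training phase: the historical dataset itself is the model.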

How are these useful?

How is data warehousing useful?
Some of the benefits that a data warehouse provides are as follows.
• A data warehouse provides a common data model for all data of
interest regardless of the data's source. This makes it easier to
report and analyze information than it would be if multiple data
models were used to retrieve information such as sales invoices,
order receipts, general ledger charges, etc.
• Prior to loading data into the data warehouse, inconsistencies are
identified and resolved. This greatly simplifies reporting and
analysis.
• Information in the data warehouse is under the control of data
warehouse users so that, even if the source system data is purged
over time, the information in the warehouse can be stored safely for
extended periods of time.
• Because they are separate from operational systems, data
warehouses provide retrieval of data without slowing down
operational systems.
• Data warehouses can work in conjunction with and, hence, enhance
the value of operational business applications, notably customer
relationship management (CRM) systems.
• Data warehouses facilitate decision support system applications
such as trend reports (e.g., the items with the most sales in a
particular area within the last two years), exception reports, and
reports that show actual performance versus goals.
How is metadata useful?



Metadata provides additional information to users of the data it
describes. This information may be descriptive ("These pictures were
taken by children in the school's third grade class.") or algorithmic
("Checksum=139F").
Metadata helps to bridge the semantic gap. By telling a computer
how data items are related and how these relations can be evaluated
automatically, it becomes possible to process even more complex filter
and search operations. For example, if a search engine understands that
"Van Gogh" was a "Dutch painter", it can answer a search query on
"Dutch painters" with a link to a web page about Vincent Van Gogh,
although the exact words "Dutch painters" never occur on that page. This
approach, called knowledge representation, is of special interest to the
semantic web and artificial intelligence.
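The Van Gogh example above can be mimicked with a toy knowledge base of (subject, relation, object) facts. The facts list and the function find_subjects are invented for this sketch; real systems use RDF triple stores and query languages such as SPARQL.

```python
# A toy knowledge base: (subject, relation, object) facts, the kind a
# metadata-aware search engine can evaluate automatically.
facts = [
    ("Vincent Van Gogh", "is_a", "Dutch painter"),
    ("Rembrandt", "is_a", "Dutch painter"),
    ("Claude Monet", "is_a", "French painter"),
]

def find_subjects(relation, obj):
    """Answer a query like 'Dutch painters' even when a page never
    contains that exact phrase, by following the stored relation."""
    return [s for s, r, o in facts if r == relation and o == obj]

print(find_subjects("is_a", "Dutch painter"))
```

The page about Van Gogh is returned for the query "Dutch painters" because the relation is stored as metadata, not because the phrase appears in the text, which is exactly how metadata bridges the semantic gap.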
Some metadata is intended to enable variable content presentation.
For example, if a picture has metadata that indicates the most important
region (the one where there is a person), an image viewer on a small
screen, such as on a mobile phone, can narrow the picture to that region
and thus show the user the most interesting details. A similar kind of
metadata is intended to allow blind people to access diagrams and
pictures, by converting them for special output devices or reading their
description using text-to-speech software.
Other descriptive metadata can be used to automate workflows. For
example, if a "smart" software tool knows content and structure of data, it
can convert it automatically and pass it to another "smart" tool as input.
As a result, users save the many copy-and-paste operations required
when analyzing data with "dumb" tools.
Metadata is becoming an increasingly important part of electronic
discovery. Application and file system metadata derived from electronic
documents and files can be important evidence. Recent changes to the
Federal Rules of Civil Procedure make metadata routinely discoverable as
part of civil litigation. Parties to litigation are required to maintain and
produce metadata as part of discovery, and spoliation of metadata can
lead to sanctions.
Metadata has become important on the World Wide Web because of
the need to find useful information from the mass of information available.
Manually-created metadata adds value because it ensures consistency. If
a web page about a certain topic contains a word or phrase, then all web
pages about that topic should contain that same word or phrase.
Metadata also ensures variety, so that if a topic goes by two names each
will be used. For example, an article about "sport utility vehicles" would
also be tagged "4 wheel drives", "4WDs" and "four wheel drives", as this is
how SUVs are known in some countries.



Examples of metadata for an audio CD include the MusicBrainz
project and All Media Guide's AllMusic. Similarly, MP3 files carry
metadata tags in a format called ID3.
How is data mining useful?
Every business, organization and government body collects large
amounts of data for research and development. Such huge databases put
information on hand when it is required, but finding the important
information within the data takes considerable time. "If you want to
grow rapidly, you must take quick and accurate decisions to grab timely
available opportunities."
By applying the process of data mining, you can extract and
filter the required information from data. It is a process of refining
data and extracting important information, divided into three stages:
pre-processing, mining and validation. In pre-processing, a large
amount of relevant data is collected. The mining stage includes data
classification, clustering, error correction and linking of
information. The last, but essential, stage is validation, without
which the information cannot be trusted. In short, data mining is a
process of converting data into authentic information.
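The three stages described in the paragraph above can be sketched end to end. The records, the spend threshold, and the function names are all hypothetical, chosen only to make each stage visible.

```python
# Hypothetical raw records with a duplicate and an invalid row.
raw = [
    {"customer": "A", "amount": 100},
    {"customer": "A", "amount": 100},   # duplicate
    {"customer": "B", "amount": -5},    # invalid amount
    {"customer": "C", "amount": 250},
]

def preprocess(records):
    """Pre-processing: drop duplicates and rows with invalid amounts."""
    seen, clean = set(), []
    for r in records:
        key = (r["customer"], r["amount"])
        if key not in seen and r["amount"] >= 0:
            seen.add(key)
            clean.append(r)
    return clean

def mine(records, threshold=200):
    """Mining: classify each customer by spend level."""
    return {r["customer"]: ("big" if r["amount"] >= threshold else "small")
            for r in records}

def validate(result, records):
    """Validation: every surviving customer must have a classification."""
    return set(result) == {r["customer"] for r in records}

clean = preprocess(raw)
classes = mine(clean)
assert validate(classes, clean)
print(classes)   # {'A': 'small', 'C': 'big'}
```

Only after the validation step succeeds is the output treated as authentic information rather than raw data.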
Let's look at how data mining is useful to companies.
Fast and feasible decisions: Searching for information in a huge
mass of data takes time, and decisions made under that pressure are
rarely accurate. With the help of data mining, one can retrieve
information easily and make fast decisions. It also helps to compare
information across various factors, so the decisions become more
reliable.
Powerful strategies: After data mining, information becomes precise
and easy to understand. While making strategies, one can analyze the
information along various dimensions, which gives a realistic picture
of how a strategy will play out. Management can then implement
powerful strategies effectively to expand business boundaries.
Competitive advantage: Because the information is readily available
and precise, it can be compared with competitors' information. After
such competitive analysis, a company can make corrective decisions to
move ahead of its competitors and thereby gain a competitive
advantage.

