Operational Research-G3 r2

Data Mining & Construction Industry in India
Quantitative and Operations Research
Group 3
Arpit | Divya | Mushfique | Nirmal | Shilpa
TABLE OF CONTENTS
1. Data mining
1.1 What is data mining
1.2 How does it work
1.2.1 Types of relationship
1.2.2 Major elements
1.2.3 Levels of analysis
1.3 Data mining process
1.4 Data mining issues
2. Construction Industry
2.1 Application of data mining in construction industry
2.1.1 In Building Life Cycle Modelling and BMS
2.1.2 To minimize occupational injuries in construction industry
2.1.3 In cost management
2.1.4 In asset management
3. Conclusion
4. References
DATA MINING AND CONSTRUCTION INDUSTRY IN INDIA. GIVE EXAMPLES FOR DATA
TYPOLOGIES AND ANALYTICS FOR INDUSTRY NEEDS
1.1 What is data mining?

Advances in data generation and collection are producing data sets of massive
size in commerce and also in various scientific disciplines. The ease with which
data can now be gathered and stored has created a new attitude towards
data analysis: gather whatever data you can, whenever and wherever possible.
It has become an article of faith that the gathered data will have value, either
for the purpose that initially motivated its collection or for purposes not yet
envisioned.
Data mining means exploring and analyzing large amounts of data to discover
meaningful patterns and rules. Data mining is generally used for
prediction and description.
1.2 How does data mining work?
While large-scale information technology has been evolving separate
transaction and analytical systems, data mining provides the link between the
two. Data mining software analyzes the relationships and patterns in stored
transaction data, based on open-ended user queries. Several types of analytical
software are available: statistical, machine learning, and neural networks.
Generally, any of four types of relationships are sought:
Classes: Stored data is used to locate data in predetermined groups. For
example, a restaurant chain could mine customer purchase data to
determine when customers visit and what they typically order. This
information could be used to increase traffic by having daily specials.
Clusters: Data items are grouped according to logical relationships or
consumer preferences. For example, data can be mined to identify
market segments or consumer affinities.
Associations: Data can be mined to identify associations. The beer-diaper
example is a classic case of associative mining.
Sequential patterns: Data is mined to anticipate behavior patterns and
trends. For example, an outdoor equipment retailer could predict the
likelihood of a backpack being purchased based on a consumer's
purchase of sleeping bags and hiking shoes.
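The association and sequential-pattern relationships above are commonly quantified with two measures of a rule "A implies B": support (how often A and B occur together) and confidence (how often B occurs when A does). The following is a minimal sketch in Python, not a production implementation; the baskets, the item names and the helper rule_metrics are hypothetical and exist only for this illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Transactions that contain every item of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions that contain both the antecedent and the consequent.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical purchase baskets (illustrative data only).
baskets = [
    {"sleeping bag", "hiking shoes", "backpack"},
    {"sleeping bag", "hiking shoes"},
    {"sleeping bag", "hiking shoes", "backpack"},
    {"hiking shoes"},
    {"backpack"},
]

support, confidence = rule_metrics(
    baskets, {"sleeping bag", "hiking shoes"}, {"backpack"}
)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

A retailer would typically keep only rules whose support and confidence exceed chosen thresholds; those thresholds are a business decision and are not specified here.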
Data mining consists of five major elements:
1. Extract, transform, and load transaction data onto the data warehouse
system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and information technology
professionals.
4. Analyze the data by application software.
5. Present the data in a useful format, such as a graph or table.
Different levels of analysis are available:
Artificial neural networks: Non-linear predictive models that learn through
training and resemble biological neural networks in structure.
Genetic algorithms: Optimization techniques that use processes such as
genetic combination, mutation, and natural selection in a design based
on the concepts of natural evolution.
Decision trees: Tree-shaped structures that represent sets of decisions.
These decisions generate rules for the classification of a dataset. Specific
decision tree methods include Classification and Regression Trees (CART)
and Chi Square Automatic Interaction Detection (CHAID). Both provide a
set of rules that you can apply to a new (unclassified) dataset to predict
which records will have a given outcome. CART segments a dataset by
creating 2-way splits, while CHAID segments using chi square tests to
create multi-way splits. CART typically requires less data preparation
than CHAID.
Nearest neighbor method: A technique that classifies each record in a
dataset based on a combination of the classes of the k record(s) most
similar to it in a historical dataset (where k ≥ 1). Sometimes called the
k-nearest neighbor technique.
Rule induction: The extraction of useful if-then rules from data based on
statistical significance.
Data visualization: The visual interpretation of complex relationships in
multidimensional data. Graphics tools are used to illustrate data
relationships.
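To make the nearest neighbor method above concrete, the sketch below classifies a new record by majority vote among the k most similar records in a historical dataset. It is a minimal illustration under stated assumptions: the function knn_classify, the two features and the class labels are hypothetical, and similarity is taken to be plain Euclidean distance.

```python
import math
from collections import Counter


def knn_classify(history, query, k=3):
    """Classify `query` by majority vote among its k nearest historical records.

    `history` is a list of (feature_vector, class_label) pairs.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort historical records by Euclidean distance to the query, keep the k closest.
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical records: (floor area in 1000 m2, number of storeys) -> outcome.
history = [
    ((1.0, 2), "on budget"),
    ((1.2, 3), "on budget"),
    ((1.1, 2), "on budget"),
    ((8.0, 20), "overrun"),
    ((9.5, 25), "overrun"),
    ((7.5, 18), "overrun"),
]

large_project = knn_classify(history, (8.5, 22), k=3)
small_project = knn_classify(history, (1.3, 2), k=3)
print(large_project)
print(small_project)
```

In practice the features would usually be scaled to comparable ranges before distances are computed, since otherwise the feature with the largest numeric range dominates the result.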
1.3 Data mining process

Figure 1 illustrates the phases, and the iterative nature, of a data mining project.
The process flow shows that a data mining project does not stop when a
particular solution is deployed. The results of data mining trigger new business
questions, which in turn can be used to develop more focused models.
Figure 1: The Data Mining Process
1.4 Data Mining Issues

As data mining initiatives continue to evolve, several issues arise, including,
but not limited to, data quality, interoperability, mission creep, and
privacy.
2. Construction Industry
The construction industry is responsible for undertaking some of the biggest and
most expensive projects on Earth. It is an industry where 35% of costs are
accounted for by material waste and remedial work, so efficient resource
management could be the difference between delivering on budget and
bankrupting an organization (or several organizations) in the industry.
Huge amounts of resources and work go into major construction projects, and of
course this means that huge volumes of data are generated. This data can
come from people, computers, machines, sensors, and any other
data-generating device or agent. That, naturally enough, is what makes it big.
It is also constantly increasing with additional input from sources as diverse as
on-site workers, cranes, earth movers, material supply chains, and even buildings
themselves.
2.1 Application of data mining in construction industry
Traditional information systems are good at recording basic information about
project schedules, CAD designs, costs, invoices, and employee details. However,
they are limited in their ability to work with unstructured data like free text,
printed information or analog sensor readings. Often, they can only handle
orderly digital rows and columns of numbers.
The idea of harnessing big data is to gain more insights and make better
decisions in construction management by not only accessing significantly more
data, but by properly analyzing it to draw practical building project conclusions.
In fact, big data, like truckloads of bricks or bags of cement, isn't useful on its
own until it is processed to produce decision-driving results.
2.1.1 Its application in Building Life Cycle Modelling and BMS
The rich set of building data generated or accumulated during the design and
documentation phases of buildings remains relevant even after the construction.
This data becomes richer as operations and maintenance data is included and
updated regularly. Architects, interior designers, engineers, contractors,
marketing and sales personnel, building managers and owners can extract
useful information from databases for building renovation, maintenance and
operations. Figure 2 illustrates a proposed model of the information flow in
building design and maintenance.
Figure 2: Proposed model of an information flow
Data mining techniques can be used effectively on data stored in a Building
Maintenance System (BMS) by extracting useful knowledge that can be used for
future management and design decision making. Knowledge that resides
implicitly in BMS databases and corresponds to the above figure includes:
1. Components that frequently need maintenance and therefore need to
be inspected carefully
2. Historical consequences of maintenance decisions that may inform future
decisions
3. Components of buildings that significantly determine maintenance cost
and therefore may inform future building designs, as well as refurbishment
of the building.
Other benefits include constructing predictive plans based on correlations
obtained from applying data mining techniques to the maintenance data sets
of buildings. For example, potential correlations between seasons and
malfunction rates can guide the allocation of maintenance resources.
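The link between seasons and malfunction rates can be explored with a very simple aggregation before any heavier technique is applied. The sketch below is a minimal illustration in Python; the maintenance log, the season labels and the helper malfunction_rate_by_season are hypothetical and do not come from any real BMS database.

```python
from collections import defaultdict


def malfunction_rate_by_season(records):
    """Return {season: faults per inspection} from maintenance records.

    Each record is a (season, malfunctioned) pair, where `malfunctioned`
    is True if the inspection found a fault.
    """
    inspections = defaultdict(int)
    faults = defaultdict(int)
    for season, malfunctioned in records:
        inspections[season] += 1
        if malfunctioned:
            faults[season] += 1
    return {season: faults[season] / inspections[season] for season in inspections}


# Hypothetical BMS maintenance log (illustrative data only).
records = [
    ("summer", True), ("summer", True), ("summer", False), ("summer", True),
    ("monsoon", True), ("monsoon", False),
    ("winter", False), ("winter", False), ("winter", True), ("winter", False),
]

rates = malfunction_rate_by_season(records)
# Rank seasons so that maintenance resources go where faults are most frequent.
priority = sorted(rates, key=rates.get, reverse=True)
print(rates)
print(priority)
```

With real data, a difference in rates between seasons would still need a significance check before it is used to reallocate maintenance crews.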
2.1.2 Application to minimize occupational injuries in construction industry
Utilizing a database of accident cases, a potential cause-and-effect relationship
regarding serious occupational accidents in the industry can be established,
which could help form a framework for improving the safety practices and
training programs that are essential to protecting construction workers from
occasional or unexpected accidents.
Research by Ching-Wu Cheng, Sou-Sen Leu, Ying-Mei Cheng, Tsung-Chih Wu and
Chen-Chung Lin applied data mining techniques to explore factors
contributing to occupational injuries in Taiwan's construction industry, using
such a database and the data mining method known as classification and
regression tree (CART). Utilizing a database of 1542 accident cases during the
period 2000-2009, the study established potential cause-and-effect
relationships regarding serious occupational accidents in the industry. The results
of this study show that the occurrence rules for falls and collapses in both public
and private project construction industries serve as key factors to predict the
occurrence of occupational injuries.
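The 2-way splits that CART builds can be illustrated with a small sketch. The code below searches for the single attribute test that best separates injury types, using Gini impurity as the split criterion. It is a simplified illustration only: the accident records, attribute names and helper functions are hypothetical, and it does not reproduce the data, variables or model of the Taiwanese study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(rows, target):
    """Find the (attribute, value) test with the lowest weighted Gini impurity.

    Rows with row[attribute] == value go to the left branch and all other
    rows go to the right branch, which gives a 2-way split.
    """
    best = None
    attributes = [a for a in rows[0] if a != target]
    for attr in attributes:
        for value in sorted({r[attr] for r in rows}):
            left = [r[target] for r in rows if r[attr] == value]
            right = [r[target] for r in rows if r[attr] != value]
            if not left or not right:
                continue  # a test that does not separate anything is useless
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best


# Hypothetical accident records (illustrative data only).
cases = [
    {"project": "public", "work_at_height": "yes", "injury": "fall"},
    {"project": "private", "work_at_height": "yes", "injury": "fall"},
    {"project": "public", "work_at_height": "yes", "injury": "fall"},
    {"project": "private", "work_at_height": "no", "injury": "collapse"},
    {"project": "public", "work_at_height": "no", "injury": "collapse"},
    {"project": "private", "work_at_height": "no", "injury": "struck by object"},
]

score, attribute, value = best_binary_split(cases, "injury")
print(attribute, value, round(score, 3))
```

A full CART procedure would apply this search recursively to each branch and then prune the tree; the sketch stops after the first split to keep the idea visible.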
2.1.3 Applications in cost management
One of the main aims of any construction client is to procure a project within the
limits of a predefined budget. However, most construction projects routinely
overrun their cost estimates. Existing theories on construction cost overrun
suggest a number of causes, ranging from technical difficulties and optimism bias
to managerial incompetence and strategic misrepresentation. However, much of
the budgetary decision-making process in the early stages of a project is carried
out in an environment of high uncertainty with little available information for
accurate estimation.
In "Dealing with construction cost overruns using data mining", Dominic D
Ahiaga-Dagbui and Simon D Smith use non-parametric bootstrapping and
ensemble modelling in artificial neural networks, developing final project cost
forecasting models with 1600 completed projects in this experimental
research. This helped to extract information embedded in data on completed
construction projects, in an attempt to address the problem of dearth of
information in the early stages of a project. 92% of the 100 validation predictions
were within 10% of the actual final cost of the project, while 77% were within
5% of actual final cost.
Figure 3: Validation results (Standard models vs Bootstrapping)
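The idea of combining non-parametric bootstrapping with an ensemble of models can be sketched on a much smaller scale. In the example below each ensemble member is fitted to a resample drawn with replacement from the completed projects, and the members are then averaged. Note the assumptions: a one-parameter least-squares model (cost = rate x size) is swapped in for the artificial neural networks of the study, and the project data and helper functions are hypothetical.

```python
import random


def fit_rate(sample):
    """Least-squares cost rate for the model cost = rate * size (no intercept)."""
    numerator = sum(size * cost for size, cost in sample)
    denominator = sum(size * size for size, _ in sample)
    return numerator / denominator


def bootstrap_ensemble(projects, n_models=200, seed=0):
    """Fit one simple model per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [
        fit_rate(rng.choices(projects, k=len(projects))) for _ in range(n_models)
    ]


def share_within(predicted, actual, tolerance):
    """Fraction of predictions within +/- tolerance (e.g. 0.10) of the actual cost."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance * a)
    return hits / len(actual)


# Hypothetical completed projects: (size, final cost), illustrative data only.
completed = [(10, 105), (20, 190), (30, 320), (40, 395), (50, 510), (60, 590)]
validation = [(25, 255), (45, 440), (55, 600)]

rates = bootstrap_ensemble(completed)
ensemble_rate = sum(rates) / len(rates)  # average the ensemble members
predicted = [ensemble_rate * size for size, _ in validation]
actual = [cost for _, cost in validation]

print(f"within 10%: {share_within(predicted, actual, 0.10):.0%}")
print(f"within 5%:  {share_within(predicted, actual, 0.05):.0%}")
```

The function share_within computes the same kind of figure the study reports for its validation set: the proportion of predictions that fall within a given percentage of the actual final cost.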
2.1.4 Application in asset management

While ideas and imagination are powerful resources, a real-life application of big
data in commercial construction can also be inspiring. One such example
comes from Nick Savko & Sons, Inc., a construction company based in Ohio
that offers earthmoving and road surfacing services. The company equipped
its machines, including a scraper and articulated dump trucks, with 36 global
locator devices, so that the machines could be monitored at a distance. The
initial installation of the first two devices was done for them by a dealer, and the
company then installed the other 34 devices by themselves.
The devices gathered information on machine cycle time, idle time, productivity,
and more. This information was then fed into an asset management software
program. Idle time and location analysis allowed managers to know if too many
or too few trucks were being used, or if an earthmover would be more gainfully
used elsewhere. The same information was also analyzed to generate
information on loads carried, cycle times, and cycle distances. Fuel
consumption could be compared with benchmark figures to see if operators on
site were using the machines efficiently, or if there were possible mechanical
problems to be fixed.
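The idle-time and fuel-benchmark checks described above amount to simple rules applied to each machine's telemetry. The sketch below shows one way such rules might look in Python. Everything in it is an assumption made for illustration: the machine identifiers, the field names, the 30% idle limit, the 15% fuel margin and the benchmark value are hypothetical and are not taken from the company's asset management software.

```python
def flag_machines(telemetry, fuel_benchmark_l_per_h, idle_limit=0.30, fuel_margin=0.15):
    """Return {machine_id: [reasons]} for machines that deserve a closer look.

    `telemetry` maps machine_id to a dict with engine_hours, idle_hours
    and fuel_litres for the reporting period.
    """
    flags = {}
    for machine_id, t in telemetry.items():
        if t["engine_hours"] <= 0:
            continue  # no running time recorded, nothing to compare
        reasons = []
        idle_share = t["idle_hours"] / t["engine_hours"]
        fuel_rate = t["fuel_litres"] / t["engine_hours"]
        if idle_share > idle_limit:
            reasons.append(f"idle {idle_share:.0%} of engine hours")
        if fuel_rate > fuel_benchmark_l_per_h * (1 + fuel_margin):
            reasons.append(f"fuel {fuel_rate:.1f} L/h above benchmark")
        if reasons:
            flags[machine_id] = reasons
    return flags


# Hypothetical daily telemetry (illustrative data only).
telemetry = {
    "truck-01": {"engine_hours": 10.0, "idle_hours": 1.5, "fuel_litres": 200.0},
    "truck-02": {"engine_hours": 10.0, "idle_hours": 4.5, "fuel_litres": 190.0},
    "scraper-01": {"engine_hours": 8.0, "idle_hours": 1.0, "fuel_litres": 220.0},
}

flags = flag_machines(telemetry, fuel_benchmark_l_per_h=20.0)
for machine_id, reasons in flags.items():
    print(machine_id, "->", "; ".join(reasons))
```

A high idle share suggests that too many machines are assigned to a task, while a fuel rate well above the benchmark may point to operator habits or a mechanical problem.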
The big benefits to the company included increasing productivity enough to
finish the project a month ahead of schedule, and fixing potential problems
before they became real ones. Because the data gathered also showed the
company its real costs, it now uses this information to tune profitability and
become even more competitive for following projects.
3. Conclusion
Using data mining in construction projects can improve performance and
help contractors transform their large data sets into useful information for
business improvement. Data mining plays an important role in combining
knowledge and existing information to forecast final outcomes.
One of the major challenges for data mining is the poor culture of data
warehousing in the construction industry. Data mining requires large
data sets to transform into useful information. However, most construction
firms have little or no useful and complete data with which to model
the construction processes. This limitation and potential pitfall must always be
clearly communicated to the end user, and ways must be found to address it.
4. References
http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
https://docs.oracle.com/cd/B28359_01/datamine.111/b28129/process.htm#DMCON046
http://www.thearling.com/text/dmwhite/dmwhite.htm
https://www.scribd.com/document/285155752/Steps-of-the-KnowledgeDiscovery-in-Databases-Process-Www-sqldatamining
https://www.scribd.com/document/55166998/Seminar-Data-Mining
http://academic.csuohio.edu/fuy/Pub/pot97.pdf
https://www.scribd.com/document/210762858/Data-Mining-Application-inConstruction-Project
Glossary
analytical model
A structure and process for analyzing a dataset. For example, a decision
tree is a model for the classification of a dataset.
anomalous data
Data that result from errors (for example, data entry keying errors) or
that represent unusual events. Anomalous data should be examined
carefully because it may carry important information.
artificial neural networks
Non-linear predictive models that learn through training and resemble
biological neural networks in structure.
CART
Classification and Regression Trees. A decision tree technique used for
classification of a dataset. Provides a set of rules that you can apply to
a new (unclassified) dataset to predict which records will have a given
outcome. Segments a dataset by creating 2-way splits. Requires less
data preparation than CHAID.
CHAID
Chi Square Automatic Interaction Detection. A decision tree technique
used for classification of a dataset. Provides a set of rules that you can
apply to a new (unclassified) dataset to predict which records will have
a given outcome. Segments a dataset by using chi square tests to
create multi-way splits. Preceded, and requires more data preparation
than, CART.
data cleansing
The process of ensuring that all values in a dataset are consistent and
correctly recorded.
data mining
The extraction of hidden predictive information from large databases.
data navigation
The process of viewing different dimensions, slices, and levels of detail
of a multidimensional database. See OLAP.
data visualization
The visual interpretation of complex relationships in multidimensional
data.
data warehouse
A system for storing and delivering massive quantities of data.
decision tree
A tree-shaped structure that represents a set of decisions. These
decisions generate rules for the classification of a dataset. See CART
and CHAID.
dimension
In a flat or relational database, each field in a record represents a
dimension. In a multidimensional database, a dimension is a set of
similar entities; for example, a multidimensional sales database might
include the dimensions Product, Time, and City.
linear model
An analytical model that assumes linear relationships in the coefficients
of the variables being studied.
linear regression
A statistical technique used to find the best-fitting linear relationship
between a target (dependent) variable and its predictors (independent
variables).
logistic regression
A linear regression that predicts the proportions of a categorical target
variable, such as type of customer, in a population.
multidimensional database
A database designed for on-line analytical processing. Structured as a
multidimensional hypercube with one axis per dimension.
non-linear model
An analytical model that does not assume linear relationships in the
coefficients of the variables being studied.
parallel processing
The coordinated use of multiple processors to perform computational
tasks. Parallel processing can occur on a multiprocessor computer or on
a network of workstations or PCs.
predictive model
A structure and process for predicting the values of specified variables
in a dataset.
prospective data analysis
Data analysis that predicts future trends, behaviors, or events based on
historical data.
RAID
Redundant Array of Inexpensive Disks. A technology for the efficient
parallel storage of data for high-performance computer systems.
time series analysis
The analysis of a sequence of measurements made at specified time
intervals. Time is usually the dominating dimension of the data.
