
Data Mining & Construction Industry in India
Quantitative and Operations Research
Group 3
Arpit | Divya | Mushfique | Nirmal | Shilpa

TABLE OF CONTENTS
1. Data mining
1.1 What is data mining
1.2 How does it work
1.2.1 Types of relationship
1.2.2 Major elements
1.2.3 Levels of analysis
1.3 Data mining process
1.4 Data mining issues
2. Construction Industry
2.1 Application of data mining in construction industry
2.1.1 In Building Life Cycle Modelling and BMS
2.1.2 To minimize occupational injuries in construction industry
2.1.3 In cost management
2.1.4 In asset management
3. Conclusion
4. References

DATA MINING AND CONSTRUCTION INDUSTRY IN INDIA. GIVE EXAMPLES FOR DATA
TYPOLOGIES AND ANALYTICS FOR INDUSTRY NEEDS

1.1 What is data mining?


Advances in data generation and collection are producing data sets of massive
size in commerce and also in various scientific disciplines. The ease with which
data can now be gathered and stored has created a new attitude towards
data analysis: gather whatever data you can, whenever and wherever possible.
It has become an article of faith that the gathered data will have value, either
for the purpose that initially motivated its collection or for purposes not yet
envisioned.
Data mining means exploring and analyzing large amounts of data in order to
discover meaningful patterns and rules. It is generally used for prediction and
description.
1.2 How does data mining work?
While large-scale information technology has been evolving separate
transaction and analytical systems, data mining provides the link between the
two. Data mining software analyzes the relationships and patterns in stored
transaction data, based on open-ended user queries. Several types of analytical
software are available: statistical, machine learning, and neural networks.
Generally, any of four types of relationships are sought:

Classes: Stored data is used to locate data in predetermined groups. For
example, a restaurant chain could mine customer purchase data to
determine when customers visit and what they typically order. This
information could be used to increase traffic by having daily specials.

Clusters: Data items are grouped according to logical relationships or
consumer preferences. For example, data can be mined to identify
market segments or consumer affinities.

Associations: Data can be mined to identify associations. The beer-diaper
example is an example of associative mining.

Sequential patterns: Data is mined to anticipate behavior patterns and
trends. For example, an outdoor equipment retailer could predict the
likelihood of a backpack being purchased based on a consumer's
purchase of sleeping bags and hiking shoes.
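
The association relationship can be made concrete with the two measures commonly reported for a rule: support (how often the items occur together) and confidence (how often the consequent occurs when the antecedent does). The following is a minimal sketch in Python; the baskets, the item names and the helper `rule_stats` are hypothetical and serve only as an illustration.

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(baskets)
    # Baskets that contain every item of the antecedent.
    n_antecedent = sum(1 for b in baskets if antecedent <= set(b))
    # Baskets that contain the antecedent and the consequent together.
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= set(b))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical purchase baskets (not real data).
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

support, confidence = rule_stats(baskets, {"diapers"}, {"beer"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

A rule is usually considered interesting only when both values exceed thresholds chosen by the analyst.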

Data mining consists of five major elements:

Extract, transform, and load transaction data onto the data warehouse
system.

Store and manage the data in a multidimensional database system.

Provide data access to business analysts and information technology
professionals.

Analyze the data by application software.

Present the data in a useful format, such as a graph or table.
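
The five elements can be sketched end to end on a very small scale. The example below is only an illustration under stated assumptions: the raw rows are hypothetical, and an in-memory SQLite table stands in for the data warehouse (it is a relational store, not the multidimensional database system described above).

```python
import sqlite3

# Extract: hypothetical raw transaction records (day, item, amount as text).
raw_rows = [
    ("2016-01-04", "cement ", "120.50"),
    ("2016-01-04", "Bricks", "80"),
    ("2016-01-05", "cement", "99.50"),
]


def transform(row):
    """Normalize item names and convert amounts to numbers."""
    day, item, amount = row
    return day, item.strip().lower(), float(amount)


# Load and store: the in-memory table plays the role of the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)", [transform(r) for r in raw_rows]
)

# Access and analyze: total amount per item.
totals = conn.execute(
    "SELECT item, SUM(amount) FROM sales GROUP BY item ORDER BY item"
).fetchall()
conn.close()

# Present: a small text table.
for item, total in totals:
    print(f"{item:<10}{total:>10.2f}")
```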

Different levels of analysis are available:

Artificial neural networks: Non-linear predictive models that learn through
training and resemble biological neural networks in structure.

Genetic algorithms: Optimization techniques that use processes such as
genetic combination, mutation, and natural selection in a design based
on the concepts of natural evolution.

Decision trees: Tree-shaped structures that represent sets of decisions.
These decisions generate rules for the classification of a dataset. Specific
decision tree methods include Classification and Regression Trees (CART)
and Chi Square Automatic Interaction Detection (CHAID). CART and
CHAID are decision tree techniques used for classification of a dataset.
They provide a set of rules that you can apply to a new (unclassified)
dataset to predict which records will have a given outcome. CART
segments a dataset by creating 2-way splits while CHAID segments using
chi square tests to create multi-way splits. CART typically requires less data
preparation than CHAID.

Nearest neighbor method: A technique that classifies each record in a
dataset based on a combination of the classes of the k record(s) most
similar to it in a historical dataset (where k ≥ 1). Sometimes called the
k-nearest neighbor technique.

Rule induction: The extraction of useful if-then rules from data based on
statistical significance.

Data visualization: The visual interpretation of complex relationships in
multidimensional data. Graphics tools are used to illustrate data
relationships.
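
Of the techniques listed, the nearest neighbor method is the simplest to show in a few lines. The sketch below assumes numeric features, Euclidean distance and a simple majority vote as the "combination of the classes"; the function `knn_classify` and the historical records are hypothetical.

```python
import math
from collections import Counter


def knn_classify(history, record, k=3):
    """Classify `record` by majority vote among its k nearest historical records.

    `history` is a list of (features, label) pairs; distance is Euclidean.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    nearest = sorted(history, key=lambda item: math.dist(item[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical records: (relative size, relative duration) -> outcome.
history = [
    ((1.0, 1.0), "on budget"),
    ((1.2, 0.9), "on budget"),
    ((0.8, 1.1), "on budget"),
    ((3.0, 3.2), "overrun"),
    ((3.1, 2.9), "overrun"),
]

print(knn_classify(history, (1.1, 1.0), k=3))
print(knn_classify(history, (2.9, 3.0), k=3))
```

In practice the features would need to be scaled to comparable ranges before distances are computed, since the largest-valued feature otherwise dominates.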

1.3 Data mining process


Figure 1 illustrates the phases, and the iterative nature, of a data mining project.
The process flow shows that a data mining project does not stop when a
particular solution is deployed. The results of data mining trigger new business
questions, which in turn can be used to develop more focused models.

Figure 1: The Data Mining Process

1.4 Data Mining Issues


As data mining initiatives continue to evolve, several issues arise, including
but not limited to data quality, interoperability, mission creep, and
privacy.

2. Construction Industry
The construction industry is responsible for undertaking some of the biggest and
most expensive projects on Earth. It is an industry where 35% of costs are
accounted for by material waste and remedial work. So efficient resource
management could be the difference between delivering on budget and
bankrupting an organization (or several organizations) in the industry.
Huge amounts of resources and work go into major construction projects and of
course this means that huge volumes of data are generated. This data can
come from people, computers, machines, sensors, and any other data-generating
device or agent. That, naturally enough, is what makes it big. It is also
constantly increasing with additional input from sources as diverse as on-site
workers, cranes, earth movers, material supply chains, and even buildings
themselves.
2.1 Application of data mining in construction industry
Traditional information systems are good at recording basic information about
project schedules, CAD designs, costs, invoices, and employee details. However,
they are limited in their ability to work with unstructured data like free text,
printed information or analog sensor readings. Often, they can only handle
orderly digital rows and columns of numbers.
The idea of harnessing big data is to gain more insights and make better
decisions in construction management by not only accessing significantly more
data, but by properly analyzing it to draw practical building project conclusions.
In fact, big data, like truckloads of bricks or bags of cement, is not useful on its
own until processed to formulate decision-driving results.
2.1.1 Application in Building Life Cycle Modelling and BMS
The rich set of building data generated or accumulated during the design and
documentation phases of buildings remains relevant even after the construction.
This data becomes richer as operations and maintenance data is included and
updated regularly. Architects, interior designers, engineers, contractors,
marketing and sales personnel, building managers and owners can extract
useful information from databases for building renovation, maintenance and
operations. Figure 2 illustrates a proposed model of the information flow in
building design and maintenance.

Figure 2: Proposed model of an information flow

Data mining techniques can be applied effectively to data stored in a Building
Maintenance System (BMS) to extract useful knowledge for future management and
design decision making. Knowledge that resides implicitly in BMS databases and
corresponds to the above figure includes:
1. Components that frequently need maintenance and therefore need to
be inspected carefully
2. Historical consequences of maintenance decisions that may inform future
decisions
3. Components of buildings that significantly determine maintenance cost
and therefore may inform future building designs, as well as refurbishment
of the building.

Other benefits include constructing predictive plans based on correlations
obtained by applying data mining techniques to the maintenance data sets
of buildings. For example, potential correlations between seasons and
malfunction rates can guide the allocation of maintenance resources.
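
As a rough illustration of this kind of knowledge, the sketch below counts which components fail most often and in which season. Everything in it is an assumption made for the example: the maintenance log is hypothetical, and the month-to-season mapping is a simple four-season scheme that would need to be adapted to the local climate.

```python
from collections import Counter, defaultdict

# Hypothetical BMS maintenance log: (component, month of failure).
log = [
    ("chiller", 6), ("chiller", 7), ("chiller", 7),
    ("boiler", 12), ("boiler", 1), ("lift", 3), ("chiller", 8),
]

# Assumed four-season mapping; adjust for the region being studied.
SEASON = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

# Components that most often need maintenance.
by_component = Counter(component for component, _ in log)

# Malfunction counts per component and season.
by_season = defaultdict(Counter)
for component, month in log:
    by_season[component][SEASON[month]] += 1

print(by_component.most_common(2))
print(dict(by_season["chiller"]))
```

Counts like these are only a starting point; a real study would normalize by the number of installed components and test whether the seasonal differences are statistically significant.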
2.1.2 Application to minimize occupational injuries in construction industry
Using a database of accident cases, potential cause-and-effect relationships
regarding serious occupational accidents in the industry can be established.
These could help form a framework for improving the safety practices and
training programs that are essential to protecting construction workers from
occasional or unexpected accidents.
Ching-Wu Cheng, Sou-Sen Leu, Ying-Mei Cheng, Tsung-Chih Wu and
Chen-Chung Lin applied data mining techniques to explore factors
contributing to occupational injuries in Taiwan's construction industry, using
such a database and the data mining method known as classification and
regression tree (CART). Using a database of 1542 accident cases from the
period 2000-2009, the study established potential cause-and-effect
relationships regarding serious occupational accidents in the industry. The results
of this study show that the occurrence rules for falls and collapses in both public
and private construction projects serve as key factors in predicting the
occurrence of occupational injuries.
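
To show how CART arrives at its 2-way splits, the sketch below searches for the threshold on one numeric feature that minimizes the weighted Gini impurity of the two resulting groups. It is a simplified illustration, not the procedure used in the study: the accident records, the feature name `height_m` and the helper functions are hypothetical, and a full CART implementation would repeat the search over all features and recurse on each side.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(records, feature):
    """Return (threshold, weighted impurity) of the best 2-way split on `feature`."""
    values = sorted({r[feature] for r in records})
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r["label"] for r in records if r[feature] <= threshold]
        right = [r["label"] for r in records if r[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical accident records (not taken from the study).
records = [
    {"height_m": 1.0, "label": "minor"},
    {"height_m": 1.5, "label": "minor"},
    {"height_m": 2.0, "label": "minor"},
    {"height_m": 4.0, "label": "serious"},
    {"height_m": 6.0, "label": "serious"},
]

threshold, impurity = best_binary_split(records, "height_m")
print(threshold, impurity)
```

The split found here reads as an if-then rule ("if the fall height exceeds the threshold, the injury is serious"), which is the form in which decision tree results are usually reported.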
2.1.3 Applications in cost management
One of the main aims of any construction client is to procure a project within the
limits of a predefined budget. However, most construction projects routinely
overrun their cost estimates. Existing theories on construction cost overrun
suggest a number of causes, ranging from technical difficulties and optimism bias
to managerial incompetence and strategic misrepresentation. However, much of
the budgetary decision-making process in the early stages of a project is carried
out in an environment of high uncertainty with little available information for
accurate estimation.

In "Dealing with construction cost overruns using data mining", Dominic D
Ahiaga-Dagbui and Simon D Smith use non-parametric bootstrapping and
ensemble modelling in artificial neural networks to develop final project cost
forecasting models from 1600 completed projects. This experimental research
helped to extract information embedded in data on completed
construction projects, in an attempt to address the dearth of
information in the early stages of a project. 92% of the 100 validation predictions
were within 10% of the actual final cost of the project, while 77% were within
5% of the actual final cost.

Figure 3: Validation results (Standard models vs Bootstrapping)
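
The validation figures quoted above are shares of predictions whose relative error falls within a tolerance. A minimal sketch of that measure follows; the function `share_within` and the cost figures are hypothetical and are not the data used in the study.

```python
def share_within(predicted, actual, tolerance):
    """Fraction of predictions whose relative error is at most `tolerance`."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        1 for p, a in zip(predicted, actual)
        if abs(p - a) <= tolerance * abs(a)
    )
    return hits / len(actual)


# Hypothetical final costs and model predictions (same currency units).
actual = [100.0, 200.0, 400.0, 800.0]
predicted = [104.0, 215.0, 395.0, 900.0]

print(share_within(predicted, actual, 0.10))
print(share_within(predicted, actual, 0.05))
```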

2.1.4 Application in asset management


While ideas and imagination are powerful resources, a real-life application of big
data in commercial construction can also be inspiring. One such example
comes from Nick Savko & Sons, Inc., a construction company based in Ohio
that offers earthmoving and road surfacing services. The company equipped
its machines, including a scraper and articulated dump trucks, with 36 global
locator devices, so that the machines could be monitored at a distance. The
first two devices were installed by a dealer, and the
company then installed the other 34 devices itself.

The devices gathered information on machine cycle time, idle time, productivity,
and more. This information was then fed into an asset management software
program. Idle time and location analysis allowed managers to know if too many
or too few trucks were being used, or if an earthmover would be more gainfully
used elsewhere. The same information was also analyzed to generate
information on loads carried, cycle times, and cycle distances. Fuel
consumption could be compared with benchmark figures to see if operators on
site were using the machines efficiently, or if there were possible mechanical
problems to be fixed.
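
The kind of check described here can be sketched as a simple rule-based review of each machine's readings. All names, thresholds, benchmark rates and readings below are assumptions made for illustration; they do not come from the company or its asset management software.

```python
# Hypothetical telematics readings per machine for one shift.
readings = [
    {"machine": "truck-1", "engine_h": 10.0, "idle_h": 2.0, "fuel_l": 180.0},
    {"machine": "truck-2", "engine_h": 10.0, "idle_h": 5.0, "fuel_l": 150.0},
    {"machine": "scraper-1", "engine_h": 8.0, "idle_h": 1.0, "fuel_l": 280.0},
]

# Assumed benchmark fuel rates in litres per engine hour, by machine type.
BENCHMARK_L_PER_H = {"truck": 20.0, "scraper": 30.0}


def review(reading, max_idle_share=0.3, fuel_margin=1.1):
    """Return a list of flags for one machine's readings."""
    flags = []
    idle_share = reading["idle_h"] / reading["engine_h"]
    if idle_share > max_idle_share:
        flags.append("high idle time")
    # The machine type is the part of the identifier before the dash.
    machine_type = reading["machine"].split("-")[0]
    fuel_rate = reading["fuel_l"] / reading["engine_h"]
    if fuel_rate > fuel_margin * BENCHMARK_L_PER_H[machine_type]:
        flags.append("fuel above benchmark")
    return flags


report = {r["machine"]: review(r) for r in readings}
for machine, flags in report.items():
    print(machine, flags or ["ok"])
```

A machine flagged for high idle time may be surplus to the current task, while one flagged for fuel use may point to operator habits or a mechanical problem.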
The big benefits to the company included increasing productivity enough to
finish the project a month ahead of schedule, and fixing potential problems
before they became real ones. Because the data gathered also showed the
company its real costs, it now uses this information to tune profitability and
become even more competitive for following projects.
3. Conclusion
Using data mining in construction projects can improve performance and
help contractors transform their large data sets into useful information for
business improvement. Data mining plays an important role in combining
knowledge and existing information to forecast final outcomes.
One of the major challenges for data mining is the poor culture of data
warehousing in the construction industry. Data mining requires large
data sets to transform into useful information. However, most construction
firms have little or no useful and complete data with which to model
the construction processes. This limitation and potential pitfall must always be
clearly communicated to the end user, and ways to solve the problem must be found.

4. References

http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
https://docs.oracle.com/cd/B28359_01/datamine.111/b28129/process.htm#DMCON046
http://www.thearling.com/text/dmwhite/dmwhite.htm
https://www.scribd.com/document/285155752/Steps-of-the-KnowledgeDiscovery-in-Databases-Process-Www-sqldatamining
https://www.scribd.com/document/55166998/Seminar-Data-Mining
http://academic.csuohio.edu/fuy/Pub/pot97.pdf
https://www.scribd.com/document/210762858/Data-Mining-Application-inConstruction-Project

Glossary
analytical model

A structure and process for analyzing a dataset. For example, a decision
tree is a model for the classification of a dataset.

anomalous data

Data that result from errors (for example, data entry keying errors) or
that represent unusual events. Anomalous data should be examined
carefully because it may carry important information.

Artificial neural networks

Non-linear predictive models that learn through training and resemble
biological neural networks in structure.

CART

Classification and Regression Trees. A decision tree technique used for
classification of a dataset. Provides a set of rules that you can apply to
a new (unclassified) dataset to predict which records will have a given
outcome. Segments a dataset by creating 2-way splits. Requires less
data preparation than CHAID.

CHAID

Chi Square Automatic Interaction Detection. A decision tree technique
used for classification of a dataset. Provides a set of rules that you can
apply to a new (unclassified) dataset to predict which records will have
a given outcome. Segments a dataset by using chi square tests to
create multi-way splits. Preceded, and requires more data preparation
than, CART.

data cleansing

The process of ensuring that all values in a dataset are consistent and
correctly recorded.

data mining

The extraction of hidden predictive information from large databases.

data navigation

The process of viewing different dimensions, slices, and levels of detail
of a multidimensional database. See OLAP.

data visualization

The visual interpretation of complex relationships in multidimensional
data.

data warehouse

A system for storing and delivering massive quantities of data.

decision tree

A tree-shaped structure that represents a set of decisions. These
decisions generate rules for the classification of a dataset. See CART
and CHAID.

dimension

In a flat or relational database, each field in a record represents a
dimension. In a multidimensional database, a dimension is a set of
similar entities; for example, a multidimensional sales database might
include the dimensions Product, Time, and City.

linear model

An analytical model that assumes linear relationships in the coefficients
of the variables being studied.

linear regression

A statistical technique used to find the best-fitting linear relationship
between a target (dependent) variable and its predictors (independent
variables).

logistic regression

A linear regression that predicts the proportions of a categorical target
variable, such as type of customer, in a population.

multidimensional database

A database designed for on-line analytical processing. Structured as a
multidimensional hypercube with one axis per dimension.

non-linear model

An analytical model that does not assume linear relationships in the
coefficients of the variables being studied.

parallel processing

The coordinated use of multiple processors to perform computational
tasks. Parallel processing can occur on a multiprocessor computer or on
a network of workstations or PCs.

predictive model

A structure and process for predicting the values of specified variables
in a dataset.

prospective data analysis

Data analysis that predicts future trends, behaviors, or events based on
historical data.

RAID

Redundant Array of Inexpensive Disks. A technology for the efficient
parallel storage of data for high-performance computer systems.

time series analysis

The analysis of a sequence of measurements made at specified time
intervals. Time is usually the dominating dimension of the data.
