




Data mining consists of an evolving set of techniques that can be used to extract valuable information and knowledge from massive volumes of data. Data mining research and tools have focused mainly on commercial-sector applications; only a small share of data mining research has focused on scientific data. This paper aims to extend the study of data mining on scientific data. It highlights the data mining techniques applied to mine for surface changes over time (e.g. earthquake rupture). These techniques help researchers to predict changes in the intensity of volcanoes. The paper uses predictive statistical models that can be applied to areas such as seismic activity and the spreading of fire. The basic problem in this class of systems is that the dynamics underlying earthquakes are unobservable, whereas the space-time patterns associated with the time, location and magnitude of the sudden events arising from the force threshold are observable. This paper highlights how observable space-time earthquake patterns can be obtained from unobservable dynamics using data mining techniques, pattern recognition and ensemble forecasting. It thus gives insight into how data mining can be applied to finding the consequences of earthquakes and hence to alerting the public.

The field of data mining has evolved from its roots in databases, statistics, artificial intelligence, information theory and algorithms into a core set of techniques that have been applied to a range of problems. Computational simulation and data acquisition in scientific and engineering domains have made tremendous progress over the past two decades. A mix of advanced algorithms, exponentially increasing computing power, and accurate sensing and measurement devices has resulted in more data repositories.


Advanced network technologies have enabled the communication of large volumes of data across the world. This creates a need for tools and technologies that can effectively analyze scientific data sets with the objective of interpreting the underlying physical phenomena. Data mining applications in geology and geophysics have achieved significant success in areas such as weather prediction, mineral prospecting, ecology, modeling and, finally, predicting earthquakes from satellite maps.
An interesting aspect of many of these applications is that they combine both spatial and temporal aspects in the data and in the phenomena being mined. Data sets in these applications come from both observations and simulation. Investigations of earthquake prediction are based on the assumption that all of the regional factors can be filtered out and that general information about earthquake precursory patterns can be extracted.
Feature extraction involves a pre-selection of various statistical properties of the data and the generation of a set of seismic parameters, which correspond to linearly independent coordinates in the feature space. The seismic parameters, in the form of time series, can then be analyzed using various pattern recognition techniques. Statistical or pattern recognition methodology usually performs this extraction process. This paper thus gives insight into mining scientific data.

DATA MINING-DEFINITIONS
• Data mining is defined as the process of extracting relevant data and hidden facts contained in databases and data warehouses.
• It refers to discovering new knowledge about an application domain using data on that domain, usually stored in databases. The application domain may be astrophysics, earth science or the solar system.
Data mining techniques help to identify nuggets of information and to extract this information in such a way that it supports decision making, prediction, forecasting and estimation.

• Bring together representatives of the data mining community and the domain science community so that each can understand the current capabilities and research objectives of the other with respect to data mining.
• Identify a set of research objectives from the domain science community that would be facilitated by current or anticipated data mining techniques.
• Identify a set of research objectives for the data mining community that could support the research objectives of the domain science community.
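The feature extraction step described in this section (turning raw seismic data into a time series of seismic parameters) can be sketched roughly as follows. This is only an illustration under stated assumptions: the catalog format, the window width and the three chosen parameters (event count, mean magnitude, maximum magnitude) are hypothetical and are not taken from the paper.

```python
from statistics import mean

def window_features(events, t_start, t_end, width):
    """Turn a catalog of (time, magnitude) events into per-window parameters.

    Returns one dict per time window holding the event count, the mean
    magnitude and the maximum magnitude. Empty windows get a count of 0
    and None for both magnitude values.
    """
    features = []
    t = t_start
    while t < t_end:
        # magnitudes of all events that fall inside the window [t, t + width)
        mags = [m for (time, m) in events if t <= time < t + width]
        features.append({
            "t0": t,
            "count": len(mags),
            "mean_mag": mean(mags) if mags else None,
            "max_mag": max(mags) if mags else None,
        })
        t += width
    return features

# Hypothetical toy catalog: (time in days, magnitude)
catalog = [(0.5, 2.1), (1.2, 3.0), (1.8, 2.4), (4.1, 4.5), (4.9, 3.9), (7.3, 2.0)]
series = window_features(catalog, t_start=0.0, t_end=8.0, width=2.0)
for row in series:
    print(row)
```

Each row of `series` is one point of a multi-parameter time series, which is the kind of input a pattern recognition or clustering method would then work on.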


Data mining is used to find patterns and relationships in data. These relationships can be analyzed via two types of models.
1. Descriptive models: Used to describe patterns and to create meaningful subgroups or clusters.
2. Predictive models: Used to forecast explicit values, based upon patterns in known results.
** This paper focuses on predictive models.
In large databases, data mining and knowledge discovery come in two flavors:
1. Event based mining:
• Known events/known algorithms: Use existing physical models (descriptive models and algorithms) to locate known phenomena of interest either spatially or temporally within a large database.
• Known events/unknown algorithms: Use pattern recognition and clustering properties of data to discover new observational (physical) relationships (algorithms) among known phenomena.
• Unknown events/known algorithms: Use expected physical relationships (predictive models, algorithms) among observational parameters of physical phenomena to predict the presence of previously unseen events within a large complex database.
• Unknown events/unknown algorithms: Use thresholds or trends to identify transient or otherwise unique events and therefore to discover new physical phenomena.
** This paper focuses on unknown events and known algorithms.
2. Relationship based mining:
• Spatial Associations: Identify events (e.g. astronomical objects) at the same location (e.g. the same region of the sky).
• Temporal Associations: Identify events occurring during the same or related periods of time.
• Coincidence Associations: Use clustering techniques to identify events that are co-located within a multi-dimensional parameter space.
** This paper focuses on all relationship-based mining.
User requirements for data mining in large scientific databases:
• Cross identifications: Refers to the classical problem of associating the source list in one database to the source list in another.


• Cross correlation: Refers to the search for correlations, tendencies, and trends between physical parameters in multidimensional data, usually across databases.
• Nearest neighbor identification: Refers to the general application of clustering algorithms in multidimensional parameter space, usually within a database.
• Systematic data exploration: Refers to the application of a broad range of event based queries and relationship based queries to a database in the hope of making a serendipitous discovery of new objects or a new class.
** This paper focuses on correlation and clustering.
DATA MINING TECHNIQUES:
The various data mining techniques are
1. Statistics
2. Visualization
3. Clustering
4. Association
5. Classification & Prediction
6. Outlier analysis
7. Trend and evolution analysis
1. Statistics:
♦ Data cleansing, i.e. the removal of erroneous or irrelevant data, known as outliers.
♦ EDA (exploratory data analysis), e.g. frequency counts and histograms.
♦ Attribute redefinition, e.g. body mass index.
♦ Data analysis: measures of association and of relationships between attributes, interestingness of rules, classification, prediction etc.
2. Visualization:
♦ Enhances EDA and makes patterns visible in different views.
3. Clustering (cluster analysis):
Clustering is the process of grouping similar data. Data that are not part of any cluster are called outliers. How to cluster under different conditions:
♦ Class label is unknown: Group related data to form new classes, e.g. cluster houses to find distribution patterns.
♦ Clustering is based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity.
♦ It provides subgroups of a population for further analysis or action, which is very important when dealing with large databases.
4. Association (correlation and causality)


Mining association rules finds interesting correlation relationships in large databases.
5. Classification and Prediction
♦ Finding models (functions) that describe and distinguish classes or concepts for future prediction, e.g. classify countries based on climate, or classify cars based on gas mileage.
♦ Presentation: decision tree, classification rule, neural network.
♦ Prediction: Predict some unknown or missing numerical values.
6. Outlier analysis
♦ Outlier: A data object that does not conform to the general behavior of the data. It can be considered an exception, but it is quite useful in fraud detection and rare events analysis.
7. Trend and evolution analysis
♦ Trend and deviation: regression analysis
♦ Sequential pattern mining, periodicity analysis
♦ Similarity-based analysis
** This paper focuses on clustering and visualization techniques for predicting earthquakes.
(i) Ground water levels
(ii) Chemical changes in ground water
(iii) Radon gas in ground water wells
Ground Water Levels:-
Changing water levels in deep wells are recognized as a precursor to earthquakes. The pre-seismic variations at observation wells are as follows.
1. A gradual lowering of water levels over a period of months or years.
2. An accelerated lowering of water levels in the last few months or weeks preceding the earthquake.
3. A rebound, where water levels begin to increase rapidly in the last few days or hours before the main shock.
Chemical Changes in Ground water
1. The chemical composition of ground water is affected by seismic events.
2. Researchers at the University of Tokyo tested the water after an earthquake occurred; the study showed that the composition of the water in the earthquake area changed significantly in the period around the event.
3. They observed that the chloride concentration is almost constant.
4. Levels of sulphate also showed a similar rise.
Radon Gas in Ground water wells.


1. An increased level of radon gas in wells is a precursor of earthquakes recognized by research groups.
♦ Although radon has a relatively short half-life and is unlikely to seep to the surface through rocks from the depths at which seismic activity occurs, it is very soluble in water and can routinely be monitored in wells and springs. Radon levels at such springs often show a reaction to seismic events, and they are monitored for earthquake prediction.
♦ There is no effective solution to the problem.
♦ To solve this problem, earthquake catalogs, geo-monitoring time series, data about stationary seismo-tectonic properties of the geological environment, and expert knowledge and hypotheses about earthquake precursors are used.

This paper proposes a multi-resolutional approach, which combines local clustering techniques in the data space with non-hierarchical clustering in the feature space. The raw data are represented by n-dimensional vectors Xi of measurements Xk. The data space can be searched for patterns and can be visualized by using local or remote pattern recognition and advanced visualization capabilities. The data space X is transformed to a new abstract space Y of vectors Yj. The coordinates Yl of these vectors represent nonlinear functions of the measurements Xk, which are averaged in space and time in given space-time windows. This transformation allows for coarse graining of data (data quantization),


amplification of their characteristic features, and suppression of noise and other random components. The new features Yl form an N-dimensional feature space. We use multi-dimensional scaling procedures for visualizing the multi-dimensional events in 3D space. This transformation allows a visual inspection of the N-dimensional feature space. The visual analysis helps greatly in detecting subtle cluster structures that are not recognized by classical clustering techniques, selecting the best pattern detection procedure for data clustering, classifying the anonymous data, and formulating new hypotheses.

Clustering schemes
Clustering analysis is a mathematical concept whose main role is to extract the most similar separated sets of objects according to a given similarity measure. This concept has been used for many years in pattern recognition. Depending on the data structures and goals of classification, different clustering schemes must be applied.
In our new approach we use two different classes of clustering algorithms for different resolutions. In data space we use agglomerative schemes, such as the modified Mutual Nearest Neighbour (MNN) algorithm. This type of clustering extracts the localized clusters in the high-resolution data space. In the feature space we search for global clusters of time events comprising similar events from the whole time interval.
The non-hierarchical clustering algorithms are used mainly for extracting compact clusters by using global knowledge about the data structure. We use improved mean-based schemes, such as a suite of moving schemes, which uses the k-means procedure and four strategies for tuning it by moving the data vectors between clusters to obtain a more precise location of the minimum of the goal function:

$$J(\omega, n) = \sum_{j} \sum_{i \in C_j} \lvert x_i - z_j \rvert^2$$

where $z_j$ is the position of the center of mass of cluster $j$, while $x_i$ are the feature vectors closest to $z_j$. To find a global minimum of the function $J$, we repeat the clustering procedure with different initial conditions. Each new initial configuration is constructed in a special way from the previous results by using the methods. The cluster structure with the lowest $J(\omega, n)$ minimum is selected.
HIERARCHICAL CLUSTERING METHODS:
A hierarchical clustering method produces a classification in which small clusters of very similar molecules are nested within larger clusters of less closely-related molecules.
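As a rough illustration of how small clusters end up nested inside larger ones, the sketch below merges clusters bottom-up and records every merge. It is a minimal example under stated assumptions: the single-link merging rule, the function name `single_link_merges` and the toy 1-D points are illustrative choices, not the method or data used in the paper.

```python
def single_link_merges(points):
    """Bottom-up single-link clustering; returns the merge history.

    Each history entry is (distance, merged_cluster), where merged_cluster
    is a sorted tuple of point indices.
    """
    def dist(i, j):
        # Euclidean distance between points i and j
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5

    clusters = [(i,) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # find the pair of clusters whose closest members are nearest
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = tuple(sorted(clusters[a] + clusters[b]))
        # replace the two merged clusters by their union
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
        history.append((d, merged))
    return history

# Hypothetical toy data: five points on a line
points = [(0.0,), (1.0,), (5.0,), (6.5,), (20.0,)]
history = single_link_merges(points)
for d, merged in history:
    print(f"merge at distance {d:.1f}: {merged}")
```

Reading the history from top to bottom shows the nesting: the tight pairs are formed first and are later absorbed into larger, looser clusters. Cutting the history at a chosen distance gives one flat partition out of the hierarchy.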


Hierarchical agglomerative methods generate a classification in a bottom-up manner, by a series of agglomerations in which small clusters, initially containing individual molecules, are fused together to form progressively larger clusters. Hierarchical agglomerative methods are often characterized by the shape of the clusters they tend to find, as exemplified by the following range: single-link tends to find long, straggly, chained clusters; Ward and group-average tend to find globular clusters; complete-link tends to find extremely compact clusters. Hierarchical divisive methods generate a classification in a top-down manner, by progressively sub-dividing the single cluster that represents the entire dataset. Monothetic (divisions based on just a single descriptor) hierarchical divisive methods are generally much faster in operation than the corresponding polythetic (divisions based on all descriptors) hierarchical divisive and hierarchical agglomerative methods, but tend to give poor results. One problem with these methods is how to choose which clusters or partitions to extract from the hierarchy, because display of the complete hierarchy is not really appropriate for data sets of more than a few hundred compounds.
NON-HIERARCHICAL CLUSTERING
A non-hierarchical method generates a classification by partitioning a dataset, giving a set of (generally) non-overlapping groups having no hierarchical relationships between them. A systematic evaluation of all possible partitions is quite infeasible, and many different heuristics have been described to allow the identification of good, but possibly sub-optimal, partitions. Three of the main categories of non-hierarchical method are single-pass, relocation and nearest neighbour. Single-pass methods (e.g. Leader) produce clusters that are dependent upon the order in which the compounds are processed, and so will not be considered further. Relocation methods, such as k-means, assign compounds to a user-defined number of seed clusters and then iteratively reassign compounds to produce better clusters. Such methods are prone to reaching a local optimum rather than the global optimum, and it is generally not possible to determine when or whether the global optimum solution has been reached. Nearest neighbour methods, such as the Jarvis-Patrick method, assign compounds to the same cluster as some number of their nearest neighbours. User-defined parameters determine how many nearest neighbours need to be considered, and the necessary level of similarity between nearest neighbour lists. Other non-hierarchical methods are generally inappropriate for use on large, high-dimensional datasets such as those used in chemical applications.
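The relocation idea and the local-optimum problem can be illustrated with a small k-means sketch that is restarted from several initial configurations and keeps the run with the lowest sum of squared distances to the cluster centers (the quantity J). This is a simplified sketch, not the tuned moving schemes used in the paper: the restarts here are plain random restarts, and the function names, the toy data and the parameter values are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Center of mass of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def kmeans(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (J, centers, labels)."""
    centers = rng.sample(points, k)  # seed clusters picked at random
    labels = None
    for _ in range(max_iter):
        # relocation step: assign every point to its nearest center
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = centroid(members)
    J = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return J, centers, labels

def best_of_restarts(points, k, restarts=20, seed=0):
    """Repeat k-means from different starts and keep the lowest-J result."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(restarts)),
               key=lambda result: result[0])

# Hypothetical toy data: two well-separated groups of four points each
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
J, centers, labels = best_of_restarts(points, k=2)
print(J, sorted(centers))
```

Keeping the best of several runs does not guarantee that the global optimum is found; it only makes a poor local optimum less likely, which is why the choice of initial configurations matters.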
♦ In Scientific discovery – superconductivity research, knowledge acquisition.
♦ In Medicine – drug side effects, hospital cost analysis, genetic sequence analysis, prediction etc.
♦ In Engineering – automotive diagnostic expert systems, fault detection etc.
♦ In Finance – stock market prediction, credit assessment, fraud detection etc.
FUTURE ENHANCEMENTS
The future of data mining lies in predictive analytics. The technology innovations in data mining since 2000 have been truly Darwinian and show promise of consolidating and stabilizing around predictive analytics. Nevertheless, the emerging market for predictive analytics has been sustained by professional services, service bureaus and profitable applications in verticals such as retail, consumer finance, telecommunications, travel and leisure, and related analytic applications. Predictive analytics have successfully proliferated into applications that support customer recommendations, customer value and churn management, campaign optimization, and fraud detection. On the product side, success stories in demand planning, just-in-time inventory and market basket optimization are a staple of predictive analytics. Predictive analytics should be used to get to know the customer, to segment and predict customer behavior, and to forecast product demand and related market dynamics. Finally, they are at different stages of growth in the life cycle of technology.
CONCLUSION:
The problem of earthquake prediction is based on the extraction of data on precursory phenomena, and it is a highly challenging task. Various computational methods and tools are used to detect precursors by extracting general information from noisy data.
By using a common framework of clustering we are able to perform multi-resolutional analysis of seismic data, starting from the raw data events described by their magnitude in the spatio-temporal data space. This new methodology can also be used for the analysis of data from other geological phenomena; e.g. we can apply this clustering method to volcanic eruptions.
REFERENCES:
Books:
1. W. Dzwinel et al., Nonlinear multidimensional scaling and visualization of earthquake clusters over space, time and feature space, Nonlinear Processes in Geophysics 12 (2005), pp. 1-12.
2. C. Lomnitz, Fundamentals of Earthquake Prediction.
3. B. Gutenberg & C.F. Richter, Earthquake magnitude, intensity, energy and acceleration, Bull. Seism. Soc. Am. 46, 105-145 (1956).
4. C. Brunk, J. Kelly & R. Kohavi, "MineSet: An integrated system for data access, visual data mining & analytical data mining", Proceedings of the 3rd Conference on KDD.
5. M.R. Anderberg, Cluster Analysis for Applications, New York, Academic Press, 1973.