Gokaraju Rangaraju Institute of Engineering and Technology

Department of Computer Science and Engineering Gokaraju Rangaraju Institute of Engineering and Technology
Certificate
This is to certify that the thesis entitled Analysis of density-based clustering algorithms by Akifunnisa (08241A0502), Apoorva P (08241A0503), G. Ramya Teja (08241A0530) and B. Surya Kanthi (08241A0547), submitted in partial fulfillment of the requirements for the degree of Bachelor of Technology in Computer Science and Engineering of the Jawaharlal Nehru Technological University, Hyderabad, during the academic year 2010-11, is a bona fide record of work carried out under our guidance and supervision. The results embodied in this report have not been submitted to any other University or Institution for the award of any degree or diploma.
(Guide) Prof.Beena Bethel
(External Examiner)
(Head of Department) Dr.K.Anuradha
ACKNOWLEDGEMENT
I thank Prof. Beena Bethel, Associate Professor, Computer Science & Engineering, GRIET, for providing seamless support and knowledge over the past year, and also for providing the right suggestions at every phase of the development of our project.
I have immense pleasure in expressing my sincere thanks to Dr. K Anuradha, Head of Computer Science and Engineering Department, who inspired us in our work.
I express wholehearted gratitude to Prof. P.S.Raju, Director, GRIET and Dr. Jandhyala, N.Murthy, Principal, GRIET for providing us with a conducive environment for carrying through our academic schedules and projects with ease.
There is definitely a need to thank my family members and friends, without whose support the project would have been deferred.
CHAPTER 1
INTRODUCTION OF DATA MINING AND TECHNIQUES

1.1 What is Data Mining?
Data mining refers to extracting or mining knowledge from large amounts of data. It is a vivid term characterizing the process that finds a small set of precious nuggets from a great deal of raw material. Data mining is a technique for extracting interesting and useful knowledge from huge data sources: it uses automated data analysis techniques to uncover previously undetected relationships among data items. Data mining often involves the analysis of data stored in a data warehouse. Three of the major data mining techniques are regression, classification and clustering.

1.2 OVERVIEW OF DATA MINING

Data mining, also called knowledge discovery, refers to extracting interesting and useful knowledge from huge databases. Data mining in databases is a new interdisciplinary field, merging ideas from statistics, machine learning, databases and parallel computing. Knowledge Discovery in Databases (KDD) is the process of finding useful information and patterns in data, whereas data mining is the use of algorithms to extract the information and patterns derived by the KDD process. The KDD process consists of the following five steps.

- Selection: The data needed for the data mining process may be obtained from many different and heterogeneous data sources.
- Pre-processing: Erroneous data may be corrected or removed, whereas missing data must be supplied or predicted.
- Transformation: Data from different sources must be converted into a common format for processing.
- Data mining: Based on the data mining task being performed, this step applies algorithms to the transformed data to generate the desired results.
- Interpretation/Evaluation: How the data mining results are presented to the users is extremely important because the usefulness of the results depends on it. Various visualization and GUI strategies are used at this last step.

The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to the user or may be stored as new knowledge in the knowledge base. Thus, data mining is only a single step in the entire process of knowledge discovery in databases, albeit an essential one, since it uncovers hidden patterns for evaluation.

The architecture of a typical data mining system may have the following major components.

- Database, data warehouse, World Wide Web, or other information repository: This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data.
- Database or data warehouse server: The database or data warehouse server is responsible for fetching the relevant data, based on the user's data mining request.
- Knowledge base: This is the domain knowledge that is used to guide the search or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction.
- Pattern Evaluation Module: This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search toward interesting patterns.
- Data Mining Engine: This is the core of the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association and correlation analysis, classification, prediction, cluster analysis, outlier analysis, and evolution analysis.
- Graphical User Interface: This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results.

Data mining involves an integration of techniques from multiple disciplines such as database technology, statistics, machine learning, image and signal processing, and spatial data analysis.
By performing data mining, interesting knowledge, regularities, or high-level information can be extracted from databases and viewed or browsed from different angles. The discovered knowledge can be applied to decision-making, process control, information management, and query processing. Therefore, data mining is considered one of the most important frontiers in database and information systems and one of the most promising interdisciplinary developments in information technology.

1.3 Data Mining - On what kind of data?

In principle, data mining should be applicable to any kind of data repository, as well as to transient data such as data streams. The repositories on which mining can be performed include relational databases, data warehouses, transactional databases, advanced database systems, flat files, data streams, and the World Wide Web.

Relational Databases: A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows). Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.

Data Warehouses: A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing. The data are stored to provide information from a historical perspective and are typically summarized.
Transactional Databases: A transactional database consists of a file where each record represents a transaction. A transaction typically includes a unique transaction identity number (Trans ID) and a list of the items making up the transaction.

Advanced Database Systems: Relational database systems have been widely used in business applications. With the progress of database technology, various kinds of advanced data and information systems have emerged. The new database applications include handling spatial data (such as maps), engineering design data (such as the design of buildings, system components, or integrated circuits), hypertext and multimedia data (including text, image, video, and audio data), time-related data (such as historical records or stock exchange data), stream data (such as video surveillance and sensor data, where data flow in and out like streams), and the World Wide Web. In response to these needs, advanced database systems and specific application-oriented database systems have been developed. These include object-relational database systems, temporal and time-series database systems, spatial and spatiotemporal database systems, text and multimedia database systems, heterogeneous and legacy database systems, data stream management systems, and Web-based global information systems.

Object-Relational Databases: Object-relational databases are constructed based on an object-relational data model. This model extends the relational model by providing a rich data type for handling complex objects and object orientation. Because most sophisticated database applications need to handle complex objects and structures, object-relational databases are becoming increasingly popular in industry and applications.

1.4 Data Mining Functionalities - What kind of patterns can be mined?

Data mining functionalities are used to specify the kind of patterns to be found in data mining tasks. In general, data mining tasks can be classified into two categories: descriptive and predictive. Descriptive mining tasks characterize the general properties of the data in the database. Predictive mining tasks perform inference on the current data in order to make predictions.

Characterization and Discrimination

Data can be associated with classes or concepts. It can be useful to describe individual classes and concepts in summarized, concise, and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions.
These descriptions can be derived via (1) data characterization, by summarizing the data of the class under study (often called the target class) in general terms, or (2) data discrimination, by comparison of the target class with one or a set of comparative classes (often called the contrasting classes), or (3) both data characterization and discrimination.

Association Analysis

Association analysis is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. It is widely used for market basket or transaction data analysis. More formally, association rules are of the form $X \Rightarrow Y$, i.e. $A_1 \wedge \dots \wedge A_m \Rightarrow B_1 \wedge \dots \wedge B_n$, where $A_i$ (for $i \in \{1, \dots, m\}$) and $B_j$ (for $j \in \{1, \dots, n\}$) are attribute-value pairs. The association rule $X \Rightarrow Y$ is interpreted as: database tuples that satisfy the conditions in X are also likely to satisfy the conditions in Y.

Classification and Prediction

Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data (i.e., data objects whose class label is known). Classification can be used for predicting the class label of data objects. However, in many applications, users may wish to predict some missing or unavailable data values rather than class labels. This is usually the case when the predicted values are numerical data, and it is often specifically referred to as prediction. Although prediction may refer to both data value prediction and class label prediction, it is usually confined to data value prediction and thus is distinct from classification.

Cluster Analysis

Clustering analyzes data objects without consulting a known class label. In general, the class labels are not present in the training data simply because they are not known to begin with. Clustering can be used to generate such labels. The objects are clustered or grouped based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity. That is, clusters of objects are formed so that objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters.

Outlier Analysis

A database may contain data objects that do not comply with the general behaviour or model of the data. These data objects are outliers. Most data mining methods discard outliers as noise or exceptions.
However, in some applications such as fraud detection, the rare events can be more interesting than the more regularly occurring ones. The analysis of outlier data is referred to as outlier mining.
Evolution Analysis
Data evolution analysis describes and models regularities or trends for objects whose behaviour changes over time. Although this may include characterization, discrimination, association and correlation analysis, classification, prediction, or clustering of time-related data, distinct features of such an analysis include time-series data analysis, sequence or periodicity pattern matching, and similarity-based data analysis. Users may have no idea about what kinds of patterns in their data may be interesting, and hence may like to search for several different kinds of patterns. Thus it is important to have a data mining system that can mine multiple kinds of patterns to accommodate different user expectations or applications. Data mining systems should be able to discover patterns at various granularities and provide a certainty measure or trustworthiness for each discovered pattern.
CHAPTER 2
CLUSTERING

2.1 What is Clustering?

Clustering is the grouping of objects into different sets, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure. Clustering is a process of grouping objects with similar properties. Any clustering should exhibit two main properties: low inter-class similarity and high intra-class similarity. Clustering is a form of unsupervised learning: no predefined class labels exist for the data points. Cluster analysis is used in a number of applications such as data analysis, image processing and market analysis. Clustering helps in gaining an overall view of the distribution of patterns and the correlation among data objects.

Good Clustering: A good clustering method will produce high quality clusters with high intra-class similarity and low inter-class similarity. The quality of a clustering result depends on both the similarity measure used by the method and its implementation. The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.

2.2 Overview of Clustering

Clustering is a division of data into groups of similar objects. Clustering algorithms can be divided into the following categories:
a) Hierarchical clustering algorithms
b) Partition clustering algorithms
c) Spectral clustering algorithms
d) Grid based clustering algorithms
e) Density based clustering algorithms
2.2.1 Hierarchical and Non-Hierarchical Clustering

There are two main types of clustering techniques: those that create a hierarchy of clusters and those that do not. The hierarchical clustering techniques create a hierarchy of clusters from small to big. The main reason for this is that, as was already stated, clustering is an unsupervised learning technique, and as such, there is no absolutely correct answer. For this reason, and depending on the particular application of the clustering, fewer or greater numbers of clusters may be desired. With a hierarchy of clusters defined, it is possible to choose the number of clusters that are desired. At the extreme it is possible to have as many clusters as there are records in the database. In this case the records within each cluster are optimally similar to each other (since there is only one) and certainly different from the other clusters. But of course such a clustering technique misses the point, in the sense that the idea of clustering is to find useful patterns in the database that summarize it and make it easier to understand. Any clustering algorithm that ends up with as many clusters as there are records has not helped the user understand the data any better. Thus one of the main points about clustering is that there be many fewer clusters than there are original records. Exactly how many clusters should be formed is a matter of interpretation. The advantage of hierarchical clustering methods is that they allow the end user to choose from either many clusters or only a few.

The hierarchy of clusters is usually viewed as a tree where the smallest clusters merge together to create the next highest level of clusters, and those at that level merge together to create the next highest level of clusters. The figure below shows how several clusters might form a hierarchy. When a hierarchy of clusters like this is created, the user can determine what the right number of clusters is that adequately summarizes the data while still providing useful information (at the other extreme, a single cluster containing all the records does not contain enough specific information to be useful). This hierarchy of clusters is created through the algorithm that builds the clusters. There are two main types of hierarchical clustering algorithms:

Agglomerative: Agglomerative clustering techniques start with as many clusters as there are records, where each cluster contains just one record. The clusters that are nearest each other are merged together to form the next largest cluster. This merging is continued until a hierarchy of clusters is built with just a single cluster containing all the records at the top of the hierarchy.
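The agglomerative merging described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the records are 2D points, "nearest" is measured with the single link criterion (distance between the closest pair of members), and the names `single_link`, `agglomerative` and `target_clusters` are hypothetical helpers introduced here, not part of any library.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)


def single_link(a, b):
    """Distance between two clusters = distance of their closest pair of records."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, target_clusters):
    """Merge the two nearest clusters until only target_clusters remain."""
    clusters = [[p] for p in points]      # start: one record per cluster
    merges = []                           # the merge history forms the hierarchy
    while len(clusters) > target_clusters:
        # find the pair of clusters that are nearest each other
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


records = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerative(records, target_clusters=2)
```

Stopping at `target_clusters=1` would build the full hierarchy; stopping earlier corresponds to cutting the tree at a chosen level.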
Divisive: Divisive clustering techniques take the opposite approach from agglomerative techniques. These techniques start with all the records in one cluster and then try to split that cluster into smaller pieces, and then in turn try to split those smaller pieces. Of the two, the agglomerative techniques are the most commonly used for clustering and have more algorithms developed for them. We will talk about these in more detail in the next section.

The non-hierarchical techniques are in general faster to create from the historical database, but require that the user make some decision about the number of clusters desired or the minimum nearness required for two records to be within the same cluster. These non-hierarchical techniques are often run multiple times, starting off with some arbitrary or even random clustering and then iteratively improving the clustering by shuffling some records around. Alternatively, these techniques sometimes build the clusters in a single pass, adding records to existing clusters when they exist and creating new clusters when no existing cluster is a good candidate for the given record. Because the definition of which clusters are formed can depend on these initial choices of which starting clusters should be chosen, or even how many clusters, these techniques can be less repeatable than the hierarchical techniques and can sometimes create either too many or too few clusters, because the number of clusters is predetermined by the user and not determined solely by the patterns inherent in the database.

Clusters at the lowest level are merged together to form larger clusters at the next level of the hierarchy.
Non-Hierarchical Clustering
There are two main non-hierarchical clustering techniques. Both of them are very fast to compute on the database but have some drawbacks. The first are the single pass methods. They derive their name from the fact that the database must only be passed through once in order to create the clusters (i.e. each record is only read from the database once). The other class of techniques are called reallocation methods. They get their name from the movement or reallocation of records from one cluster to another in order to create better clusters. The reallocation techniques do use multiple passes through the database but are relatively fast in comparison to the hierarchical techniques. Some techniques allow the user to request the number of clusters that they would like to be pulled out of the data. Predefining the number of clusters rather than having them driven by the data might seem to be a bad idea, as there may be some very distinct and observable clustering of the data into a certain number of clusters of which the user might not be aware.
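To make the reallocation idea concrete, the following Python sketch moves records between a user-chosen number of clusters until no record changes cluster. It is a k-means-style procedure, which the text does not name; choosing cluster centres as means, the function name `reallocate` and the parameter `max_passes` are assumptions made for this illustration only.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def reallocate(points, centers, max_passes=10):
    """Reassign records to their nearest centre until nothing moves."""
    assignment = [None] * len(points)
    for _ in range(max_passes):                  # each pass rereads every record
        new_assignment = [
            min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:         # no record was reallocated
            break
        assignment = new_assignment
        for c in range(len(centers)):            # recompute each cluster centre
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                          # an empty cluster keeps its centre
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centers


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
# The number of clusters (two) and the starting centres are chosen by the user.
labels, centers = reallocate(points, centers=[(0.0, 0.0), (1.0, 1.0)])
```

With these deliberately poor starting centres the first pass puts every record in one cluster, and a later pass corrects it, which illustrates why the result can depend on the initial choices.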
Hierarchical Clustering

Hierarchical clustering has the advantage over non-hierarchical techniques in that the clusters are defined solely by the data (not by the users predetermining the number of clusters) and that the number of clusters can be increased or decreased by simply moving up and down the hierarchy. The hierarchy is created either by starting at the top (one cluster that includes all records) and subdividing (divisive clustering) or by starting at the bottom with as many clusters as there are records and merging (agglomerative clustering). Usually the merging and subdividing are done two clusters at a time.

A hierarchical clustering algorithm groups data objects to form a tree-shaped structure. It can be broadly classified into agglomerative hierarchical clustering and divisive hierarchical clustering. In the agglomerative approach, also called the bottom-up approach, each data point is initially considered to be a separate cluster, and on each iteration clusters are merged based on a criterion. The merging can be done using the single link, complete link, centroid or Ward's method. In the divisive approach, also called the top-down approach, all data points are initially considered to be a single cluster, which is then split into a number of clusters based on certain criteria.

Hierarchical Algorithms
- Agglomerative (AGNES)
- Divisive (DIANA)

2.2.2 Multiple Phase Hierarchical Algorithms

Multiple Phase Hierarchical Methods
1. BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies): Partitions objects using tree structures, then uses other algorithms to refine those clusters.
2. CURE (Clustering Using Representatives): Represents each cluster by a fixed number of representative objects, which are shrunk towards the centre of the cluster.
3. ROCK (Robust hierarchical Clustering using Links): Merges clusters based on their interconnectivity.
4. Chameleon: Uses dynamic modelling in hierarchical clustering.

2.3 Requirements of Clustering in Data Mining

- Scalability
- Ability to deal with different types of attributes
- Discovery of clusters with arbitrary shape (not just spherical clusters)
- Minimal requirements for domain knowledge to determine input parameters (such as the number of clusters)
- Ability to deal with noise and outliers
- Insensitivity to the order of input records
- Incremental clustering
- High dimensionality (especially very sparse and highly skewed data)
- Incorporation of user-specified constraints
- Interpretability and usability (close to semantics)

2.4 Application of clustering

- Pattern Recognition
- Spatial Data Analysis
  - create thematic maps in GIS by clustering feature spaces
  - detect spatial clusters and explain them in spatial data mining, e.g., land use, city planning, earthquake studies
- Image Processing
- Economic Science (especially market research)
- WWW
  - Document classification
  - Cluster Weblog data to discover groups of similar access patterns

2.5 CLUSTERING ALGORITHM

Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the clustering algorithm DBSCAN, which relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it.
A Density Based Notion of Clusters

When looking at the sample sets of points depicted in figure 1, we can easily and unambiguously detect clusters of points and noise points not belonging to any of those clusters.
The main reason why we recognize the clusters is that within each cluster we have a typical density of points which is considerably higher than outside of the cluster. Furthermore, the density within the areas of noise is lower than the density in any of the clusters. In the following, we formalize this intuitive notion of clusters and noise in a database D of points of some k-dimensional space S. Note that both our notion of clusters and our algorithm DBSCAN apply as well to 2D or 3D Euclidean space as to some high dimensional feature space. The key idea is that for each point of a cluster the neighborhood of a given radius has to contain at least a minimum number of points, i.e. the density in the neighborhood has to exceed some threshold. The shape of a neighborhood is determined by the choice of a distance function for two points p and q, denoted by dist(p,q). For instance, when using the Manhattan distance in 2D space, the shape of the neighborhood is rectangular. Note that our approach works with any distance function, so that an appropriate function can be chosen for some given application. For the purpose of proper visualization, all examples will be in 2D space using the Euclidean distance.
Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point p, denoted by $N_{Eps}(p)$, is defined by $N_{Eps}(p) = \{q \in D \mid dist(p,q) \le Eps\}$.

A naive approach could require for each point in a cluster that there are at least a minimum number (MinPts) of points in an Eps-neighborhood of that point. However, this approach fails because there are two kinds of points in a cluster: points inside of the cluster (core points) and points on the border of the cluster (border points). In general, an Eps-neighborhood of a border point contains significantly fewer points than an Eps-neighborhood of a core point. Therefore, we would have to set the minimum number of points to a relatively low value in order to include all points belonging to the same cluster. This value, however, will not be characteristic for the respective cluster, particularly in the presence of noise. Therefore, we require that for every point p in a cluster C there is a point q in C so that p is inside of the Eps-neighborhood of q and $N_{Eps}(q)$ contains at least MinPts points. This definition is elaborated in the following.

Definition 2: (directly density-reachable) A point p is directly density-reachable from a point q wrt. Eps, MinPts if
1) $p \in N_{Eps}(q)$ and
2) $|N_{Eps}(q)| \ge MinPts$ (core point condition).

Obviously, directly density-reachable is symmetric for pairs of core points. In general, however, it is not symmetric if one core point and one border point are involved. Figure 2 shows the asymmetric case.
Figure 2: core points and border points
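Definitions 1 and 2 translate almost directly into code. The Python sketch below is a minimal illustration, assuming points are coordinate tuples and the database D is a plain list; the function names and the `distance` parameter are hypothetical choices made here so that a different distance function (such as the Manhattan distance mentioned earlier) can be swapped in.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def manhattan(p, q):
    """Manhattan (city block) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def eps_neighborhood(D, p, eps, distance=dist):
    """N_Eps(p): all points of D within eps of p (p itself is included)."""
    return [q for q in D if distance(p, q) <= eps]


def is_core(D, q, eps, min_pts, distance=dist):
    """Core point condition: |N_Eps(q)| >= MinPts."""
    return len(eps_neighborhood(D, q, eps, distance)) >= min_pts


def directly_density_reachable(D, p, q, eps, min_pts, distance=dist):
    """True if p lies in N_Eps(q) and q satisfies the core point condition."""
    return (p in eps_neighborhood(D, q, eps, distance)
            and is_core(D, q, eps, min_pts, distance))


D = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
core, border = (1, 1), (2, 1)

# The asymmetric case: the border point is reachable from the core point,
# but not the other way round.
forward = directly_density_reachable(D, border, core, eps=1, min_pts=3)
backward = directly_density_reachable(D, core, border, eps=1, min_pts=3)
```

Here `forward` is true and `backward` is false, because the Eps-neighborhood of the border point holds only two points and so fails the core point condition.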

Definition 3: (density-reachable) A point p is density-reachable from a point q wrt. Eps and MinPts if there is a chain of points $p_1, \dots, p_n$, $p_1 = q$, $p_n = p$ such that $p_{i+1}$ is directly density-reachable from $p_i$.

Density-reachability is a canonical extension of direct density-reachability. This relation is transitive, but it is not symmetric. Figure 3 depicts the relations of some sample points and, in particular, the asymmetric case. Although not symmetric in general, it is obvious that density-reachability is symmetric for core points. Two border points of the same cluster C are possibly not density-reachable from each other because the core point condition might not hold for both of them. However, there must be a core point in C from which both border points of C are density-reachable. Therefore, we introduce the notion of density-connectivity which covers this relation of border points.
Definition 4: (density-connected) A point p is density-connected to a point q wrt. Eps and MinPts if there is a point o such that both p and q are density-reachable from o wrt. Eps and MinPts.

Density-connectivity is a symmetric relation. For density-reachable points, the relation of density-connectivity is also reflexive (cf. figure 3). Now, we are able to define our density-based notion of a cluster. Intuitively, a cluster is defined to be a set of density-connected points which is maximal wrt. density-reachability. Noise will be defined relative to a given set of clusters. Noise is simply the set of points in D not belonging to any of its clusters.

Definition 5: (cluster) Let D be a database of points. A cluster C wrt. Eps and MinPts is a non-empty subset of D satisfying the following conditions:
1) $\forall p, q$: if $p \in C$ and q is density-reachable from p wrt. Eps and MinPts, then $q \in C$. (Maximality)
2) $\forall p, q \in C$: p is density-connected to q wrt. Eps and MinPts. (Connectivity)

Definition 6: (noise) Let $C_1, \dots, C_k$ be the clusters of the database D wrt. parameters $Eps_i$ and $MinPts_i$, $i = 1, \dots, k$. Then we define the noise as the set of points in the database D not belonging to any cluster $C_i$, i.e. $noise = \{p \in D \mid \forall i: p \notin C_i\}$.

Note that a cluster C wrt. Eps and MinPts contains at least MinPts points because of the following reasons. Since C contains at least one point p, p must be density-connected to itself via some point o (which may be equal to p). Thus, at least o has to satisfy the core point condition and, consequently, the Eps-neighborhood of o contains at least MinPts points.
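The relations in Definitions 3 to 6 can be checked by brute force on a small example. The Python sketch below is illustrative only: it assumes Euclidean distance and a list of coordinate tuples, and the helper names `neighbors`, `reachable_from` and `density_connected` are hypothetical. It follows chains of directly density-reachable points, so a chain continues only through core points.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def neighbors(D, p, eps):
    """Eps-neighborhood of p in D (p itself is included)."""
    return [q for q in D if dist(p, q) <= eps]


def reachable_from(D, q, eps, min_pts):
    """All points density-reachable from q (empty if q is not a core point)."""
    reached, frontier = set(), [q]
    while frontier:
        o = frontier.pop()
        n = neighbors(D, o, eps)
        if len(n) < min_pts:      # o is not a core point: the chain stops here
            continue
        for p in n:
            if p not in reached:
                reached.add(p)
                frontier.append(p)
    return reached


def density_connected(D, p, q, eps, min_pts):
    """True if some o exists from which both p and q are density-reachable."""
    return any({p, q} <= reachable_from(D, o, eps, min_pts) for o in D)


D = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (9.0, 9.0)]
eps, min_pts = 1.0, 3

cluster = reachable_from(D, (1.0, 0.0), eps, min_pts)   # start from a core point
noise = [p for p in D if p not in cluster]              # only one cluster here
```

In this data set the two end points (0, 0) and (4, 0) are border points: neither is density-reachable from the other, yet they are density-connected through the core points between them, while (9, 9) is noise.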
4. DBSCAN: Density Based Spatial Clustering of Applications with Noise

In this section, we present the algorithm DBSCAN (Density Based Spatial Clustering of Applications with Noise), which is designed to discover the clusters and the noise in a spatial database according to definitions 5 and 6. Ideally, we would have to know the appropriate parameters Eps and MinPts of each cluster and at least one point from the respective cluster. Then, we could retrieve all points that are density-reachable from the given point using the correct parameters. But there is no easy way to get this information in advance for all clusters of the database. However, there is a simple and effective heuristic (presented in section 4.2) to determine the parameters Eps and MinPts of the thinnest, i.e. least dense, cluster in the database. Therefore, DBSCAN uses global values for Eps and MinPts, i.e. the same values for all clusters. The density parameters of the thinnest cluster are good candidates for these global parameter values, specifying the lowest density which is not considered to be noise.
4.1 The Algorithm

To find a cluster, DBSCAN starts with an arbitrary point p and retrieves all points density-reachable from p wrt. Eps and MinPts. If p is a core point, this procedure yields a cluster wrt. Eps and MinPts. If p is a border point, no points are density-reachable from p and DBSCAN visits the next point of the database.

Since we use global values for Eps and MinPts, DBSCAN may merge two clusters according to definition 5 into one cluster if two clusters of different density are close to each other. Let the distance between two sets of points S1 and S2 be defined as $\mathrm{dist}(S_1, S_2) = \min\{\mathrm{dist}(p, q) \mid p \in S_1,\ q \in S_2\}$. Then two sets of points having at least the density of the thinnest cluster will be separated from each other only if the distance between the two sets is larger than Eps. Consequently, a recursive call of DBSCAN may be necessary for the detected clusters with a higher value for MinPts. This is, however, no disadvantage because the recursive application of DBSCAN yields an elegant and very efficient basic algorithm.

DBSCAN (SetOfPoints, Eps, MinPts)
  // SetOfPoints is UNCLASSIFIED
  ClusterId := nextId(NOISE);
  FOR i FROM 1 TO SetOfPoints.size DO
    Point := SetOfPoints.get(i);
    IF Point.ClId = UNCLASSIFIED THEN
      IF ExpandCluster(SetOfPoints, Point, ClusterId, Eps, MinPts) THEN
        ClusterId := nextId(ClusterId)
      END IF
    END IF
  END FOR
END; // DBSCAN

SetOfPoints is either the whole database or a discovered cluster from a previous run. Eps and MinPts are the global density parameters determined either manually or according to the heuristics presented in section 4.2. The function SetOfPoints.get(i) returns the i-th element of SetOfPoints. The most important function used by DBSCAN is ExpandCluster, which is presented below:

ExpandCluster(SetOfPoints, Point, ClId, Eps, MinPts) : Boolean;
  seeds := SetOfPoints.regionQuery(Point, Eps);
  IF seeds.size < MinPts THEN // no core point
    SetOfPoints.changeClId(Point, NOISE);
    RETURN False;
  ELSE // all points in seeds are density-reachable from Point
    SetOfPoints.changeClIds(seeds, ClId);
    seeds.delete(Point);
    WHILE seeds <> Empty DO
      currentP := seeds.first();
      result := SetOfPoints.regionQuery(currentP, Eps);
      IF result.size >= MinPts THEN
        FOR i FROM 1 TO result.size DO
          resultP := result.get(i);
          IF resultP.ClId IN {UNCLASSIFIED, NOISE} THEN
            IF resultP.ClId = UNCLASSIFIED THEN
              seeds.append(resultP);
            END IF;
            SetOfPoints.changeClId(resultP, ClId);
          END IF; // UNCLASSIFIED or NOISE
        END FOR;
      END IF; // result.size >= MinPts
      seeds.delete(currentP);
    END WHILE; // seeds is Empty
    RETURN True;
  END IF
END; // ExpandCluster

A call of SetOfPoints.regionQuery(Point, Eps) returns the Eps-neighborhood of Point in SetOfPoints as a list of points. Region queries can be supported efficiently by spatial access methods.
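As an illustration, the DBSCAN and ExpandCluster procedures can be sketched in Python. This is a minimal sketch under several assumptions that are not part of the original text: points are tuples of numbers, the distance is Euclidean, regionQuery is a plain linear scan rather than a spatial index, and cluster ids are integers starting at 0 with NOISE = -1 and UNCLASSIFIED = -2.

```python
import math

UNCLASSIFIED = -2  # assumed encoding, not from the original pseudocode
NOISE = -1         # assumed encoding, not from the original pseudocode


def region_query(points, idx, eps):
    """Return the indices of all points within eps of points[idx] (linear scan)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def expand_cluster(points, labels, idx, cluster_id, eps, min_pts):
    """Grow one cluster from points[idx]; return False if it is not a core point."""
    seeds = region_query(points, idx, eps)
    if len(seeds) < min_pts:          # no core point
        labels[idx] = NOISE
        return False
    for j in seeds:                   # all seeds are density-reachable from idx
        labels[j] = cluster_id
    queue = [j for j in seeds if j != idx]
    while queue:
        current = queue.pop(0)
        result = region_query(points, current, eps)
        if len(result) >= min_pts:    # current is a core point as well
            for j in result:
                if labels[j] in (UNCLASSIFIED, NOISE):
                    if labels[j] == UNCLASSIFIED:
                        queue.append(j)   # still has to be expanded
                    labels[j] = cluster_id
    return True


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNCLASSIFIED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] == UNCLASSIFIED:
            if expand_cluster(points, labels, i, cluster_id, eps, min_pts):
                cluster_id += 1
    return labels
```

For example, calling dbscan on two well-separated 2x2 squares of points plus one far-away point, with eps = 1.5 and min_pts = 3, labels the squares as clusters 0 and 1 and the isolated point as NOISE.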
OPTICS
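Let p and o be objects from a database D, let N_ε(o) be the ε-neighborhood of o, and let MinPts be a natural number. Then, the reachability-distance of p with respect to o is defined as

\[
\text{reachability-distance}_{\varepsilon,\mathit{MinPts}}(p, o) =
\begin{cases}
\text{UNDEFINED}, & \text{if } |N_{\varepsilon}(o)| < \mathit{MinPts} \\
\max(\text{core-distance}(o),\ \text{distance}(o, p)), & \text{otherwise}
\end{cases}
\]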
Intuitively, the reachability-distance of an object p with respect to another object o is the smallest distance such that p is directly density-reachable from o if o is a core object. In this case, the reachability-distance cannot be smaller than the core-distance of o because for smaller distances no object is directly density-reachable from o. Otherwise, if o is not a core object, even at the generating distance ε, the reachability-distance of p with respect to o is UNDEFINED. The reachability-distance of an object p depends on the core object with respect to which it is calculated. Figure 5 illustrates the notions of core-distance and reachability-distance. Our algorithm OPTICS creates an ordering of a database, additionally storing the core-distance and a suitable reachability-distance for each object. We will see that this information is sufficient to extract from this order all density-based clusterings with respect to any distance ε′ which is smaller than the generating distance ε.
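The two notions can be made concrete with a small Python sketch. It rests on assumptions that are not stated in this section: the distance is Euclidean, an object counts as a member of its own neighborhood (as in the DBSCAN region query), the core-distance is the distance to the MinPts-th nearest object inside the neighborhood, and UNDEFINED is represented by infinity so that it compares greater than every defined distance.

```python
import math

UNDEFINED = math.inf  # assumption: UNDEFINED is larger than any real distance


def core_distance(points, o, eps, min_pts):
    """Distance from points[o] to its min_pts-th nearest object, if within eps."""
    # Distances to all objects within eps (the object itself contributes 0.0).
    within = sorted(d for d in (math.dist(points[o], q) for q in points) if d <= eps)
    if len(within) < min_pts:
        return UNDEFINED              # o is not a core object at distance eps
    return within[min_pts - 1]


def reachability_distance(points, p, o, eps, min_pts):
    """Reachability-distance of points[p] with respect to points[o]."""
    cd = core_distance(points, o, eps, min_pts)
    if cd == UNDEFINED:
        return UNDEFINED
    return max(cd, math.dist(points[o], points[p]))
```

With the points (0,0), (1,0), (2,0), (10,0), eps = 3 and min_pts = 3, the first point has core-distance 2.0, so the second point has reachability-distance 2.0 from it even though it is only 1.0 away. The last point has no neighbors within eps, so every reachability-distance with respect to it is UNDEFINED.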
Figure 5 illustrates the main loop of the algorithm OPTICS. At the beginning, we open a file OrderedFile for writing, and we close this file after ending the loop. Each object from a database SetOfObjects is simply handed over to a procedure ExpandClusterOrder if the object is not yet processed.

The reachability-distance for each object in the set neighbors is determined with respect to the center-object CenterObject. Objects which are not yet in the priority queue OrderSeeds are simply inserted with their reachability-distance. Objects which are already in the queue are moved further to the top of the queue if their new reachability-distance is smaller than their previous reachability-distance.

Due to its structural equivalence to the algorithm DBSCAN, the run-time of the algorithm OPTICS is nearly the same as the run-time for DBSCAN. We performed an extensive performance test using different data sets and different parameter settings. It turned out that the run-time of OPTICS was almost constantly 1.6 times the run-time of DBSCAN. This is not surprising since the run-time for OPTICS as well as for DBSCAN is heavily dominated by the run-time of the ε-neighborhood queries which must be performed for each object in the database, i.e. the run-time for both algorithms is O(n * run-time of an ε-neighborhood query). To retrieve the ε-neighborhood of an object o, a region query with the center o and the radius ε is used. Without any index support, to answer such a region query, a scan through the whole database has to be performed. In this case, the run-time of OPTICS would be O(n^2).
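The ordering procedure described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation used in the project. Its assumptions: the result is returned as lists instead of being written to OrderedFile, the distance is Euclidean, neighborhoods are found by a linear scan (the O(n^2) case without index support), UNDEFINED is represented by infinity, and the priority queue OrderSeeds is replaced by Python's heapq with lazy deletion, so an improved reachability-distance is pushed as a new entry and outdated entries are skipped when popped.

```python
import heapq
import math

UNDEFINED = math.inf  # assumption: UNDEFINED is larger than any real distance


def optics(points, eps, min_pts):
    """Return (order, reach, core): the cluster-ordering and per-object distances."""
    n = len(points)
    processed = [False] * n
    reach = [UNDEFINED] * n
    core = [UNDEFINED] * n
    order = []

    def neighbors(i):
        # Linear scan; a spatial index would replace this step.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_dist(i, nbrs):
        if len(nbrs) < min_pts:
            return UNDEFINED
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def update(seeds, center, nbrs):
        # Insert neighbors, or move them up if their reachability improved.
        for j in nbrs:
            if processed[j]:
                continue
            new_r = max(core[center], math.dist(points[center], points[j]))
            if new_r < reach[j]:
                reach[j] = new_r
                heapq.heappush(seeds, (new_r, j))

    def process(i):
        nbrs = neighbors(i)
        processed[i] = True
        core[i] = core_dist(i, nbrs)
        order.append(i)
        return nbrs

    for start in range(n):
        if processed[start]:
            continue
        nbrs = process(start)
        if core[start] == UNDEFINED:
            continue
        seeds = []
        update(seeds, start, nbrs)
        while seeds:
            _, j = heapq.heappop(seeds)
            if processed[j]:
                continue              # outdated queue entry
            nbrs_j = process(j)
            if core[j] != UNDEFINED:
                update(seeds, j, nbrs_j)
    return order, reach, core
```

The lazy-deletion heap was chosen only because the standard library offers no decrease-key operation; the visiting order is the same as with a queue whose entries are moved up in place, apart from tie-breaking between equal reachability-distances.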
If a tree-based spatial index can be used, the run-time is reduced to O(n log n) since region queries are supported efficiently by spatial access methods such as the R*-tree [BKSS 90] or the X-tree [BKK 96] for data from a vector space or the M-tree [CPZ 97] for data from a metric space. The height of such a tree-based index is O(log n) for a database of n objects in the worst case and, at least in low-dimensional spaces, a query with a small query region has to traverse only a limited number of paths. Furthermore, if we have direct access to the ε-neighborhood, e.g. if the objects are organized in a grid, the run-time is further reduced to O(n) because in a grid the complexity of a single neighborhood query is O(1).

Having generated the augmented cluster-ordering of a database with respect to ε and MinPts, we can extract any density-based clustering from this order with respect to MinPts and a clustering-distance ε′ ≤ ε by simply scanning the cluster-ordering and assigning cluster-memberships depending on the reachability-distance and the core-distance of the objects. Figure 8 depicts the algorithm ExtractDBSCANClustering which performs this task.

We first check whether the reachability-distance of the current object Object is larger than the clustering-distance ε′. In this case, the object is not density-reachable with respect to ε′ and MinPts from any of the objects which are located before the current object in the cluster-ordering. This is obvious, because if Object had been density-reachable with respect to ε′ and MinPts from a preceding object in the order, it would have been assigned a reachability-distance of at most ε′. Therefore, if the reachability-distance is larger than ε′, we look at the core-distance of Object and start a new cluster if Object is a core object with respect to ε′ and MinPts; otherwise, Object is assigned to NOISE (note that the reachability-distance of the first object in the cluster-ordering is always UNDEFINED and that we assume UNDEFINED to be greater than any defined distance). If the reachability-distance of the current object is at most ε′, we can simply assign this object to the current cluster because then it is density-reachable with respect to ε′ and MinPts from a preceding core object in the cluster-ordering.

The clustering created from a cluster-ordered data set by ExtractDBSCANClustering is nearly indistinguishable from a clustering created by DBSCAN. Only some border objects may be missed when extracted by the algorithm ExtractDBSCANClustering if they were processed by the algorithm OPTICS before a core object of the corresponding cluster had been found. However, the fraction of such border objects is so small that we can omit a postprocessing step (i.e. reassigning those objects to a cluster) without much loss of information. Extracting different density-based clusterings from the cluster-ordering of a data set is not the intended application of the OPTICS algorithm.
That an extraction is possible only demonstrates that the cluster-ordering of a data set actually contains the information about the intrinsic clustering structure of that data set (up to the generating distance ε). This information can be analyzed much more effectively by using other techniques which are presented in the next section.
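The scan performed by ExtractDBSCANClustering can be sketched in a few lines of Python. The sketch assumes that the cluster-ordering is given as a list of object indices, that the reachability-distances and core-distances are lists indexed by object, that UNDEFINED is represented by infinity, and that cluster ids are integers starting at 0 with NOISE = -1. These representations are illustrative choices, not taken from the original algorithm listing.

```python
import math

UNDEFINED = math.inf  # assumption: UNDEFINED is larger than any real distance
NOISE = -1            # assumed encoding of the noise label


def extract_dbscan_clustering(order, reach, core, eps_prime):
    """Scan the cluster-ordering once and return {object index: cluster id}."""
    labels = {}
    cluster_id = -1
    for obj in order:
        if reach[obj] > eps_prime:
            # Not density-reachable from any preceding object at eps_prime.
            if core[obj] <= eps_prime:
                cluster_id += 1          # obj is a core object: start a new cluster
                labels[obj] = cluster_id
            else:
                labels[obj] = NOISE
        else:
            labels[obj] = cluster_id     # reachable from a preceding core object
    return labels
```

Running the scan with several values of eps_prime on the same ordering yields the different clusterings without recomputing any neighborhood query, which is the point made in the text above.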
Identifying The Clustering Structure
The OPTICS algorithm generates the augmented cluster-ordering consisting of the ordering of the points, the reachability-values and the core-values. However, for the following interactive and automatic analysis techniques only the ordering and the reachability-values are needed. To simplify the notation, we specify them formally:

Definition (results of the OPTICS algorithm): Let DB be a database containing n points. The OPTICS algorithm generates an ordering of the points o: {1..n} → DB and corresponding reachability-values r: {1..n} → R≥0.

The visual techniques presented below fall into two main categories. First, there are methods to get a general overview of the data. These are useful for gaining a high-level understanding of the way the data is structured. It is important to see most or even all of the data at once, making pixel-oriented visualizations the method of choice. Second, once the general structure is understood, the user is interested in zooming into the most interesting-looking subsets. In the corresponding detailed view, single (small or large) clusters are analyzed and their relationships examined. Here it is important to show the maximum amount of information which can easily be understood. Thus, we present different techniques for these two different tasks. Because the detailed technique is a direct graphical representation of the cluster-ordering, we present it first and then continue with the high-level technique.

A totally different set of requirements is posed for the automatic techniques. They are used to generate the intrinsic cluster structure automatically for further (automatic) processing steps.
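To give an impression of what a direct graphical representation of the cluster-ordering can look like, the following Python sketch draws the reachability-values r(1), ..., r(n) as one text bar per position in the ordering. It is a hypothetical helper written for illustration only: the function name, the bar width and the convention of drawing UNDEFINED (infinite) values as full-width bars are all assumptions. Runs of short bars correspond to dense regions, and a long bar marks the start of a new one.

```python
def reachability_bars(r, width=40):
    """Return one text line per position in the ordering, bar length ~ r value."""
    finite = [v for v in r if v != float("inf")]
    top = max(finite) if finite else 1.0   # scale bars to the largest finite value
    lines = []
    for pos, value in enumerate(r, start=1):
        shown = min(value, top)            # infinite values become full-width bars
        bar = "#" * round(width * shown / top) if top > 0 else ""
        lines.append(f"{pos:>3} {bar}")
    return lines


if __name__ == "__main__":
    for line in reachability_bars([float("inf"), 1.0, 1.0, 4.0, 2.0], width=8):
        print(line)
```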
SNAPSHOTS
DATASET:
This is a real-time dataset. It is the input on which the source code was executed to produce the required output.

1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 1 0,0,0,0,0,0,0,24.8,0,6.5,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 2 0,0,0,0,0,0,0,1.8,0,24,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 3 0,0,0,0,0,0,0,2.2,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 4 0,0,0,0,0,0,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 5 0,0,0,0,0,0,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 6 0,0,0,0,0,0,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 7 2,0,0,0,0,0,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 8 0,0,0,0,0,0,6.2,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 9 0,7,0,0,0,0,4.6,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 10 0,0,0,0,0,0,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 11 0,0,0,0,0,0,0,8,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 12 0,0,0,0,0,16,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 13 0,0,0,0,0,0,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 14 0,0,0,0,0,0,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 15 0,0,0,0,0,0,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 16 0,0,0,0,0,0,5.8,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 17 0,5,0,0,0,0,11.7,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 18 0,0,0,0,0,0,10.1,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 19 0,0,0,0,0,0,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 20 0,0,0,0,0,0,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 21 0,0,0,0,11,32.6,12,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 22 0,0,0,0,0,0,5.6,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 23 0,0,0,0,0,13,1,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 24 0,0,0,0,0,0,0,0,20,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 25 0,0,0,0,0,0,0,0,3,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 26 0,0,0,0,0,0,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 27 0,0,0,0,0,0,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 28 0,0,0,0,0,0,0,0,0,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 29 0,0,0,0,0,0,0,0,14,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 30 0,0,0,0,0,0,10.3,0,2,0,0,0
1999 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 31 0,0,0,0,0,0,140,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 1 0,0,0,0,0,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 2 0,0,0,0,0,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 3 0,0,0,0,0,0,6.2,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 4 0,0,0,0,0,0,34.6,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 5 0,1.5,0,0,0,0,4.8,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 6 0,0,0,0,0,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 7 0,0,0,0,16,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 8 0,0,0,0,2,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 9 0,0,0,0,0,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 10 0,0,0,0,0,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 11 0,2.5,0,0,0,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 12 0,0,0,0,0,12,18,2,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 13 0,0,0,0,3,0,10,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 14 0,0,0,0,6,0,16.6,6.2,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 15 0,0,0,0,0,0,8,1.8,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 16 0,0,0,0,0,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 17 0,0,0,0,0,0,67.2,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 18 0,0,0,0,0,0,1.2,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 19 0,0,0,0,0,0,0,6,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 20 0,0,0,0,0,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 21 0,0,0,0,0,0,47,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 22 0,0,0,0,0,0,21,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 23 0,0,0,2,0,0,10,6.5,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 24 0,0,0,0,0,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 25 0,0,0,0,0,0,0,0,3,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 26 0,0,0,0,0,4.4,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 27 0,0,0,0,0,0,0,0,8.5,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 28 0,0,0,0,0,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 29 0,0,0,0,0,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 30 0,0,0,0,0,0,0,0,0,0,0,0
2000 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 31 0,0,0,0,0,0,0,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 1 0,0,0,0,0,0,0,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 2 0,0,0,0,0,0,49.4,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 3 0,0,0,0,1.5,0,25.8,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 4 0,0,0,0,0,0,4,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 5 0,0,0,0,0,0,1.8,14.7,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 6 0,0,0,0,0,0,0.6,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 7 0,0,0,0,0,0,0,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 8 0,0,0,0,0,0,0,0.2,3.5,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 9 0,0,0,0,0,1,13,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 10 0,0,0,0,0,9,0,3,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 11 0,0,0,0.5,0,2.4,23,21,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 12 0,0,0,0,0,0,10.2,30.4,0,9,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 13 0,0,0,1.6,0,2,1.6,0,0,6.2,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 14 0,0,0,0,0,33.6,0.4,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 15 0,0,0,1.4,0,6.1,1.2,0,4,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 16 0,0,0,0,0,0.4,0,1.2,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 17 0,0,0,0,3,11.5,1,2.2,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 18 0,0,0,0,10,0,0,0.8,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 19 0,0,0,0,17,0,1.4,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 20 0,0,0,0,0,0,0,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 21 0,0,0,0,0,0,0,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 22 0,0,0,0,0,0,1.4,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 23 0,0,0,0,0,0,12.2,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 24 0,0,0,0,0,0,12.7,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 25 0,0,0,0,0,13.2,8.2,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 26 0,0,0,0,0,18.2,0,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 27 0,0,0,0,0,4.2,0,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 28 0,0,0,0,0,1,0,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 29 0,0,0,0,0,10.6,0,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 30 0,0,0,0,0,0,0,0,0,0,0,0
2001 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 31 0,0,0,0,0,0,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 1 0,0,0,0,0,0,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 2 0,0,0,0,0,0,0,0,2.1,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 3 0,0,0,0,0,0,0,1,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 4 0,0,0,0,0,0,0,0,28.4,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 5 0,0,0,0,0,0,0,12.5,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 6 0,0,0,0,0,0,0,0,0.8,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 7 0,0,0,0,0,0,0,8,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 8 0,0,0,0,0,0,0,1,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 9 0,0,0,3.2,0,0,0,5.4,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 10 0,0,0,0,0,0,0,0.8,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 11 0,17.5,0,0,0,0,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 12 0,2,0,0,0,0,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 13 0,0,0,0,0,0,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 14 0,0,0,0,0,0,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 15 0,0,0,0,0,0,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 16 0,0,0,0,0,0,0,9.6,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 17 0,0,0,0,0,0,0,4.4,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 18 0,0,0,0,0,14.8,3.8,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 19 0,0,0,0,0,13.5,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 20 0,0,0,0,0,0,23,0,5.6,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 21 0,0,0,0,0,20,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 22 0,0,0,0,0,0,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 23 0,0,0,0,0,0,0,0.8,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 24 0,0,0,0,0,0,0,7,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 25 0,0,0,0,0,0,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 26 0,0,0,0,0,0,0,5.8,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 27 0,0,0,0,0,2.2,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 28 0,0,0,0,0,0,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 29 0,0,0,0,0,3,0,0,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 30 0,0,0,0,6.2,29,0,2.3,0,0,0,0
2002 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 31 0,0,0,0,5.6,0,0,10.8,0,0,0,10.4
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 1 0,2.6,0,0,0,0,0,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 2 0,0,0,0,0,0,0,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 3 0,0,0,0,0,0,0,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 4 0,0,0,0,0,0,0,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 5 0,0,0,0,0,0,22.2,56.6,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 6 0,0,0,0,0,0,8.6,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 7 0,0,0,0,0,0,0,0.8,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 8 0,0,0,0,0,0,5.4,37.2,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 9 0,0,0,0,0,0,0,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 10 0,0,0,0,0,0,3.4,4,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 11 0,0,0,0,0,0,15.8,65.8,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 12 0,0,0,0,0,0,5.4,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 13 0,0,0,0,0,0,0,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 14 0,0,0,0,0,0,0,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 15 0,0,0,0,0,0,1,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 16 0,0,0,0,0,0,15.4,0,6.2,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 17 0,0,0,0,0,0,5.8,6,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 18 0,32.2,0,0,0,0,18,0,2,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 19 0,9.8,0,0,0,0,0,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 20 0,0,0,0,0,0,0,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 21 0,0,0,0,0,0,30,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 22 0,0,0,0,0,0,1.4,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 23 0,0,0,0,0,2.4,1,0,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 24 0,0,0,0,0,0,0,0.8,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 25 0,0,0,0,0,0,3,6.2,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 26 0,0,0,0,0,0,0,31,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 27 0,0,0,0,0,0,1,8,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 28 0,0,0,0,0,0,19.4,3.6,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 29 0,0,0,0,0,0,21.4,3.4,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 30 0,0,0,0,0,0,1,2,0,0,0,0
2003 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 31 0,0,0,0,0,0,1,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 1 0,0,0,0,1.2,0,0,22,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 2 0,0,0,0,0,0,0,0,0,12.8,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 3 0,0,0,0,0,0,0,0.6,0,2,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 4 0,0,0,0,0,0,28,13.5,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 5 0,0,0,0,0,0,3,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 6 0,0,0,0,0,0,17,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 7 0,0,0,0,0,0,0,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 8 0,0,0,0,0,0,0,16,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 9 0,0,0,0,0,0,0,18,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 10 0,0,0,0,0,0,0,5.4,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 11 0,0,0,0,0,0,0,13.1,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 12 0,0,0,0,0,0,0,3.5,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 13 0,0,0,0,27.2,0,0,5,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 14 0,0,0,0,0,12.4,0,11,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 15 0,0,0,0,0,0,0,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 16 0,0,0,0,0,0,0,44.5,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 17 0,0,0,0,0,0,2,9,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 18 0,0,0,0,0,0,6,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 19 0,0,0,0,0,0,0,10,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 20 0,0,0,0,0,0,0,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 21 0,0,0,0,0,0,0,0,2,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 22 0,0,0,0,0,0,0,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 23 0,0,0,0,0,0,0,31,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 24 0,0,0,0,0,0,0,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 25 0,0,0,0,0,0,0,5.2,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 26 0,0,0,0,0,0,7,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 27 0,0,0,0,0,0,0,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 28 0,0,0,0,0,0,0,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 29 0,0,0,0,0,0,3,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 30 0,0,0,0,0,0,32,0,0,0,0,0
2004 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 31 0,0,0,0,0,0,14,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 1 0,0,0,0,0,0,1.8,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 2 0,0,0,0,0,0,0,16,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 3 0,0,0,0,0,0,2.4,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 4 0,0,0,0,0,0,40,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 5 0,0,0,0,0,0,2.6,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 6 0,0,0,0,0,0,25,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 7 0,0,0,0,0,0,0,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 8 0,0,0,0,0,0,0,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 9 0,0,0,0,0,13.6,0,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 10 0,0,3.2,0,0,0,0,0,18.4,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 11 0,0,5,0,0,0,0,0,18.6,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 12 0,0,0,0,0,0,0,0,4,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 13 0,0,0,0,0,0,0,0,45,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 14 0,0,0,0,0,8,0,1,13,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 15 0,0,0,0,0,0,0,0,5.8,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 16 0,0,0,0,0,0,19,0,0.4,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 17 0,0,0,0,0,0,0,18.8,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 18 0,0,0,0,0,0,46.6,22,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 19 0,0,0,0,0,0,0,14,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 20 0,0,0,0,0,0,0,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 21 0,0,0,0,0,0,0,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 22 0,0,0,0,0,0,0,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 23 0,0,0,0,0,0,0,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 24 0,0,0,0,0,9,0,0,55.8,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 25 0,0,0,0,0,0,0,0,32.2,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 26 0,0,0,0,0,3.4,0,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 27 0,0,0,0,0,6,0,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 28 0,0,0,4,0,1,3,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 29 0,0,0,16,0,1.2,10,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 30 0,0,0,0,0,9.2,6,0,0,0,0,0
2005 AJMERA,AJMER,RAJASTHAN,108,26.27,N,74.37,E 31 0,0,0,0,0,0,2.4,0,0,0,0,0
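The records listed above all follow the same four-field pattern: a year, a comma-separated station description, a day number and twelve comma-separated numeric values. A hedged Python sketch of a reader for this layout is given below. The Record class and the field names are hypothetical and introduced only for illustration; in particular, the document does not state what the twelve values or the individual station fields mean, so the code keeps them uninterpreted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    year: int
    station: List[str]   # comma-separated station fields, kept as text
    day: int
    values: List[float]  # the twelve trailing numbers of the record


def parse_records(text: str) -> List[Record]:
    """Parse whitespace-separated records of four fields each."""
    tokens = text.split()
    if len(tokens) % 4 != 0:
        raise ValueError("expected groups of four whitespace-separated fields")
    records = []
    for k in range(0, len(tokens), 4):
        year, station, day, values = tokens[k:k + 4]
        records.append(Record(int(year), station.split(","), int(day),
                              [float(v) for v in values.split(",")]))
    return records
```

Because the parser splits on any whitespace, it accepts the records whether they are stored one per line or several per line. The resulting value lists can then be passed as points to a clustering routine.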
Conclusions
In this paper, we proposed a cluster analysis method based on the OPTICS algorithm. OPTICS computes an augmented cluster-ordering of the database objects. The main advantage of our approach, when compared to the clustering algorithms proposed in the literature, is that we do not limit ourselves to one global parameter setting. Instead, the augmented cluster-ordering contains information which is equivalent to the density-based clusterings corresponding to a broad range of parameter settings and thus is a versatile basis for both automatic and interactive cluster analysis. We demonstrated how to use it as a stand-alone tool to get insight into the distribution of a data set. Depending on the size of the database, we either represent the cluster-ordering graphically (for small data sets) or use an appropriate visualization technique (for large data sets). Both techniques are suitable for interactively exploring the clustering structure, offering additional insights into the distribution and correlation of the data. We also presented an efficient and effective algorithm to automatically extract not only traditional clustering information but also the intrinsic, hierarchical clustering structure. There are several opportunities for future research. For very high-dimensional spaces, no index structures exist to efficiently support the hypersphere range queries needed by the OPTICS algorithm. Therefore it is infeasible to apply it in its current form to a database containing several million high-dimensional objects. Consequently, the most interesting question is whether we can modify OPTICS so that we can trade off a limited amount of accuracy for a large gain in efficiency. Incrementally managing a cluster-ordering when updates on the database occur is another interesting challenge.
Although there are techniques to update a flat density-based decomposition [EKS+ 98] incrementally, it is not obvious how to extend these ideas to a density-based cluster-ordering of a data set.
References
[AGG+ 98] Agrawal R., Gehrke J., Gunopulos D., Raghavan P.: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications, Proc. ACM SIGMOD'98 Int. Conf. on Management of Data, Seattle, WA, 1998, pp. 94-105.
[AKK 96] Ankerst M., Keim D. A., Kriegel H.-P.: 'Circle Segments': A Technique for Visually Exploring Large Multidimensional Data Sets, Proc. Visualization'96, Hot Topic Session, San Francisco, CA, 1996.
[BKK 96] Berchtold S., Keim D., Kriegel H.-P.: The X-Tree: An Index Structure for High-Dimensional Data, 22nd Conf. on Very Large Data Bases, Bombay, India, 1996, pp. 28-39.
[BKSS 90] Beckmann N., Kriegel H.-P., Schneider R., Seeger B.: The R*-tree: An Efficient and Robust Access Method for Points and Rectangles, Proc. ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, ACM Press, New York, 1990, pp. 322-331.
[CPZ 97] Ciaccia P., Patella M., Zezula P.: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces, Proc. 23rd Int. Conf. on Very Large Data Bases, Athens, Greece, 1997, pp. 426-435.
[EKSX 96] Ester M., Kriegel H.-P., Sander J., Xu X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, pp. 226-231.
[EKS+ 98] Ester M., Kriegel H.-P., Sander J., Wimmer M., Xu X.: Incremental Clustering for Mining in a Data Warehousing Environment, Proc. 24th Int. Conf. on Very Large Data Bases, New York, NY, 1998, pp. 323-333.
[EKX 95] Ester M., Kriegel H.-P., Xu X.: Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification, Proc. 4th Int. Symp. on Large Spatial Databases, Portland, ME, 1995, in: Lecture Notes in Computer Science, Vol. 951, Springer, 1995, pp. 67-82.
[GM 85] Grossmann A., Morlet J.: Decomposition of functions into wavelets of constant shapes and related transforms, Mathematics and Physics: Lectures on Recent Results, World Scientific, Singapore, 1985.
[GRS 98] Guha S., Rastogi R., Shim K.: CURE: An Efficient Clustering Algorithm for Large Databases, Proc. ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, 1998, pp. 73-84.
[HK 98] Hinneburg A., Keim D.: An Efficient Approach to Clustering in Large Multimedia Databases with Noise, Proc. 4th Int. Conf. on Knowledge Discovery & Data Mining, New York City, NY, 1998.
[HT 93] Hattori K., Torii Y.: Effective algorithms for the nearest neighbor method in the clustering problem, Pattern Recognition, 1993, Vol. 26, No. 5, pp. 741-746.
[Hua 97] Huang Z.: A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining, Proc. SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Tech. Report 97-07, UBC, Dept. of CS, 1997.
[JD 88] Jain A. K., Dubes R. C.: Algorithms for Clustering Data, Prentice-Hall, Inc., 1988.
[Kei 96a] Keim D. A.: Pixel-oriented Database Visualizations, in: SIGMOD RECORD, Special Issue on Information Visualization, December 1996.
[Kei 96b] Keim D. A.: Databases and Visualization, Proc. Tutorial ACM SIGMOD Int. Conf. on Management of Data, Montreal, Canada, 1996, p. 543.
[KN 96] Knorr E. M., Ng R. T.: Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining, IEEE Trans. on Knowledge and Data Engineering, Vol. 8, No. 6, December 1996, pp. 884-897.
[KR 90] Kaufman L., Rousseeuw P. J.: Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley & Sons, 1990.
[Mac 67] MacQueen J.: Some Methods for Classification and Analysis of Multivariate Observations, 5th Berkeley Symp. Math. Statist. Prob., Vol. 1, pp. 281-297.
[NH 94] Ng R. T., Han J.: Efficient and Effective Clustering Methods for Spatial Data Mining, Proc. 20th Int. Conf. on Very Large Data Bases, Santiago, Chile, Morgan Kaufmann Publishers, San Francisco, CA, 1994, pp. 144-155.
[PTVF 92] Press W. H., Teukolsky S. A., Vetterling W. T., Flannery B. P.: Numerical Recipes in C, 2nd ed., Cambridge University Press, 1992.
[Ric 83] Richards A. J.: Remote Sensing Digital Image Analysis. An Introduction, Springer Verlag, Berlin, 1983.
[Sch 96] Schikuta E.: Grid clustering: An efficient hierarchical clustering method for very large data sets, Proc. 13th Int. Conf. on Pattern Recognition, Vol. 2, 1996, pp. 101-105.
[SE 97] Schikuta E., Erhart M.: The bang-clustering system: Grid-based data analysis, Proc. Sec. Int. Symp. IDA-97, Vol. 1280 LNCS, London, UK, Springer-Verlag, 1997.
[SCZ 98] Sheikholeslami G., Chatterjee S., Zhang A.: WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases, Proc. 24th Int. Conf. on Very Large Data Bases, New York, NY, 1998, pp. 428-439.
[Sib 73] Sibson R.: SLINK: an optimally efficient algorithm for the single-link cluster method, The Comp. Journal, Vol. 16, No. 1, 1973, pp. 30-34.
[ZRL 96] Zhang T., Ramakrishnan R., Livny M.: BIRCH: An Efficient Data Clustering Method for Very Large Databases, Proc. ACM SIGMOD Int. Conf. on Management of Data, ACM Press, New York, 1996, pp. 103-114.
