13 views

Uploaded by minichel

- Cluster
- SURVEY Kmeans
- STSCI4060-HW5
- Clustering Dichotomous Data for Health Care
- Clustering of Time Series Data-A Survey
- 0013_12_FUZZY_RENDSZEREK_SZAMITASI_INTELLIGENCIA_B.pdf
- Integrating Clustering with Different Data Mining Techniques in the Diagnosis of Heart Disease
- 171205215.pdf
- Cell Inter Spss
- MITProfessionalX DS Course Syllabus for Data Science v2
- ICAET-T1-14-172
- DYNAMIC K-MEANS ALGORITHM FOR OPTIMIZED ROUTING IN MOBILE AD HOC NETWORKS
- Abdominal
- IEEE Internet
- IEEE_IP08
- Exam With Model Answers
- Drop Points Placement Algorithms for Layered Network Planning
- A Survey on localization in Wireless Sensor Network by Angle of Arrival
- 6 SasMiner Clustering
- plantdisease.pdf

You are on page 1of 3

k-means is one of the simplest unsupervised learning algorithms that solve the well known clustering

problem. The procedure follows a simple and easy way to classify a given data set through a certain number

of clusters (assume k clusters) fixed apriori. The main idea is to define k centers, one for each cluster. These

centers should be placed in a cunning way because of different location causes different result. So, the

better choice is to place them as much as possible far away from each other. The next step is to take each

point belonging to a given data set and associate it to the nearest center. When no point is pending, the first

step is completed and an early group age is done. At this point we need to re-calculate k new centroids as

barycenter of the clusters resulting from the previous step. After we have these k new centroids, a new binding

has to be done between the same data set points and the nearest new center. A loop has been generated. As a

result of this loop we may notice that the k centers change their location step by step until no more changes

are done or in other words centers do not move any more. Finally, this algorithm aims at minimizing an

objective function know as squared error function given by:

where,

||xi - vj|| is the Euclidean distance between xi and vj.

ci is the number of data points in ith cluster.

c is the number of cluster centers.

Let X = {x1,x2,x3,..,xn} be the set of data points and V = {v1,v2,.,vc} be the set of centers.

1) Randomly select c cluster centers.

2) Calculate the distance between each data point and cluster centers.

3) Assign the data point to the cluster center whose distance from the cluster center is minimum of all the cluste

centers..

4) Recalculate the new cluster center using:

5) Recalculate the distance between each data point and new obtained cluster centers.

6) If no data point was reassigned then stop, otherwise repeat from step 3).

Advantages

1) Fast, robust and easier to understand.

2) Relatively efficient: O(tknd), where n is # objects, k is # clusters, d is # dimension of each object, and t is #

iterations. Normally, k, t, d << n.

3) Gives best result when data set are distinct or well separated from each other.

Note: For more detailed figure for k-means algorithm please refer to k-means figure sub page.

Disadvantages

1) The learning algorithm requires apriori specification of the number of cluster centers.

2) The use of Exclusive Assignment - If there are two highly overlapping data then k-means will not be able to

resolve

that there are two clusters.

3) The learning algorithm is not invariant to non-linear transformations i.e. with different representation of data

we get

different results (data represented in form of cartesian co-ordinates and polar co-ordinates will give different

results).

4) Euclidean distance measures can unequally weight underlying factors.

5) The learning algorithm provides the local optima of the squared error function.

6) Randomly choosing of the cluster center cannot lead us to the fruitful result. Pl. refer Fig.

7) Applicable only when mean is defined i.e. fails for categorical data.

8) Unable to handle noisy data and outliers.

9) Algorithm fails for non-linear data set.

Fig II: Showing the non-linear data set where k-means algorithm fails

- ClusterUploaded byAmol Rattan
- SURVEY KmeansUploaded bySabitha Durai
- STSCI4060-HW5Uploaded byLei Huang
- Clustering Dichotomous Data for Health CareUploaded byMandy Diaz
- Clustering of Time Series Data-A SurveyUploaded byperl.freak
- 0013_12_FUZZY_RENDSZEREK_SZAMITASI_INTELLIGENCIA_B.pdfUploaded bynoorahmed1991
- Integrating Clustering with Different Data Mining Techniques in the Diagnosis of Heart DiseaseUploaded byJournal of Computer Science and Engineering
- 171205215.pdfUploaded byhuevonomar05
- Cell Inter SpssUploaded byNeha Thakur
- MITProfessionalX DS Course Syllabus for Data Science v2Uploaded bydanobikwelu_scribd
- ICAET-T1-14-172Uploaded byKarthik R
- DYNAMIC K-MEANS ALGORITHM FOR OPTIMIZED ROUTING IN MOBILE AD HOC NETWORKSUploaded byijcses
- AbdominalUploaded byadrianoedf
- IEEE InternetUploaded bytechnetvn
- IEEE_IP08Uploaded bysumbal_iqbal
- Exam With Model AnswersUploaded byjjia048
- Drop Points Placement Algorithms for Layered Network PlanningUploaded bySEP-Publisher
- A Survey on localization in Wireless Sensor Network by Angle of ArrivalUploaded byIJIRST
- 6 SasMiner ClusteringUploaded byFredy Vivanco
- plantdisease.pdfUploaded byJoyce Ann Martes
- Using Topographic and Soils Data to Understand and Predict FieldUploaded bymth2610
- IRJET-Analysis of Clustering Algorithms for Data Management in Sensor-CloudUploaded byIRJET Journal
- Model-based Clustering and Typologies in the.pdfUploaded byRafaela Sturza
- 04227366Uploaded byMekaTron
- Consumer BehaviourUploaded bySaurabh Wadhwa
- 04426662Uploaded by_NUKE_
- It 6702 Data Warehousing and Data Mining Question BankUploaded bysudhakarm13
- Pirooznia_2008_PMID18366602Uploaded bySalvador Martinez
- A Generic Model for Student Data Analytic Web Service (SDAWS)Uploaded byATS
- grading for fruitsUploaded byHKCHETHAN

- costQuality.pdfUploaded byminichel
- Face Recognition 2nd presentationUploaded byminichel
- Data Mining Practice Final SolUploaded byminichel
- Domain Area PUploaded byminichel
- Chapter11 -2Uploaded byminichel
- Facebook-Distributed-System-Case-Study-For-Distributed-System-Inside-Facebook-Datacenters.pdfUploaded byminichel
- W1L2-Complexity.pdfUploaded byminichel
- Example CodeUploaded byminichel
- IoT Assignment.docxUploaded byminichel
- Final exam review.docUploaded byminichel
- LicenseUploaded byminichel
- tabu_TSPUploaded byBedadipta Bain
- MPLUploaded byminichel
- Chapter-5Uploaded byminichel
- NLP steemerUploaded byminichel
- Cloud ComputingUploaded byminichel
- asstqaUploaded byminichel
- Pervasive ComputingUploaded bysavidasan
- Pervasive ComputingUploaded bysavidasan
- STQA Assigment 2Uploaded byminichel
- Expert System to Suggest a Natural Drink to-1Uploaded byminichel
- History IsUploaded byminichel
- IoT AssignmentUploaded byminichel
- STQA2 AssignmentUploaded byminichel
- wubCPPUploaded byminichel
- 10 Facts on the Relationship Between a Language and Culture for an English ProjectUploaded byminichel
- Cybersafety Basics(1)Uploaded byminichel
- horowitz-and-sahani-fundamentals-of-computer-algorithms-2nd-edition.pdfUploaded byminichel
- LcmUploaded byminichel

- Importance–Performance Analysis by Fuzzy C-Means AlgorithmUploaded byAhmad Rifa'i
- 06654133Uploaded bytanny
- ImageProcessing-IDRISI-Feb2008Uploaded bynirmal kumar
- machine learningUploaded bySiddharth Jawahar
- SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG DATA ANALYTICSUploaded byijdps
- chap15Uploaded byAmjad Hussain Zahid
- Text_Mining_with_RapidMiner.pdfUploaded byernestoarmijo
- We Ka TutorialUploaded byFasih Sendu
- Optimized Unsupervised Image Classification Based on Neutrosophic Set TheoryUploaded byAnonymous 0U9j6BLllB
- CurrentMethodsInImageSegmentation_Phan2000Uploaded byBahareh Bozorg Chami
- Color SegmantationUploaded bypidaceo
- A Comparison of Shadow Detection Removal and Reconstruction MethodsUploaded byIJSTE
- 07055838Uploaded byKarthick Vijayan
- Algorithms and Models for Network Data and Link Analysis.pdfUploaded bypiyeafj
- Paper_24-An_Approach_of_Improving_Student’s_Academic_Performance_by_using_K-means_clustering_algorithm_and_Decision_treeUploaded byEditor IJACSA
- Text Mining Applications and TheoryUploaded byXiangcao Liu
- ieeeUploaded byRajesh Raj
- forrester stop billions in fraud loss with machine learningUploaded byapi-276414739
- Distributed K-Means Algorithm and Fuzzy C-MeansAlgorithm for Sensor Networks Based OnMultiagent Consensus TheoryUploaded byTechnos_Inc
- Breast Cancer Detection Using ThermographyUploaded byIRJET Journal
- App Store Cluster AnalysisUploaded byAfnan Al-Subaihin
- IJCNCUploaded byAIRCC - IJCNC
- K-meansUploaded byKurushNishanth
- Determination of Typical Load Profile of Consumers Using Fuzzy C-Means Clustering AlgorithmUploaded bykalokos
- [Deep Learning Paper] The Unreasonable Effectiveness of Deep Features as a Perceptual MetricUploaded byAltun Hasanli
- A Review on Brain Tumor Detection using BFCFCM AlgorithmUploaded byIRJET Journal
- Introduction to ML TheoryUploaded byTanat Tonguthaisri
- Data Clustering Theory, Algorithms, and Applications 2007.pdfUploaded byKostas Konstantinidis
- Microbacterium TuberculosisUploaded byprabuddhatripathi
- An Integrated Genetic-Based Model of Naive Bayes Networks for Credit ScoringUploaded byAdam Hansen