Approximation to the K-Means Clustering Algorithm using PCA

Sathyendranath Malli; Nagesh H. R.; B. Dinesh Rao

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 175 - Issue 11

Published: Aug 2020

DOI: 10.5120/ijca2020920605

Sathyendranath Malli, Nagesh H. R., B. Dinesh Rao. Approximation to the K-Means Clustering Algorithm using PCA. International Journal of Computer Applications. 175, 11 (Aug 2020), 43-46. DOI=10.5120/ijca2020920605

@article{10.5120/ijca2020920605,
  author    = {Sathyendranath Malli and Nagesh H. R. and B. Dinesh Rao},
  title     = {Approximation to the K-Means Clustering Algorithm using PCA},
  journal   = {International Journal of Computer Applications},
  year      = {2020},
  volume    = {175},
  number    = {11},
  pages     = {43--46},
  doi       = {10.5120/ijca2020920605},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2020
%A Sathyendranath Malli
%A Nagesh H. R.
%A B. Dinesh Rao
%T Approximation to the K-Means Clustering Algorithm using PCA
%J International Journal of Computer Applications
%V 175
%N 11
%P 43-46
%R 10.5120/ijca2020920605
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Healthcare is an emerging domain that produces data at an exponentially growing rate. These massive data sets contain a wide variety of fields, which makes the information difficult to analyze. Clustering is a popular method for analyzing such data: the data are split into smaller clusters whose members have similar properties, and the clusters are then analyzed. The K-means algorithm [1] is a well-known clustering technique. This paper introduces an efficient approximation to the K-means problem, targeted at large data sets, that reduces the number of features to one through Principal Component Analysis (PCA). The reduced data are then clustered in one dimension using the K-means algorithm. The intra-cluster RMS error of the modified algorithm is compared with that of the K-means algorithm in m dimensions and is found to be reasonable. The time taken by the modified algorithm is significantly less than that taken by the K-means algorithm.
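
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the SVD-based PCA, the random choice of initial centers, the synthetic data, and the exact RMS definition (root mean squared distance of each point to its cluster mean, measured in the original m-dimensional space) are all assumptions made for illustration.

```python
import numpy as np


def first_pc_scores(X):
    """Project centred data onto its first principal component (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]


def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations on an (n, d) array; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centers are k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers


def pca_kmeans(X, k, seed=0):
    """Reduce X to one feature with PCA, then cluster that feature in 1-D."""
    z = first_pc_scores(X)
    labels, _ = lloyd_kmeans(z[:, None], k, seed=seed)
    return labels


def intra_cluster_rms(X, labels):
    """RMS distance of points to their cluster mean, measured in X's space."""
    sq = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        sq += ((members - members.mean(axis=0)) ** 2).sum()
    return float(np.sqrt(sq / len(X)))


# Hypothetical synthetic data: three Gaussian blobs in m = 5 dimensions.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(200, 5))
               for c in (0.0, 6.0, 12.0)])

labels_1d = pca_kmeans(X, k=3)        # modified algorithm (PCA, then 1-D)
labels_md, _ = lloyd_kmeans(X, k=3)   # K-means in all m dimensions

print("RMS, PCA + 1-D K-means:", intra_cluster_rms(X, labels_1d))
print("RMS, m-D K-means:      ", intra_cluster_rms(X, labels_md))
```

Both label sets are scored in the original feature space so that the two errors are comparable. The expected saving comes from the distance computations in the Lloyd loop, which operate on a single feature in the modified variant instead of m features.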

References

S. P. Lloyd, Least squares quantization in PCM, IEEE Trans. Inf. Theory 28(2) (1982) 129–136
D. Arthur, S. Vassilvitskii, k-means++: the advantages of careful seeding, in ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007, pp. 1027–1035
H. Hotelling, Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology 24 (1933) 417–441 and 498–520
Marco Capó; An efficient approximation to the K-means clustering for massive data, Knowledge-Based Systems 117 (2017) 56–69
Grigorios Tzortzis; The MinMax k-Means clustering algorithm, Pattern Recognition 47 (2014) 2505–2516
Jing Wang; Fast Approximate k-Means via Cluster Closures, 978-1-4673-1228-8/12/2012 IEEE.
Hassan Ismkhan; I-k-means−+: An iterative clustering algorithm based on an enhanced version of the k-means, Pattern Recognition 79 (2018) 402–413
M. E. Celebi, Hassan A, Patricio; A comparative study of efficient initialization methods for the k-means clustering algorithm, Expert Systems with Applications 40 (2013) 200–210
Amir Ahmad, Lipika Dey, A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering 63 (2007) 503–527
Han Xiao, Kashif Rasul, Roland Vollgraf; Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, https://www.researchgate.net/publication/319312259, 2017
S. Sieranoja and P. Fränti, "Fast and general density peaks clustering", Pattern Recognition Letters, 128, 551-558, December 2019
D. Sculley. Web-scale k-means clustering. In Proceedings of the 19th international conference on World wide web, pages 1177–1178. ACM, 2010.

Index Terms

Computer Science

Information Sciences

Keywords

K-means, RMS error, PCA, Approximation