Effective Clustering for Large Datasets Using Density-Based Clustering via Message Passing

Siddharth Dixit

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 187 - Issue 48

Published: October 2025

10.5120/ijca2025925809

Siddharth Dixit. Effective Clustering for Large Datasets Using Density-Based Clustering via Message Passing. International Journal of Computer Applications. 187, 48 (October 2025), 28-39. DOI=10.5120/ijca2025925809

@article{ 10.5120/ijca2025925809,
  author    = { Siddharth Dixit },
  title     = { Effective Clustering for Large Datasets Using Density-Based Clustering via Message Passing },
  journal   = { International Journal of Computer Applications },
  year      = { 2025 },
  volume    = { 187 },
  number    = { 48 },
  pages     = { 28-39 },
  doi       = { 10.5120/ijca2025925809 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2025
%A Siddharth Dixit
%T Effective Clustering for Large Datasets Using Density-Based Clustering via Message Passing
%J International Journal of Computer Applications
%V 187
%N 48
%P 28-39
%R 10.5120/ijca2025925809
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Density-based clustering remains a significant area of research in data science, particularly given the increasing prevalence of high-dimensional datasets with varying densities. Many existing clustering approaches struggle with datasets that contain regions of high density surrounded by sparse areas. This study introduces a novel clustering algorithm based on mutual k-nearest neighbor relationships, designed to overcome these limitations. The proposed method requires only a single input parameter, performs strongly on high-dimensional datasets of varying density, and is computationally efficient. The algorithm’s practical value is illustrated through its potential to enhance search and retrieval in vector databases.
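
The mutual k-nearest neighbor relationship named in the abstract can be illustrated with a minimal sketch. It is not the article's algorithm: it only assumes that two points are linked when each appears in the other's k-nearest-neighbor list, groups linked points by connected components, and labels points with no mutual neighbor as outliers (-1). The function names, the Euclidean metric, and the brute-force distance matrix are assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def mutual_knn_pairs(points, k):
    """Return index pairs (i, j), i < j, that are in each other's k-NN lists."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Brute-force pairwise Euclidean distances: O(n^2) memory, so this is
    # suitable for small illustrations only (an assumption of this sketch).
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    knn = np.argsort(dist, axis=1)[:, :k]
    neighbor_sets = [set(row.tolist()) for row in knn]
    pairs = set()
    for i in range(n):
        for j in neighbor_sets[i]:
            # Keep the pair only when the relationship holds in both directions.
            if i < j and i in neighbor_sets[j]:
                pairs.add((i, j))
    return pairs


def cluster_by_mutual_knn(points, k):
    """Label points by connected components of the mutual k-NN graph.

    Points without any mutual neighbor receive the label -1 (outlier).
    This grouping rule is a simplification chosen for the sketch.
    """
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in mutual_knn_pairs(points, k):
        parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    sizes = Counter(roots)
    ids, labels = {}, []
    for r in roots:
        if sizes[r] < 2:
            labels.append(-1)
        else:
            ids.setdefault(r, len(ids))  # number clusters by first appearance
            labels.append(ids[r])
    return labels


if __name__ == "__main__":
    # Two tight groups plus one distant point (hypothetical toy data).
    pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [5, 30]]
    print(cluster_by_mutual_knn(pts, k=2))
```

In this toy example the distant point lists two members of the second group among its nearest neighbors, but neither lists it back, so it stays unlinked. That asymmetry is why a mutual relationship can separate dense regions from sparse surroundings using k as the only parameter.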

References

X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, et al., “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, 2008.
P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Library of Congress, 2006.
P. Tan, M. Steinbach, and V. Kumar, “Data mining cluster analysis: Basic concepts and algorithms,” 2013.
Z. Hu and R. Bhatnagar, “Clustering algorithm based on mutual k-nearest neighbor relationships,” Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 5, no. 2, pp. 100–145, 2012.
D. Sardana and R. Bhatnagar, “Graph clustering using mutual k-nearest neighbors,” in Active Media Technology, pp. 35–48, Springer International Publishing, 2014.
L. Ertöz, M. Steinbach, and V. Kumar, “A new shared nearest neighbor clustering algorithm and its applications,” in Workshop on Clustering High Dimensional Data and its Applications at 2nd SIAM International Conference on Data Mining, pp. 105–115, Apr. 2002.
M. A. Wong and T. Lane, “A kth nearest neighbour clustering procedure,” in Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface, pp. 308–311, Springer US, Jan. 1981.
H. Kriegel et al., “Density-based clustering,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 231–240, 2011.
L. Ertöz, M. Steinbach, and V. Kumar, “Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data,” in SDM, 2003.
C. C. Aggarwal, A. Hinneburg, and D. A. Keim, On the surprising behavior of distance metrics in high dimensional space. Springer Berlin Heidelberg, 2001.
B. J. Frey and D. Dueck, “Clustering by passing messages between data points,” Science, vol. 315, no. 5814, pp. 972–976, 2007.
Z. Hu, Multi-Domain Clustering on Real-Valued Datasets. PhD thesis, University of Cincinnati, 2011. https://etd.ohiolink.edu/.
M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques,” in KDD Workshop on Text Mining, vol. 400, 2000.

Index Terms

Computer Science

Information Sciences

No index terms available.

Keywords

Clustering; Mutual 𝑘-Nearest Neighbor; Density-Based Methods; Outlier Detection; Vector Databases; Data Mining