Researches Advanced in Clustering Algorithms

Authors

  • Guohuan Feng
  • Junchen Lin
  • Keyi Wang

DOI:

https://doi.org/10.54097/hset.v16i.2498

Keywords:

Clustering; unsupervised learning; feature representation.

Abstract

Clustering is a technique for finding the intrinsic structure in data and is a fundamental problem in many data-driven application fields. It is generally modeled as an unsupervised learning task that mines similar features across samples and groups samples with similar features into clusters. Ideally, objects in the same cluster are similar to one another, while objects in different clusters differ markedly. This study summarizes the research status of clustering algorithms in recent years. Specifically, the critical steps of clustering algorithms are introduced first. Representative clustering algorithms, such as K-means, K-medoids, CLARANS, BIRCH, DBSCAN, and CURE, are then detailed from the two perspectives of partition and hierarchical clustering. The study also analyzes and summarizes these algorithms in terms of critical technologies, algorithm ideas, benefits, and shortcomings, and compares the distance accuracy of the different algorithms on standard data sets. This work provides a reference for cluster analysis and data mining research.
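To illustrate the goal stated in the abstract (similar objects grouped together, dissimilar objects separated), the following is a minimal sketch of K-means, one of the partition algorithms named above. It is not the authors' implementation: the function names `squared_distance` and `kmeans`, the random initialization, the squared Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Illustrative K-means (hypothetical helper, not from the paper).

    Returns (labels, centroids), where labels[i] is the cluster index
    assigned to points[i].
    """
    rng = random.Random(seed)
    # Assumption: initialize centroids with k distinct input points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centroids


# Toy data (made up for this example): two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

On this toy data the first three points should share one label and the last three the other; which group receives label 0 depends on the random initialization.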

Published

10-11-2022

How to Cite

Feng, G., Lin, J., & Wang, K. (2022). Researches Advanced in Clustering Algorithms. Highlights in Science, Engineering and Technology, 16, 168-177. https://doi.org/10.54097/hset.v16i.2498