Comparison and Evaluation of Classical Dimensionality Reduction Methods
DOI: https://doi.org/10.54097/hset.v70i.13890

Keywords: PCA, t-SNE, UMAP, Dimensionality Reduction

Abstract
As a task of unsupervised learning, data dimensionality reduction lacks well-established evaluation methods. Motivated by this gap, this paper selected three classical dimensionality reduction methods, PCA, t-SNE and UMAP, as its research objects. Five three-class datasets were chosen, and each of the three methods was applied to reduce their dimensionality. 3D scatter plots of the reduced data were drawn to analyze how well the reduced data distinguish the categories of the target variable; the reduced data were then classified with a random forest model to obtain classification accuracy. According to the 3D scatter plots and the random forest accuracy, PCA achieves a good dimensionality reduction effect on most of the selected datasets, and t-SNE performs relatively stably, whereas UMAP performs well on some individual datasets but lacks stability. Overall, this paper proposes a dimensionality reduction evaluation method that combines scatter-plot visualization with a classification model. It can effectively predict the performance of dimensionality reduction methods on a variety of datasets, thereby facilitating the comparison and selection of dimensionality reduction methods in unsupervised learning.
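The evaluation pipeline described in the abstract (reduce the data to three dimensions, then score a random forest on the embedding) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions: the iris dataset stands in for the paper's five unnamed datasets, UMAP is omitted because it requires the third-party umap-learn package, and the 3D scatter-plot step is left out.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Iris is used here as a stand-in three-class dataset (an assumption;
# the paper's five datasets are not named in the abstract).
X, y = load_iris(return_X_y=True)

reducers = {
    "PCA": PCA(n_components=3),
    "t-SNE": TSNE(n_components=3, random_state=0),
}

accs = {}
for name, reducer in reducers.items():
    X_low = reducer.fit_transform(X)  # embed the data into 3 dimensions
    clf = RandomForestClassifier(random_state=0)
    # 5-fold cross-validated accuracy of a random forest on the embedding,
    # the quantitative half of the paper's evaluation method
    accs[name] = cross_val_score(clf, X_low, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {accs[name]:.3f}")
```

Note that t-SNE cannot embed unseen points, so the cross-validation here runs over the precomputed embedding; this is somewhat optimistic, but it mirrors the abstract's protocol of classifying the already-reduced data.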
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.