A Comprehensive Study on the Application of Data Visualization in Feature Selection

Authors

  • Ziyi Fu

DOI:

https://doi.org/10.54097/xm294797

Keywords:

Data Visualization; Features; Model; Datasets; Data Analysis.

Abstract

Feature selection is a critical step in machine learning and data analysis: reducing the dimensionality of the data can improve both model performance and interpretability. This paper provides a comprehensive review of the application of data visualization techniques, including scatter plots, heatmaps, and parallel coordinates, to feature selection. These visual methods help identify key patterns and assess both the relevance of individual features and the interactions between them. By giving a clearer picture of the data, visualization supports more informed decisions during the feature selection process. Despite the importance of visualization, gaps remain in the existing literature, particularly regarding the scalability of visualization tools to large datasets. This review highlights these challenges and suggests potential research directions, including the development of advanced visualization methods tailored to complex datasets. Overall, the findings offer insights for improving feature selection practice, leading to more efficient and accurate machine learning models and, in turn, better data analysis.
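
As a rough illustration of the kind of quantity a correlation heatmap displays when it is used to assess interactions between features, the following minimal Python sketch computes pairwise Pearson correlations and lists strongly correlated feature pairs as candidates for removal. It is not taken from the paper: the function names (`pearson`, `redundant_pairs`), the 0.9 threshold, and the toy data are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold.

    `features` maps a feature name to its list of values. The 0.9 cutoff
    is an illustrative assumption, not a value from the paper.
    """
    pairs = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Toy data (hypothetical): "height_in" is a rescaled copy of "height_cm",
# so the two carry nearly the same information.
toy = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [38.0, 44.0, 37.0, 43.0, 40.0],
}
flagged = redundant_pairs(toy)
print(flagged)
```

In a heatmap, each such pair would appear as a strongly colored off-diagonal cell. With many features, the number of pairs grows quadratically, which is one concrete form of the scalability concern raised above.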

Published

18-02-2025

How to Cite

Fu, Z. (2025). A Comprehensive Study on the Application of Data Visualization in Feature Selection. Highlights in Science, Engineering and Technology, 124, 432-435. https://doi.org/10.54097/xm294797