Improvement of Data Cleaning and Quality Assessment Methods Under Big Data Environment

Authors

  • Qiao Song
  • Zengren Song

DOI:

https://doi.org/10.54097/gk2jwg31

Keywords:

Big Data, Data Cleaning, Quality Assessment, Outlier Detection, Multi-Dimensional Indicators

Abstract

In the era of big data, the scale and complexity of data have grown dramatically, making data cleaning and quality assessment essential steps for ensuring that data are usable and valuable. This paper examines how data cleaning and quality assessment methods can be improved in a big data environment. It analyzes the limitations of traditional methods, describes data cleaning techniques such as outlier detection, missing-value handling, and duplicate handling, and presents the construction of a multi-dimensional indicator system for data quality assessment. A case analysis demonstrates the effect of the improved methods in practical application, and future development trends are discussed. The aim is to support the effective use of big data and to promote more accurate decisions based on high-quality data across fields.

Published

28-04-2025

Section

Articles

How to Cite

Song, Q., & Song, Z. (2025). Improvement of Data Cleaning and Quality Assessment Methods Under Big Data Environment. Frontiers in Computing and Intelligent Systems, 12(1), 114-118. https://doi.org/10.54097/gk2jwg31