Dataset Analysis and House Price Prediction
DOI:
https://doi.org/10.54097/1qq43384
Keywords:
Data Cleaning; Data Analysis; Machine Learning.
Abstract
Predicting house prices by analyzing data with machine learning and charts is an important topic. Many scholars have conducted research in this area, providing valuable insights for both academic learning and real-world applications. This study aims to predict house prices and to analyze the dataset thoroughly. The methodology begins with data cleaning to ensure data quality. Three types of charts are then used to analyze the dataset, and two popular models are used to predict house prices, with the accuracy of each evaluated. The results indicate that the Random Forest Regressor is the more suitable of the two models for this dataset, and that the impact of each factor on the predicted house price varies. Future research will use more advanced models to further improve prediction accuracy, which will enable more realistic simulations and contribute to the development of society. This study has made preliminary progress in data cleaning, dataset analysis, and predictive modeling, and the charts give a more intuitive view of the dataset's characteristics. The findings have implications for the fields of data cleaning and machine learning.
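The workflow summarized in the abstract (clean the data, fit two regression models, compare their errors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's actual code: the abstract names only the Random Forest Regressor, so linear regression is assumed here as the second model; mean squared error is assumed as the accuracy measure; and the column names and synthetic data are hypothetical placeholders rather than the dataset analyzed in the study.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def make_placeholder_data(n_rows=500, seed=0):
    """Build a synthetic stand-in table; NOT the dataset used in the study."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "median_income": rng.uniform(1.0, 10.0, n_rows),
        "house_age": rng.integers(1, 50, n_rows).astype(float),
        "total_rooms": rng.integers(2, 12, n_rows).astype(float),
    })
    # Hypothetical nonlinear price relationship plus noise.
    df["price"] = (
        40_000 * df["median_income"]
        + 5_000 * df["total_rooms"] ** 1.5
        - 800 * df["house_age"]
        + rng.normal(0, 10_000, n_rows)
    )
    # Inject missing values and duplicate rows so the cleaning step has work to do.
    df.loc[df.sample(frac=0.05, random_state=seed).index, "total_rooms"] = np.nan
    return pd.concat([df, df.head(10)], ignore_index=True)


def clean(df):
    """Drop exact duplicate rows, then fill missing values with column medians."""
    df = df.drop_duplicates().reset_index(drop=True)
    return df.fillna(df.median(numeric_only=True))


def compare_models(df, target="price", seed=0):
    """Fit both models on the same split and return each model's test-set MSE."""
    X, y = df.drop(columns=[target]), df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),  # assumed second model
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_squared_error(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    cleaned = clean(make_placeholder_data())
    for name, mse in compare_models(cleaned).items():
        print(f"{name}: test MSE = {mse:.1f}")
```

Which model scores better depends on the data, so a result obtained on this placeholder table says nothing about the study's dataset. Calling `cleaned.corr()` would give the correlation matrix that a heatmap typically visualizes.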
Copyright (c) 2024 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







