Dataset Analysis and House Price Prediction

Authors

  • Junjie Liu

DOI:

https://doi.org/10.54097/1qq43384

Keywords:

Data Cleaning; Data Analysis; Machine Learning.

Abstract

Predicting house prices by analyzing data with machine learning and charts is an important research topic. Many scholars have studied it, producing insights of value for both academic study and real-world applications. This study aims to analyze a house price dataset thoroughly and to predict house prices from it. The dataset is first cleaned to ensure data quality, and three types of charts are then used to analyze it. Finally, two widely used models are trained to predict house prices, and their accuracy is evaluated. The results indicate that the Random Forest Regressor is the better-suited model for this dataset, and that the factors in the dataset differ in how strongly they influence the predicted price. The study makes preliminary progress in data cleaning, dataset analysis, and predictive modeling, and the charts give a more intuitive view of the dataset's characteristics. The findings are relevant to the fields of data cleaning and machine learning. Future research will apply more advanced models to further improve prediction accuracy, supporting more realistic simulations and contributing to the ongoing development of society.
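The abstract does not spell out the cleaning steps, the second model, or the exact evaluation metric. The following is therefore only a minimal sketch under stated assumptions. It cleans a table by removing rows with missing values, exact duplicates, and price outliers outside Tukey's IQR fences. It then compares scikit-learn's `RandomForestRegressor` against an ordinary `LinearRegression`, which is an assumed second model, using R² and RMSE on a held-out split. Finally, it reads the forest's feature importances as one way to see how much each factor contributes. The data are synthetic. The feature names (`area`, `rooms`, `age`) and the helper names `make_synthetic_data`, `clean`, and `compare_models` are hypothetical and are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical feature names for illustration; not the columns of the paper's dataset.
FEATURES = ["area", "rooms", "age"]


def make_synthetic_data(n=500, seed=0):
    """Build a small synthetic housing table with injected dirt (NaNs, duplicate rows)."""
    rng = np.random.default_rng(seed)
    area = rng.uniform(40, 250, n)
    rooms = rng.integers(1, 6, n).astype(float)
    age = rng.uniform(0, 60, n)
    # Made-up pricing rule with a nonlinear age term plus Gaussian noise.
    price = 2000 * area + 15000 * rooms - 40 * age**2 + rng.normal(0, 20000, n)
    X = np.column_stack([area, rooms, age])
    X[rng.choice(n, size=10, replace=False), 0] = np.nan  # missing area values
    X = np.vstack([X, X[:5]])                             # exact duplicate rows
    price = np.concatenate([price, price[:5]])
    return X, price


def clean(X, y):
    """Assumed cleaning steps: drop NaN rows, exact duplicates, and IQR outliers in price."""
    data = np.column_stack([X, y])
    data = data[~np.isnan(data).any(axis=1)]  # remove rows with any missing value
    data = np.unique(data, axis=0)            # remove exact duplicate rows (also sorts them)
    X, y = data[:, :-1], data[:, -1]
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    keep = (y >= q1 - 1.5 * iqr) & (y <= q3 + 1.5 * iqr)  # Tukey's fences on price
    return X[keep], y[keep]


def compare_models(X, y, seed=0):
    """Fit both models on a train split and score them on a held-out test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    models = {
        "LinearRegression": LinearRegression(),
        "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        # Higher R^2 and lower RMSE on the test split indicate a better fit.
        scores[name] = {
            "r2": r2_score(y_te, pred),
            "rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
        }
    # Impurity-based importances: one rough measure of each factor's influence.
    importances = dict(zip(FEATURES, models["RandomForestRegressor"].feature_importances_))
    return scores, importances


if __name__ == "__main__":
    X, y = clean(*make_synthetic_data())
    scores, importances = compare_models(X, y)
    for name, s in scores.items():
        print(f"{name}: R^2={s['r2']:.3f}, RMSE={s['rmse']:.0f}")
    for feat, imp in sorted(importances.items(), key=lambda kv: -kv[1]):
        print(f"{feat}: {imp:.3f}")
```

Impurity-based importances are a quick way to rank factors, but they can favour features with many distinct values. Computing permutation importance on the held-out split is a common cross-check before drawing conclusions about which factors matter most.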

Published

26-01-2024

How to Cite

Liu, J. (2024). Dataset Analysis and House Price Prediction. Highlights in Science, Engineering and Technology, 81, 363-367. https://doi.org/10.54097/1qq43384