Multiple Machine Learning Models in House Price Prediction: Performance Evaluation and Comparison

Authors

  • Jiahao Wu

DOI:

https://doi.org/10.54097/bd459218

Keywords:

House Price, Machine Learning, Stacking Model, Feature Importance.

Abstract

Housing prices are a critical concern for households, and accurate price prediction is essential for participants and investors in the real estate market. This study examines predictive modeling of house prices using a multi-model machine learning approach. The methodology spans data collection, preprocessing, feature engineering, and model selection. The study uses a Kaggle dataset of housing data from Iowa comprising 79 features that influence property prices. Performance is evaluated with the root mean square error (RMSE), comparing regularized regression models, namely Lasso, Kernel Ridge, and ElasticNet regression, against tree-based models including XGBoost, Gradient Boosting, and LightGBM. In addition, two stacked ensemble models are implemented to identify the best-performing predictor of house prices. Experiments train and evaluate these models on the dataset to gauge their predictive accuracy, and the findings center on a comparative analysis of the machine learning models in the context of house price prediction. The study also applies Gradient Boosting to identify the ten most influential features affecting house prices. Recommendations for future research are provided to improve the precision and robustness of house price prediction models. This research contributes to predictive modeling in the real estate domain by offering insight into the efficacy of various machine learning techniques for forecasting property prices.
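The kind of comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's exact pipeline: the file name train.csv, the Id and SalePrice columns, the preprocessing shortcuts, and all hyperparameters are assumptions chosen only to make the example self-contained. It scores the listed regression and tree-based models by cross-validated RMSE, fits one possible stacked ensemble, and reports the ten largest Gradient Boosting feature importances.

import re
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, ElasticNet
from sklearn.kernel_ridge import KernelRidge
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

# Load the Kaggle training set (file name and column names are assumptions).
df = pd.read_csv("train.csv")
y = np.log1p(df["SalePrice"])                  # log-transform the target, a common choice
X = pd.get_dummies(df.drop(columns=["Id", "SalePrice"]), dtype=float)
X = X.fillna(X.median())                       # crude imputation, for illustration only
X.columns = [re.sub(r"\W+", "_", c) for c in X.columns]   # booster-safe column names

# Candidate models; hyperparameters are placeholders, not the paper's settings.
models = {
    "Lasso": Lasso(alpha=0.0005, max_iter=10000),
    "ElasticNet": ElasticNet(alpha=0.0005, l1_ratio=0.9, max_iter=10000),
    "KernelRidge": KernelRidge(alpha=0.6, kernel="polynomial", degree=2),
    "GradientBoosting": GradientBoostingRegressor(n_estimators=500, learning_rate=0.05),
    "XGBoost": XGBRegressor(n_estimators=500, learning_rate=0.05, verbosity=0),
    "LightGBM": LGBMRegressor(n_estimators=500, learning_rate=0.05),
}

# Cross-validated RMSE for each model (scikit-learn returns negated errors).
for name, model in models.items():
    rmse = -cross_val_score(model, X, y, scoring="neg_root_mean_squared_error", cv=5)
    print(f"{name:18s} RMSE = {rmse.mean():.4f} (+/- {rmse.std():.4f})")

# One possible stacked ensemble: boosted trees as base learners, Lasso on top.
stack = StackingRegressor(
    estimators=[("gbr", models["GradientBoosting"]),
                ("xgb", models["XGBoost"]),
                ("lgbm", models["LightGBM"])],
    final_estimator=Lasso(alpha=0.0005, max_iter=10000),
)
rmse = -cross_val_score(stack, X, y, scoring="neg_root_mean_squared_error", cv=5)
print(f"{'Stacked':18s} RMSE = {rmse.mean():.4f}")

# Ten most influential features according to Gradient Boosting importances.
gbr = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05).fit(X, y)
print(pd.Series(gbr.feature_importances_, index=X.columns).nlargest(10))

A Lasso meta-learner over boosted-tree base learners is only one common stacking configuration; the study's own ensembles may combine different base and meta models.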

Published

01-09-2024

How to Cite

Wu, J. (2024). Multiple Machine Learning Models in House Price Prediction: Performance Evaluation and Comparison. Highlights in Business, Economics and Management, 40, 364-371. https://doi.org/10.54097/bd459218