Analysis and Forecasting of California Housing
DOI:
https://doi.org/10.54097/hbem.v3i.4704
Keywords:
California Housing; K-Fold Method; Random Forest.
Abstract
House prices have a significant impact on people's daily lives: stable housing underpins the ability to live and work, as well as social prosperity and stability. Predicting house prices is therefore a meaningful and challenging task. In this project, we use the California Census dataset to examine how distinctive features (attributes) raise or lower house prices. The main idea is to build a regression model that learns from this data and predicts the price of a house in any block, given the features provided in the dataset. For the regression task, we applied K-Fold cross-validation to Ridge, Random Forest, and Gradient Boosting models to select the optimal hyperparameters. We then applied the selected models to the test set; the results show decent performance for Random Forest and Gradient Boosting. Random Forest performs best, with an MSE (Mean Squared Error) of 0.290 and a training time of 14.7 seconds. Gradient Boosting reaches a slightly higher MSE of 0.295 but trains faster (2.91 s).
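The selection procedure summarized in the abstract (K-Fold cross-validation to pick a hyperparameter, followed by a single evaluation on a held-out test set) can be illustrated with a minimal sketch. This is not the authors' code. It assumes synthetic data in place of the California Census dataset, covers only a ridge model with a hand-written closed-form fit, and uses hypothetical helper names (`kfold_indices`, `fit_ridge`, `cv_mse`) and an arbitrary grid of penalty values.

```python
import numpy as np


def kfold_indices(n_samples, n_splits, rng):
    """Yield (train_idx, val_idx) pairs for a shuffled K-Fold split."""
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for i in range(n_splits):
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != i])
        yield train_idx, folds[i]


def fit_ridge(X, y, alpha):
    """Closed-form ridge fit; the intercept is handled by centering and is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def cv_mse(X, y, alpha, n_splits=5, seed=0):
    """Average validation MSE over K folds; the fixed seed reuses the same folds for every alpha."""
    rng = np.random.default_rng(seed)
    scores = []
    for train_idx, val_idx in kfold_indices(len(y), n_splits, rng):
        w, b = fit_ridge(X[train_idx], y[train_idx], alpha)
        scores.append(mse(y[val_idx], X[val_idx] @ w + b))
    return float(np.mean(scores))


# Synthetic stand-in data (assumption): 8 features, linear signal plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))
y = X @ rng.normal(size=8) + 0.5 * rng.normal(size=500)

# Hold out a test set that plays no part in hyperparameter selection.
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

# Pick the penalty with the lowest cross-validated MSE (grid values are arbitrary).
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_scores = {alpha: cv_mse(X_train, y_train, alpha) for alpha in alphas}
best_alpha = min(cv_scores, key=cv_scores.get)

# Refit on the full training set and evaluate once on the test set.
w, b = fit_ridge(X_train, y_train, best_alpha)
test_mse = mse(y_test, X_test @ w + b)
print(f"best alpha: {best_alpha}, test MSE: {test_mse:.3f}")
```

The same loop would apply to Random Forest or Gradient Boosting by swapping the fit and predict steps and the hyperparameter grid; the test set is touched only once, after selection, so the reported MSE is not biased by the search.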
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.






