TY - GEN
T1 - Machine Learning-Based Feature Mapping for Enhanced Understanding of the Housing Market
AU - Lystbæk, Michael Sahl
AU - Srirajan, Tharsika Pakeerathan
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
AB - The housing market is impacted by a variety of parameters which gives a complexity that is difficult to analyze with traditional statistical approaches due to the large number of interdependent variables that the market data provides. In this study, ML techniques are utilized to provide a deeper understanding of the Danish housing market based on a dataset of sales cases provided by a leading Danish real estate agency. We propose an extreme gradient boosting model for sales price regression, and we propose using feature importance techniques to provide insight into important parameters in the national housing market. The regression model trained for sales price with grid search cross-validation for parameter optimization achieves an R² accuracy of 0.84, an MAE of DKK 433,824, and an RMSE of DKK 675,817. Permutation-based feature importance defines the most impactful parameters for the sales price regression where the four features with the highest impacts are: 1. GisX (West/East location), 2. GisY (North/South location), 3. Building area, 4. Construction year. The results for geographical distribution regarding price, building area, and plot area are illustrated with 2D partial dependence plots of geographical distributions to enhance the understanding of market trends.
KW - Feature Importance
KW - Gradient Boosting Regression
KW - Housing Market Pricing
KW - Machine Learning
KW - Partial Dependence
UR - http://www.scopus.com/inward/record.url?scp=85198979257&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-62495-7_40
DO - 10.1007/978-3-031-62495-7_40
M3 - Article in proceedings
AN - SCOPUS:85198979257
SN - 978-3-031-62494-0
T3 - Communications in Computer and Information Science
SP - 530
EP - 543
BT - Engineering Applications of Neural Networks
A2 - Iliadis, Lazaros
A2 - Maglogiannis, Ilias
A2 - Papaleonidas, Antonios
A2 - Pimenidis, Elias
A2 - Jayne, Chrisina
PB - Springer Science and Business Media Deutschland GmbH
T2 - 25th International Conference on Engineering Applications of Neural Networks, EANN 2024
Y2 - 27 June 2024 through 30 June 2024
ER -