Predictive Modeling of Personal Medical Insurance Costs: Analyzing Key Factors and Interactions
DOI:
https://doi.org/10.54097/nqsw4120
Keywords:
Medical Insurance Costs, Regression Analysis, BMI, Smoking Status, Age Nonlinearity, Interaction Effects, Health Economics
Abstract
This study uses a Kaggle dataset to predict personal medical insurance costs from age, sex, BMI, number of children, smoking status, and region. Through comprehensive statistical analysis and model refinement, we identify the key predictors and interactions that affect charges. The main findings are a nonlinear relationship between age and medical expenses and a significant interaction effect between BMI and smoking status. The enhanced regression model incorporates this interaction and the nonlinear age effect and shows substantially improved explanatory power, with an R^2 of 0.9642. The results provide actionable insights for insurance policy formulation, health management programs, and risk assessment, and they demonstrate the importance of accounting for complex variable interactions when predicting medical expenses. Future research should continue exploring these relationships to further refine predictive models.
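The kind of model comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the coefficients and noise level are arbitrary assumptions, and the function names `design_matrix` and `r_squared` are hypothetical. It only shows how adding a squared age term and a BMI-by-smoker product term to an ordinary least squares fit can raise R^2 when those effects are present in the data.

```python
import numpy as np

def design_matrix(age, bmi, smoker, enhanced=True):
    """Build an OLS design matrix; `enhanced` adds age^2 and bmi*smoker columns."""
    cols = [np.ones_like(age, dtype=float), age, bmi, smoker]
    if enhanced:
        cols += [age ** 2, bmi * smoker]
    return np.column_stack(cols)

def r_squared(X, y):
    """Fit y ~ X by least squares and return the coefficient of determination."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in for the insurance data (assumed ranges and effect sizes).
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(18, 64, n)
bmi = rng.uniform(16, 50, n)
smoker = rng.integers(0, 2, n).astype(float)
charges = (
    2000.0
    + 3.0 * age ** 2            # nonlinear age effect
    + 50.0 * bmi
    + 1400.0 * bmi * smoker     # BMI matters far more for smokers
    + rng.normal(0.0, 2000.0, n)
)

r2_base = r_squared(design_matrix(age, bmi, smoker, enhanced=False), charges)
r2_enhanced = r_squared(design_matrix(age, bmi, smoker, enhanced=True), charges)
print(f"baseline R^2: {r2_base:.4f}")
print(f"enhanced R^2: {r2_enhanced:.4f}")
```

Because the baseline model is nested in the enhanced one, the enhanced in-sample R^2 can never be lower; whether the gain is meaningful would need to be checked with adjusted R^2, AIC/BIC, or held-out data.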
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.