Hierarchical & Shrinkage Thompson Sampling for Stable Off-Policy Evaluation On MIMIC-IV

Authors

  • Qianruo Tan

DOI:

https://doi.org/10.54097/wg8eta78

Keywords:

Thompson Sampling, Hierarchical Bayes, Shrinkage prior, ε-mix shared log, Off-Policy Evaluation.

Abstract

Trial and error is costly in intensive care: clinicians face high-stakes, noisy decisions where experimentation is risky. We study how to make Thompson Sampling (TS) reliably evaluable offline before deployment. We introduce two variants, Hierarchical Empirical-Bayes TS and Hierarchical Shrinkage TS, that share information across arms using the log to stabilize exploration, especially in the early stage. To enable fair, high-power comparison, we adopt an ε-mix shared-log design that improves action overlap while keeping a single fixed log for all target policies. Off-policy evaluation uses Inverse Propensity Scoring (IPS) with weight diagnostics (effective sample size, upper quantiles, and maximum weight) and a paired bootstrap over seeds for uncertainty. On synthetic data and on MIMIC-IV with a 6-hour grid setting, our pipeline yields small but consistent gains over standard TS, together with healthy weight distributions and large effective sample sizes. Without ε-mix, early exploration is imbalanced and IPS becomes unstable. Overall, coverage-aware logging plus hierarchical and shrinkage priors provide a reproducible pathway to assess TS policies safely and credibly for ICU decision support.
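
To illustrate why mixing a logging policy with uniform exploration can stabilize IPS, the following Python sketch compares weight diagnostics with and without an ε-mix. It is not the paper's pipeline: the mixing rule (1 − ε)·base + ε/K, the self-normalization-free IPS estimator, the Kish effective sample size, the uniform target policy, and all names and constants (`epsilon_mix`, `ips_with_diagnostics`, `simulate`, `TRUE_MEANS`, `EPS`, `N`) are assumptions chosen for the example, and the data are simulated rather than drawn from MIMIC-IV.

```python
import numpy as np

# Hypothetical per-arm success probabilities for a toy 4-arm problem.
TRUE_MEANS = np.array([0.3, 0.5, 0.4, 0.6])
EPS = 0.2   # assumed mixing weight
N = 5000    # assumed number of logged decisions


def epsilon_mix(base_probs, eps):
    """Mix a base logging policy with the uniform policy over K arms."""
    base_probs = np.asarray(base_probs, dtype=float)
    k = base_probs.shape[-1]
    return (1.0 - eps) * base_probs + eps / k


def ips_with_diagnostics(rewards, actions, logging_probs, target_probs):
    """Plain IPS estimate plus importance-weight diagnostics."""
    rewards = np.asarray(rewards, dtype=float)
    actions = np.asarray(actions)
    rows = np.arange(len(actions))
    # Importance weight: target propensity over logging propensity
    # for the action that was actually logged.
    w = (np.asarray(target_probs)[rows, actions]
         / np.asarray(logging_probs)[rows, actions])
    return {
        "ips": float(np.mean(w * rewards)),
        "ess": float(w.sum() ** 2 / np.sum(w ** 2)),  # Kish effective sample size
        "w_q99": float(np.quantile(w, 0.99)),
        "w_max": float(w.max()),
    }


def simulate(logging_row, n, rng):
    """Log n decisions under a context-free policy and evaluate a uniform target."""
    k = len(logging_row)
    logging = np.tile(logging_row, (n, 1))
    actions = rng.choice(k, size=n, p=logging_row)
    rewards = rng.binomial(1, TRUE_MEANS[actions])
    target = np.full((n, k), 1.0 / k)
    return ips_with_diagnostics(rewards, actions, logging, target)


rng = np.random.default_rng(0)
base_row = np.array([0.97, 0.01, 0.01, 0.01])  # imbalanced early exploration
mixed_row = epsilon_mix(base_row, EPS)

raw = simulate(base_row, N, rng)     # log collected without the mix
mixed = simulate(mixed_row, N, rng)  # log collected with the mix

print("without mix:", raw)
print("with mix:   ", mixed)
```

In this toy setting the mix places a floor of ε/K on every logging propensity, which bounds the largest possible weight and raises the effective sample size relative to the unmixed log. The paired bootstrap over seeds mentioned in the abstract is not shown here.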

Published

13-03-2026

Issue

Section

Articles

How to Cite

Tan, Q. (2026). Hierarchical & Shrinkage Thompson Sampling for Stable Off-Policy Evaluation On MIMIC-IV. Academic Journal of Science and Technology, 19(3), 260-268. https://doi.org/10.54097/wg8eta78