Hierarchical & Shrinkage Thompson Sampling for Stable Off-Policy Evaluation On MIMIC-IV
DOI:
https://doi.org/10.54097/wg8eta78
Keywords:
Thompson Sampling, Hierarchical Bayes, Shrinkage prior, ε-mix shared log, Off-Policy Evaluation.
Abstract
Trial-and-error is costly in intensive care: clinicians face high-stakes, noisy decisions where experimentation is risky. We study how to make Thompson Sampling (TS) reliably evaluable offline before deployment. We introduce two variants, Hierarchical Empirical-Bayes TS and Hierarchical Shrinkage TS, that share information across arms through the logged data to stabilize exploration, especially in the early stage. To enable a fair, high-power comparison, we adopt an ε-mix shared-log design that improves action overlap while keeping a single fixed log for all target policies. Off-policy evaluation uses Inverse Propensity Scoring (IPS) with weight diagnostics (effective sample size, upper quantiles, and maximum weight) and a paired bootstrap over seeds for uncertainty. On synthetic data and on MIMIC-IV with a 6-hour grid setting, our pipeline yields small but consistent gains over standard TS, together with healthy weight distributions and large effective sample sizes. Without ε-mix, early exploration is imbalanced and IPS becomes unstable. Overall, coverage-aware logging combined with hierarchical and shrinkage priors provides a reproducible pathway to assess TS policies safely and credibly for ICU decision support.
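The evaluation steps named in the abstract (ε-mix logging, IPS, weight diagnostics, paired bootstrap) can be illustrated with a minimal sketch. It is not the authors' pipeline. The function names, the uniform mixing component, the Kish form of effective sample size, the 99th percentile as the "upper quantile", and the percentile bootstrap are all assumptions made for illustration.

```python
import numpy as np


def epsilon_mix_propensities(base_probs, epsilon):
    """Mix a base logging policy with a uniform policy over the K arms.

    Assumption: the epsilon-mix is (1 - eps) * base + eps * uniform, which
    gives every arm a propensity of at least eps / K.
    """
    base_probs = np.asarray(base_probs, dtype=float)
    k = base_probs.shape[-1]
    return (1.0 - epsilon) * base_probs + epsilon / k


def ips_with_diagnostics(rewards, target_probs, logging_probs):
    """IPS value estimate with importance-weight diagnostics.

    target_probs and logging_probs hold the probability of the logged
    action under the target policy and the logging policy, respectively.
    """
    rewards = np.asarray(rewards, dtype=float)
    w = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return {
        "ips": float(np.mean(w * rewards)),
        # Kish effective sample size (assumed form): (sum w)^2 / sum w^2
        "ess": float(w.sum() ** 2 / np.sum(w ** 2)),
        "w_q99": float(np.quantile(w, 0.99)),
        "w_max": float(w.max()),
    }


def paired_bootstrap_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-seed paired differences.

    diffs[i] is (candidate value - baseline value) on seed i, both computed
    on the same shared log, so resampling seeds keeps the pairing intact.
    """
    diffs = np.asarray(diffs, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    means = diffs[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


if __name__ == "__main__":
    # A deterministic base policy gains overlap on every arm after mixing.
    print(epsilon_mix_propensities([1.0, 0.0, 0.0, 0.0], epsilon=0.2))
    # Two logged steps with importance weights 2.0 and 0.5.
    print(ips_with_diagnostics([1.0, 1.0], [0.5, 0.25], [0.25, 0.5]))
    # Hypothetical per-seed differences, for illustration only.
    print(paired_bootstrap_ci([0.02, 0.01, 0.03, 0.015, 0.025]))
```

Under this sketch, a small effective sample size or a very large maximum weight would signal the kind of instability the abstract attributes to logging without ε-mix, since an arm with near-zero logging propensity produces an extreme importance weight.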
License
Copyright (c) 2026 Academic Journal of Science and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.








