Balancing Exploration and Exploitation: An Analytical Review of Three Classical Bandit Algorithms
DOI:
https://doi.org/10.54097/bdsjtc03

Keywords:
MAB problem, ETC algorithm, UCB algorithm, TS algorithm.

Abstract
In daily life, numerous decision-making challenges can be framed as multi-armed bandit problems, in which a decision maker must strategically balance exploration and exploitation. This paper focuses on three prominent algorithms developed to address this dilemma: the Explore-then-Commit (ETC) algorithm, the Upper Confidence Bound (UCB) algorithm, and the Thompson Sampling (TS) algorithm. The paper provides a detailed examination of their core ideas and a critical analysis of their respective advantages and disadvantages. Furthermore, it highlights how their inherent characteristics lead to different performance profiles in varied contexts and contextualizes these differences by proposing targeted real-world application scenarios for each algorithm. The paper not only summarizes key insights for researchers but also offers an accessible entry point for beginners in reinforcement learning and data science. By elucidating the essence of each method and its practical implications, this paper equips newcomers with the foundational knowledge necessary to understand and apply these powerful tools to real-world problems.
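To make the exploration-exploitation trade-off concrete, the sketch below gives minimal Python implementations of the three algorithms for a Bernoulli bandit. This is an illustrative sketch only, not the paper's experimental code: the reward model, horizon T, arm success probabilities, and the ETC exploration length m are assumptions chosen for demonstration.

import numpy as np

rng = np.random.default_rng(0)

def pull(arm, true_means):
    # Simulate one Bernoulli reward from the chosen arm (illustrative reward model).
    return float(rng.random() < true_means[arm])

def etc(true_means, horizon, m=50):
    # Explore-then-Commit: pull each arm m times round-robin, then commit
    # to the arm with the best empirical mean for the rest of the horizon.
    k = len(true_means)
    counts, sums, total = np.zeros(k), np.zeros(k), 0.0
    for t in range(horizon):
        if t < k * m:
            arm = t % k                               # exploration phase
        else:
            arm = int(np.argmax(sums / counts))       # commitment phase
        r = pull(arm, true_means)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

def ucb1(true_means, horizon):
    # UCB1: choose the arm with the highest empirical mean plus a confidence bonus.
    k = len(true_means)
    counts, sums, total = np.zeros(k), np.zeros(k), 0.0
    for t in range(horizon):
        if t < k:
            arm = t                                    # pull each arm once to initialize
        else:
            bonus = np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(sums / counts + bonus))
        r = pull(arm, true_means)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

def thompson(true_means, horizon):
    # Thompson Sampling with Beta(1,1) priors on Bernoulli arms:
    # sample a mean from each posterior and play the arm with the largest sample.
    k = len(true_means)
    alpha, beta, total = np.ones(k), np.ones(k), 0.0
    for _ in range(horizon):
        arm = int(np.argmax(rng.beta(alpha, beta)))
        r = pull(arm, true_means)
        alpha[arm] += r
        beta[arm] += 1 - r
        total += r
    return total

if __name__ == "__main__":
    means = [0.3, 0.5, 0.7]   # hypothetical true success probabilities
    T = 5000                  # illustrative horizon
    for name, algo in [("ETC", etc), ("UCB1", ucb1), ("TS", thompson)]:
        print(name, "total reward:", algo(means, T))

Running the sketch shows the qualitative behaviour the paper discusses: ETC's performance hinges on the choice of m, while UCB and TS adapt their exploration automatically over the horizon.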
License
Copyright (c) 2026 Academic Journal of Science and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.