Expanding the Horizon: Diverse Applications and Insights from Multi-Armed Bandit Algorithms
DOI:
https://doi.org/10.54097/74ws4250
Keywords:
Multi-armed bandit; explore-then-commit (ETC); upper confidence bound (UCB); Thompson sampling (TS); exploration-exploitation; decision-making.
Abstract
As applications and algorithms become increasingly intertwined, application developers face a critical objective: maximizing efficiency while mitigating the adverse effects of uncertainty. Yet much of the research on Multi-Armed Bandit (MAB) algorithms has focused on comparing the algorithms' performance, often overlooking their tangible impact on applications. This study builds on the data and findings of several earlier investigations that apply MAB algorithms in domains including e-commerce, clinical trials, and dynamic pricing. Our findings underscore the versatility and adaptability of MAB algorithms in improving application performance across these fields. Notably, some MAB algorithms are better suited to particular scenarios than others. We therefore argue that balancing exploration and exploitation well, and thereby maximizing rewards in uncertain environments, requires not only applying MAB algorithms but also selecting the algorithm best suited to each context. Overall, this study aims to showcase the wide-ranging applicability of MAB algorithms and their impact across multiple sectors.
References
Louëdec, J., Chevalier, M., Mothe, J., Garivier, A., & Gerchinovitz, S. (2015). A Multiple-Play Bandit Algorithm Applied to Recommender Systems. The Florida AI Research Society.
Singh, A. (2021). Reinforcement Learning Based Empirical Comparison of UCB, Epsilon-Greedy, and Thompson Sampling. Int. J. of Aquatic Science, 12(2), 2961-2969.
Nie, G., Agarwal, M., Umrawal, A. K., Aggarwal, V., & Quinn, C. J. (2022, August). An explore-then-commit algorithm for submodular maximization under full-bandit feedback. In Uncertainty in Artificial Intelligence (pp. 1541-1551). PMLR.
Zhang, W., Hu, Z., & Li, G. (2023). Upper confident bound advantage function proximal policy optimization. Cluster Computing, 26(3), 2001-2010.
Ding, Q., Hsieh, C. J., & Sharpnack, J. (2021, March). An efficient algorithm for generalized linear bandit: Online stochastic gradient descent and thompson sampling. In International Conference on Artificial Intelligence and Statistics (pp. 1585-1593). PMLR.
West, B., Wang, J., Cui, X., & Huang, J. (2021). Adaptively Optimize Content Recommendation Using Multi Armed Bandit Algorithms in E-commerce. arXiv preprint arXiv:2108.01440.
Kojima, M. (2022). Application of multi-armed bandits to model-assisted designs for dose-finding clinical trials.
Agarwal, M., Aggarwal, V., & Azizzadenesheli, K. (2022). Multi-agent multi-armed bandits with limited communication. The Journal of Machine Learning Research, 23(1), 9529-9552.
Gan, M., & Kwon, O. C. (2022). A knowledge-enhanced contextual bandit approach for personalized recommendation in dynamic domains. Knowledge-Based Systems, 251, 109158.
Lin, Y., Wang, Y., & Zhou, E. (2023). Risk-averse contextual multi-armed bandit problem with linear payoffs. Journal of Systems Science and Systems Engineering, 32(3), 267-288.
License
Copyright (c) 2024 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







