Enhancing UCB-tuned and Asymptotically Optimal UCB Algorithms through Weighted Average Techniques in Multi-Armed Bandit Scenarios

Authors

  • Chang Qu

DOI:

https://doi.org/10.54097/8mb1rb15

Keywords:

Multi-Armed Bandit Problem; Reinforcement Learning; Upper Confidence Bound; UCB-tuned.

Abstract

This paper delves into the complexities of the Multi-Armed Bandit (MAB) problem, a fundamental concept in reinforcement learning and probability theory, with a focus on its application in recommendation systems and dynamic fields such as dynamic pricing and investment. It begins by shedding light on the essential paradox at the heart of the MAB problem – the balance between exploration and exploitation within limited parameters. The study primarily centers on Upper Confidence Bound (UCB) policies, especially UCB-tuned and Asymptotically Optimal UCB, noted for their adept balance between exploration and utilization. The novel contribution of this research is the enhancement of these UCB policies via an innovative weighted average method, leading to the development of WA-UCB-tuned and WA Asymptotically Optimal UCB algorithms. The research rigorously compares these optimized iterations with traditional UCB1, UCB-tuned, and Asymptotically Optimal UCB across varied MAB models featuring different numbers of arms. This study provides an exhaustive introduction to the MAB problem and pertinent UCB policies, the methodology behind the weighted average optimization, extensive experimental analysis, and comprehensive evaluations of the findings. The results showcase marked improvements in algorithmic performance, suggesting significant advancements in the domain of recommendation systems and other applications of the MAB problem.

Downloads

Download data is not yet available.

References

Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine learning, 47, 235-256.

Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3-4), 285-294.

Kaufmann, E., Cappé, O., & Garivier, A. (2012, March). On Bayesian upper confidence bounds for bandit problems. In Artificial intelligence and statistics (pp. 592-600). PMLR.

Silva, N., Werneck, H., Silva, T., Pereira, A. C., & Rocha, L. (2022). Multi-armed bandits in recommendation systems: A survey of the state-of-the-art and future directions. Expert Systems with Applications, 197, 116669.

Manome, N., Shinohara, S., & Chung, U. I. (2023). Simple Modification of the Upper Confidence Bound Algorithm by Generalized Weighted Averages. ar**v preprint ar**v:2308.14350.

Harper, F. M., & Konstan, J. A. The movielens datasets: History and context, Acm transactions on interactive intelligent systems (tiis), 5 (2016). Cited on, 59.

Komiyama, J. (2016). Asymptotically Optimal Multi-armed Bandit Algorithms Aimed at Online.

Amirizadeh, K., & Rajeswari, M. (2015). Accelerated-Greedy Multi Armed Bandit Algorithm for Online Sequential-Selections Applications. J. Softw., 10(3), 239-249.

Gonçalves, R. A., Almeida, C. P., & Pozo, A. (2015). Upper confidence bound (UCB) algorithms for adaptive operator selection in MOEA/D. In Evolutionary Multi-Criterion Optimization: 8th International Conference, EMO 2015, Guimarães, Portugal, March 29--April 1, 2015. Proceedings, Part I 8 (pp. 411-425). Springer International Publishing.

Liu, Y. E., Mandel, T., Brunskill, E., & Popovic, Z. (2014, July). Trading Off Scientific Knowledge and User Learning with Multi-Armed Bandits. In EDM (pp. 161-168).

Downloads

Published

26-04-2024

How to Cite

Qu, C. (2024). Enhancing UCB-tuned and Asymptotically Optimal UCB Algorithms through Weighted Average Techniques in Multi-Armed Bandit Scenarios. Highlights in Science, Engineering and Technology, 94, 187-194. https://doi.org/10.54097/8mb1rb15