Comprehensive Introduction and Analysis of the UCB Algorithm in Multi-Armed Bandit Problems

Kaizhuo Hu

doi:10.54097/wm3zkc73

Keywords:

Multiple arms; UCB algorithm; Exploration; Exploitation; Confidence interval.

Abstract

This article begins with a review of the fundamentals of reinforcement learning, including a detailed overview of its key models and algorithms, and then compares these concepts with traditional machine learning paradigms. The focus then shifts to the background and development of recommendation systems, with a systematic categorization and description of commonly used recommendation algorithms. The core of the discussion is the use of reinforcement learning in recommendation systems: beginning with practical case studies, the article examines strategies for integrating reinforcement learning with recommendation systems, addresses current challenges, and considers future directions. Reflections on the differences between reinforcement learning and traditional machine learning, and on the scenarios to which each applies, are also provided. In short, the article serves as an extensive guide to reinforcement learning and recommendation systems, and aims to equip readers with the knowledge needed to understand and apply these technologies effectively. As technology advances and research progresses, reinforcement learning is expected to play an increasing role in recommendation systems, offering more personalized services and better user experiences.
