Analytical and Practical Insights into Multi-Armed Bandit Problems in Recommendation Systems
DOI: https://doi.org/10.54097/tg7bpm76

Keywords: Multi-Armed Bandit problem; recommendation systems

Abstract
This paper examines the application of the Multi-Armed Bandit (MAB) algorithm in recommendation systems, tools now prevalent across diverse sectors such as e-commerce, social networks, and news platforms. The primary objective of these systems is to surface content that matches user preferences, thereby increasing user engagement and business revenue. Central to optimizing recommendation strategies is the balance between exploration (trying new, potentially relevant options) and exploitation (serving known, well-performing choices). The MAB algorithm, an online learning method, navigates this trade-off directly. This study presents the MAB algorithm's theoretical underpinnings and its practical applications in recommendation systems, implementing these methods on real-world datasets to assess their efficacy. The paper concludes by examining the benefits and limitations of MAB algorithms in recommendation contexts and proposing avenues for future research. This analysis aims to contribute to the ongoing evolution of recommendation systems, underscoring the pivotal role of MAB algorithms in their advancement.
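As a concrete illustration of the exploration-exploitation balance described above, the sketch below implements the classic UCB1 index rule (Auer et al., 2002) on a toy recommendation task. The item names and hidden click probabilities are invented for illustration; this is a minimal sketch of the technique, not the paper's experimental setup.

```python
import math
import random

def ucb1(rewards_by_arm, total_pulls):
    """Pick the arm maximizing mean reward plus an exploration bonus (UCB1)."""
    best_arm, best_score = None, float("-inf")
    for arm, rewards in rewards_by_arm.items():
        if not rewards:                 # play each arm once before scoring
            return arm
        mean = sum(rewards) / len(rewards)
        bonus = math.sqrt(2 * math.log(total_pulls) / len(rewards))
        if mean + bonus > best_score:
            best_arm, best_score = arm, mean + bonus
    return best_arm

# Toy simulation: three "items" with hidden click-through probabilities.
random.seed(0)
click_prob = {"item_a": 0.2, "item_b": 0.5, "item_c": 0.8}
history = {arm: [] for arm in click_prob}

for t in range(1, 2001):
    arm = ucb1(history, t)
    reward = 1 if random.random() < click_prob[arm] else 0  # Bernoulli click
    history[arm].append(reward)

pulls = {arm: len(r) for arm, r in history.items()}
# Over time, the best item should be recommended most often.
assert max(pulls, key=pulls.get) == "item_c"
```

Because the exploration bonus shrinks as an arm accumulates pulls, UCB1 gradually shifts traffic toward the item with the highest observed click rate while still occasionally revisiting the others.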
License
Copyright (c) 2024 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.