Optimizing Movie Recommendation Systems with Multi-Armed Bandit Algorithms
DOI: https://doi.org/10.54097/wszgrm26

Keywords: Multi-armed bandit algorithms; Recommendation system; Reinforcement learning

Abstract
In the current digital era, the proliferation of online content often leaves users overwhelmed by digital advertisements, a consequence of the lower costs of digital channels compared with traditional media. Amid this information deluge, recommendation systems have become pivotal in offering tailored suggestions to users. The Multi-Armed Bandit (MAB) problem, rooted in reinforcement learning, offers an effective approach to the exploration-exploitation trade-off inherent in recommendation systems. This paper examines the implementation of MAB algorithms in the context of movie recommendation systems. A literature review covers the synergy between reinforcement learning and MAB algorithms, their role in recommendation systems, and the fundamental MAB algorithms. To assess the efficacy of these algorithms, an experimental approach is designed using real-world datasets, encompassing problem formulation, data collection, hyperparameter selection, and analysis of empirical results. The experiments indicate that Thompson Sampling is the most effective MAB algorithm on the studied datasets. These findings suggest that MAB algorithms have considerable potential to enhance the effectiveness of movie recommendation systems, and the methodology and framework proposed in this paper offer insights for integrating and combining multiple algorithms in complex decision-making scenarios.
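To illustrate the kind of algorithm the abstract refers to, the following is a minimal sketch of Beta-Bernoulli Thompson Sampling, not the paper's actual experimental code. It assumes binary rewards (a user watches a recommended movie or not) and hypothetical per-movie click probabilities in `reward_probs`; the paper's datasets, hyperparameters, and evaluation protocol are not reproduced here.

```python
import random

def thompson_sampling(reward_probs, n_rounds=10000, seed=0):
    """Beta-Bernoulli Thompson Sampling: each arm is one movie (or genre)."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    # Beta(1, 1) uniform priors over each arm's unknown click-through rate.
    successes = [1] * n_arms
    failures = [1] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one plausible click-through rate per arm from its posterior,
        # then recommend the arm whose sample is highest. This balances
        # exploration (uncertain arms sample high occasionally) with
        # exploitation (well-estimated good arms sample high consistently).
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Simulated user feedback: 1 = watched/clicked, 0 = ignored.
        reward = 1 if rng.random() < reward_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return successes, failures, total_reward

# Hypothetical arms: the third movie has the highest true click probability.
succ, fail, reward = thompson_sampling([0.05, 0.10, 0.20])
pulls = [succ[i] + fail[i] - 2 for i in range(3)]  # times each arm was chosen
```

Over many rounds the posterior concentrates on the best arm, so most pulls go to the movie with the highest true click probability; this is the convergence behavior that bandit-based recommenders exploit.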
License
Copyright (c) 2024 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.