Research on Multi-agent Sparse Reward Problem
DOI:
https://doi.org/10.54097/er0mx710
Keywords:
Sparse reward; manually annotated; hierarchical reinforcement learning; intrinsic reward
Abstract
Sparse reward is a significant challenge in deep reinforcement learning: it leads to low sample utilization, slow agent convergence, and suboptimal final policies. Progress is further hindered by the complexity of sparse reward algorithms and the lack of a unified understanding of them. This paper addresses these issues by introducing the concepts of reinforcement learning and sparse reward, and then analyzing and summarizing three categories of sparse reward algorithms: manual labeling, hierarchical reinforcement learning, and intrinsic rewards. Hierarchical reinforcement learning is further divided into option-based and subgoal-based methods. For each algorithm, the implementation principles, advantages, and disadvantages are examined. The paper concludes with a summary of the review and directions for future research in this field.
License
Copyright (c) 2024 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







