Integrating Multi-Agent Deep Deterministic Policy Gradient and Go-Explore for Enhanced Reward Optimization

Muchen Liu

doi:10.54097/znrt8d63

Keywords:

Reinforcement learning; machine learning; reward optimization

Abstract

Multi-Agent Reinforcement Learning (MARL) continues to advance as new and effective methods are developed. This research focuses on two prominent approaches in the field, Multi-Agent Deep Deterministic Policy Gradient (MADDPG) and Go-Explore, and examines whether combining them can increase rewards for individual agents as well as for agent groups. MADDPG is introduced into the experimental environment, providing agents with both actor networks (policy networks) and critic networks (Q networks) to implement the actor-critic model. In addition, each individual agent is equipped with a Go-Explore network, which lets it explore the environment more deeply and accumulate rewards faster, often resulting in higher overall rewards. The approach emphasizes a balance between individual and collaborative rewards. The results show that the combined method has notable advantages in certain scenarios: specifically, a higher rate of reward accumulation and improved overall performance. This research contributes to the MARL domain by highlighting the potential of combining MADDPG and Go-Explore to improve the efficiency and effectiveness of multi-agent systems.
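As a rough illustration of the exploration component described above, the following is a minimal sketch of a per-agent archive in the spirit of Go-Explore: visited states are grouped into cells, the best-scoring state per cell is remembered, and rarely selected cells are favoured as restart points. It is not the paper's implementation. The names `CellRecord`, `Archive`, and `to_cell`, the 2-D grid discretisation, and the inverse-square-root selection weight are all assumptions made for the example, and the actor and critic networks of MADDPG are omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class CellRecord:
    state: tuple       # a state that reached this cell (hypothetical representation)
    score: float       # best cumulative reward seen on reaching the cell
    visits: int = 0    # how many times the cell was chosen as a restart point


class Archive:
    """Per-agent archive of visited cells (illustrative, not the paper's code)."""

    def __init__(self):
        self.cells = {}

    def update(self, cell, state, score):
        """Store the state if the cell is new or was reached with a higher score."""
        record = self.cells.get(cell)
        if record is None:
            self.cells[cell] = CellRecord(state, score)
            return True
        if score > record.score:
            record.state, record.score = state, score
            return True
        return False

    def select(self, rng):
        """Pick a cell to return to, favouring cells selected less often (assumed weighting)."""
        keys = list(self.cells)
        weights = [1.0 / (1 + self.cells[k].visits) ** 0.5 for k in keys]
        cell = rng.choices(keys, weights=weights, k=1)[0]
        self.cells[cell].visits += 1
        return cell, self.cells[cell].state


def to_cell(position, size=1.0):
    """Discretise a position into a grid cell (an assumed downscaling)."""
    return tuple(int(c // size) for c in position)
```

In a training loop under these assumptions, each agent would call `select` to choose a state to return to, explore from there with its current policy, and call `update` on every new state it reaches.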
