Comparative Analysis of Reinforcement Learning Algorithm based on Tennis Environment

Yu Bai; Haoyu Dong; Qiwei Lian

doi:10.54097/hset.v39i.6721

Keywords:

PPO; MADDPG; SAC; PyTorch; Tennis Environment.

Abstract

Reinforcement learning and deep reinforcement learning are active research areas in machine learning and are widely applied in everyday life. Games play an important role in the development of reinforcement learning algorithms. Using the Tennis environment built with Unity ML-Agents, this paper applies three algorithms, Proximal Policy Optimization (PPO), Multi-Agent Deep Deterministic Policy Gradient (MADDPG), and Soft Actor-Critic (SAC), implemented with the PyTorch framework, to solve the continuous control problem of this environment. A set of optimal parameters is obtained through multiple training runs, so that the agents can solve the continuous control problem in the Tennis environment effectively. Finally, the paper compares and analyzes the differences among the three algorithms and summarizes the applications and properties of each. It also compares different parameter settings for the algorithms and explains the reasons for some special cases, which can inform future work.
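
As an illustration of one of the three algorithms named above, the following is a minimal sketch of the clipped surrogate objective that PPO maximizes, as defined in the original PPO paper (Schulman et al., 2017). It is not the authors' implementation: the function name `ppo_clipped_objective`, the list-of-floats inputs, and the default `clip_eps=0.2` are assumptions made for this example, and a real training loop would compute the same quantity on PyTorch tensors.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    Hypothetical helper for illustration only. Each argument is a list of
    floats of equal length; clip_eps=0.2 is an assumed default, not a value
    taken from the paper summarized here.
    """
    if not advantages:
        raise ValueError("batch must not be empty")
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("inputs must have equal length")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the new and old policies agree (ratio equal to 1), the objective reduces to the mean advantage. When the ratio moves outside the clipping interval in the direction that would increase the objective, the clipped term caps it, which limits the size of each policy update.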

