Three-stage Path Planning for Warehouse Robots Based on Q-learning Algorithm
DOI: https://doi.org/10.54097/g149sn60
Keywords: Q-learning; Warehouse; Robot; Path planning.
Abstract
With the rapid growth of e-commerce, warehouse automation has become an important way to improve efficiency and reduce operational costs, and online merchants now widely use warehouse robots to automate delivery within warehouse logistics. However, existing research offers little clear guidance on how to set Q-learning parameters in a warehouse environment, and it does not consider path planning for multi-stage operations. This research examines the application of the Q-learning algorithm to three-stage path planning for robotic systems in a warehouse setting. It improves the convergence rate of Q-learning in the warehouse by comparing the efficiency and stability of the algorithm under different parameter settings. It also provides a warehouse transportation strategy built on three-stage "inbound-shipping-return" path planning, which enables the robot to navigate the warehouse layout efficiently, reducing transportation time while avoiding obstacles. The results demonstrate the effectiveness of Q-learning for warehouse path planning and indicate that warehouse robots have great potential to save labor costs and improve the efficiency of warehouse transportation tasks. The Q-learning-based optimization method thus offers a solution for automating warehouse transportation tasks.
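To make the three-stage idea concrete, the following is a minimal sketch of tabular Q-learning on a small grid, not the authors' implementation. The 5x5 layout, the obstacle cells, the positions `DOCK`, `SHELF`, and `STATION`, the reward values, and the reading of the "inbound-shipping-return" stages as dock to shelf, shelf to shipping station, and station back to dock are all illustrative assumptions that do not come from the paper.

```python
import random

# Hypothetical 5x5 warehouse grid; every coordinate below is an assumption.
ROWS, COLS = 5, 5
OBSTACLES = {(1, 1), (1, 2), (1, 3), (3, 1), (3, 2), (3, 3)}  # shelving blocks
DOCK, SHELF, STATION = (0, 0), (2, 2), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
FREE_CELLS = [(r, c) for r in range(ROWS) for c in range(COLS)
              if (r, c) not in OBSTACLES]

# Assumed stage definitions: (name, start cell, goal cell).
STAGES = [("inbound", DOCK, SHELF),
          ("shipping", SHELF, STATION),
          ("return", STATION, DOCK)]


def step(state, action, goal):
    """Apply one move; blocked moves leave the robot in place with a penalty."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in OBSTACLES:
        return state, -10.0, False
    if (r, c) == goal:
        return (r, c), 100.0, True
    return (r, c), -1.0, False


def train(goal, episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table for one goal with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in FREE_CELLS for a in range(len(ACTIONS))}
    starts = [s for s in FREE_CELLS if s != goal]
    for _ in range(episodes):
        state = rng.choice(starts)  # random start so every cell gets visited
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
            nxt, reward, done = step(state, ACTIONS[a], goal)
            best_next = 0.0 if done else max(
                q[(nxt, i)] for i in range(len(ACTIONS)))
            # Standard Q-learning update toward reward + gamma * max Q(next).
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, start, goal, max_steps=50):
    """Follow the highest-valued action from start until the goal or the cap."""
    path, state = [start], start
    while state != goal and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
        state, _, _ = step(state, ACTIONS[a], goal)
        path.append(state)
    return path


def plan_three_stages():
    """Train one Q-table per stage goal and extract a greedy route for each."""
    return {name: greedy_path(train(goal), start, goal)
            for name, start, goal in STAGES}


if __name__ == "__main__":
    for name, path in plan_three_stages().items():
        print(name, path)
```

In this sketch the learning rate `alpha`, discount factor `gamma`, and exploration rate `epsilon` are the kind of parameters whose settings would be compared for efficiency and stability; the default values used here are placeholders, not values reported by the paper.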
License
Copyright (c) 2025 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







