PF-LSTM Reinforcement Learning Enhanced Hybrid Acoustic-Optical Adaptive Collaborative AUV Localization Algorithm

Qiankun Fu

doi:10.54097/cqsgkp19

Keywords:

Underwater AUV; Collaborative Localization; PF-LSTM Reinforcement Learning; Hybrid Acoustic-Optical; Clock Asynchrony; Dense Reward; TDOA-RSS; Energy Optimization; Dynamic Switching

Abstract

Cooperative localization of autonomous underwater vehicles (AUVs) is widely used in ocean exploration and environmental monitoring, but its effectiveness depends heavily on precise position estimation and clock synchronization. In anchor-free environments, clock offset and drift, multipath attenuation, and dynamic interference in the underwater medium significantly constrain localization accuracy. Existing cooperative algorithms address asynchrony with approaches such as TDOA/TDOC, yet they still suffer from error accumulation, slow convergence, and imbalanced energy consumption. To address these problems, this paper proposes a Potential Field-LSTM reinforced hybrid acoustic-optical adaptive AUV cooperative localization algorithm (PF-LSTM-QHAACL). The algorithm introduces a reinforcement learning decision framework that incorporates Potential Field-based dense reward and LSTM temporal memory modules, which accelerates learning and improves localization accuracy. To cope with time-varying clock asynchrony and acoustic-optical channel fluctuations, PF-LSTM-QHAACL uses a DQN-like mode-switching mechanism for real-time channel assessment and adaptive training, which further improves system stability and energy utilization. In addition, the algorithm integrates a hybrid ranging strategy that combines Time Difference of Arrival (TDOA) and Received Signal Strength (RSS) to suppress the effect of asynchronous bias on position estimation. Simulation results show that PF-LSTM-QHAACL significantly improves underwater localization accuracy and success rate in highly asynchronous scenarios.
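
The abstract does not give the exact form of the Potential Field-based dense reward, so the following is only a minimal sketch of one common way to build such a reward: potential-based shaping, where a term gamma * Phi(s') - Phi(s) is added to a sparse task reward. The potential function (negative distance between the estimated and a reference position, as would be available in simulation), the discount factor, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math

GAMMA = 0.95  # discount factor; an assumed value, not from the paper


def potential(est_pos, ref_pos):
    """Hypothetical potential: negative Euclidean distance (m) between
    the estimated position and a reference position."""
    return -math.dist(est_pos, ref_pos)


def shaped_reward(sparse_reward, prev_est, next_est, ref_pos, gamma=GAMMA):
    """Sparse reward plus the shaping term gamma * Phi(s') - Phi(s)."""
    shaping = gamma * potential(next_est, ref_pos) - potential(prev_est, ref_pos)
    return sparse_reward + shaping


ref = (0.0, 0.0, 0.0)

# Estimate moves closer to the reference (10 m error -> 6 m error).
r_better = shaped_reward(0.0, (10.0, 0.0, 0.0), (6.0, 0.0, 0.0), ref)

# Estimate moves away from the reference (6 m error -> 10 m error).
r_worse = shaped_reward(0.0, (6.0, 0.0, 0.0), (10.0, 0.0, 0.0), ref)

print(r_better, r_worse)
```

In this sketch every step yields a nonzero learning signal even when the sparse reward is zero: a step that reduces the estimation error is rewarded and a step that increases it is penalized, which is the general mechanism by which dense rewards can speed up learning.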
