Infrared Imaging-Based Object Detection and Tracking for UAV Systems: Principles, Algorithms, and Advances

Authors

  • Weizeng Qing

DOI:

https://doi.org/10.54097/sw8nmq23

Keywords:

Infrared Imaging, UAV Perception, Object Detection, Small-Target Detection, Object Tracking, Deep Learning, Transformer Models

Abstract

Infrared sensing has become a critical perception modality for unmanned aerial vehicles (UAVs), enabling robust operation under low illumination, at night, and in adverse weather. This survey provides a systematic review of the infrared UAV perception pipeline, covering imaging principles, preprocessing techniques, deep-learning-based object detection, small-target enhancement strategies, and state-of-the-art object tracking algorithms. We first describe the physical foundations of infrared radiation and summarize key preprocessing procedures such as nonuniformity correction and deep-learning-based denoising. We then examine the evolution of infrared object detection, including CNN- and Transformer-based frameworks, with particular attention to modern YOLO variants and methods designed for the small and tiny targets commonly observed from aerial platforms. Furthermore, we review traditional correlation-filter-based trackers, Siamese and discriminative learning trackers, reinforcement-learning-based approaches, and recent Transformer-driven trackers, followed by an overview of multispectral and graph-based multi-object tracking strategies. Typical UAV applications and widely adopted evaluation metrics are also discussed. This survey aims to provide a unified reference for researchers and practitioners developing high-performance infrared perception systems for UAVs.

Published

29-12-2025

Section

Articles

How to Cite

Qing, W. (2025). Infrared Imaging-Based Object Detection and Tracking for UAV Systems: Principles, Algorithms, and Advances. Frontiers in Computing and Intelligent Systems, 14(3), 1-6. https://doi.org/10.54097/sw8nmq23