Research Advanced in Human Pose Estimation based on Deep Learning
DOI: https://doi.org/10.54097/f4948z63
Keywords: Human Pose Estimation, Top-down Approach, Bottom-up Approach, Single-stage Approach, Transformer-based Approach.
Abstract
Human pose estimation is a crucial task in computer vision. Its objective is to accurately identify and locate the key points of the human body in images or videos, which enables the understanding and analysis of human behavior. Early pose estimation work focused predominantly on single-person scenes and relied mainly on template matching, direct regression, and detection methods. With the rapid progress of deep learning, research on human pose estimation has expanded from single-person scenarios to complex multi-person ones. Multi-person pose estimation methods can be divided into top-down and bottom-up approaches. The top-down approach first detects each person and then estimates the pose of every detected individual. Although it is relatively straightforward to implement, it has limitations in crowded multi-person scenes and under occlusion. In contrast, the bottom-up approach first detects the key points of everyone in the image and then assigns them to individuals, which makes it more effective at handling occlusion and multi-person scenes. With these factors in mind, this paper presents a comprehensive, in-depth study of deep-learning-based human pose estimation, covering single-person pose estimation, multi-person pose estimation, datasets, evaluation metrics, algorithms, and experimental methods. It also discusses existing problems and future directions, including improving model architectures, optimizing data processing, and integrating multi-modal information. Moreover, recent advances in deep learning architectures, including convolutional neural networks and recurrent neural networks, have further improved the performance and accuracy of human pose estimation. Combining different modalities, such as depth information and infrared imaging, also holds promise for addressing some of the challenges in complex scenes.
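The following minimal Python sketch illustrates the contrast between the two multi-person paradigms. It is not taken from the paper. The person detector and single-person estimator passed to `top_down` are hypothetical callables. `bottom_up` assumes that each detected keypoint already carries a one-dimensional grouping tag in the style of associative embedding (the grouping cue used by HigherHRNet). Other systems use different cues, such as the part affinity fields in OpenPose. The greedy matching shown here is a simplification of the matching step used in practice, and all names (`Detection`, `Person`, `max_tag_dist`) are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x0, y0, x1, y1), hypothetical box format
Keypoint = Tuple[float, float]            # (x, y) in image coordinates


def top_down(image,
             detect_people: Callable[[object], List[Box]],
             estimate_single_pose: Callable[[object, Box], Dict[str, Keypoint]],
             ) -> List[Dict[str, Keypoint]]:
    """Top-down: detect every person first, then run a single-person
    estimator inside each box. The estimator runs once per box, so cost
    scales with the number of people, and anyone the detector misses is
    never estimated."""
    return [estimate_single_pose(image, box) for box in detect_people(image)]


@dataclass
class Detection:
    joint: str     # keypoint type, e.g. "head" or "l_wrist"
    x: float
    y: float
    score: float   # detection confidence
    tag: float     # assumed 1-D embedding used to group keypoints into people


@dataclass
class Person:
    joints: Dict[str, Detection] = field(default_factory=dict)

    def mean_tag(self) -> float:
        # Average tag of the joints assigned so far (never empty here,
        # because a Person is always created with one joint).
        tags = [d.tag for d in self.joints.values()]
        return sum(tags) / len(tags)


def bottom_up(detections: List[Detection], joint_order: List[str],
              max_tag_dist: float = 0.5) -> List[Person]:
    """Bottom-up: all keypoints in the image are detected at once and then
    assigned to individuals. Simplified greedy grouping on tags: each
    detection joins the person with the closest mean tag that still lacks
    that joint type, or starts a new person if none is close enough."""
    people: List[Person] = []
    for joint in joint_order:
        candidates = sorted((d for d in detections if d.joint == joint),
                            key=lambda d: d.score, reverse=True)
        for det in candidates:
            free = [p for p in people
                    if joint not in p.joints
                    and abs(p.mean_tag() - det.tag) <= max_tag_dist]
            if free:
                best = min(free, key=lambda p: abs(p.mean_tag() - det.tag))
                best.joints[joint] = det
            else:
                people.append(Person({joint: det}))
    return people


if __name__ == "__main__":
    demo = [
        Detection("head", 10, 5, 0.9, 0.1),
        Detection("head", 50, 6, 0.8, 2.0),
        Detection("l_wrist", 8, 30, 0.7, 0.2),
        Detection("l_wrist", 52, 31, 0.6, 1.9),
    ]
    for i, person in enumerate(bottom_up(demo, ["head", "l_wrist"])):
        print(i, {j: (d.x, d.y) for j, d in person.joints.items()})
```

In the demo, the two heads have well-separated tags (0.1 and 2.0), so `bottom_up` returns two people. Each wrist joins the head whose tag is closer. The threshold `max_tag_dist` is an arbitrary illustrative value.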
License
Copyright (c) 2024 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







