Research on Cross-modal Pedestrian Re-identification Based on Transformer
DOI: https://doi.org/10.54097/8drb0c69

Keywords: Deep learning; Public safety; Transformer

Abstract
Visible-infrared person re-identification (VI-ReID) is a challenging retrieval task under large modality changes. Existing methods usually focus on extracting discriminative visual features while ignoring the reliability and commonality of features across modalities. In this paper, we propose a new deep learning framework, called the multi-scale local progressive Transformer (MLT), for effective VI-ReID. To reduce the negative impact of the modality gap, we first take the grayscale image as an auxiliary modality, adopt a Transformer model as the baseline, and propose a progressive learning strategy. The SEA attention mechanism is fused with DilateFormer to further improve the discriminability of reliable features, and its effectiveness is verified through ablation experiments.
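The auxiliary-modality idea can be illustrated with a minimal sketch: a visible RGB image is converted to grayscale and replicated to three channels, so the same backbone can consume visible, grayscale, and (single-channel, replicated) infrared inputs. The luma weights and helper name below are assumptions for illustration, not the paper's exact pipeline:

```python
import numpy as np

def to_gray_auxiliary(rgb: np.ndarray) -> np.ndarray:
    """Turn an H x W x 3 RGB image into a 3-channel grayscale
    'auxiliary modality' image (ITU-R BT.601 luma weights), so it can
    be fed to the same 3-channel backbone as the visible images."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb @ weights                        # H x W luma plane
    return np.repeat(gray[..., None], 3, axis=-1)  # replicate to 3 channels

# toy 2x2 "image"
img = np.arange(12, dtype=np.float64).reshape(2, 2, 3)
aux = to_gray_auxiliary(img)
assert aux.shape == (2, 2, 3)
# all three channels of the auxiliary image are identical
assert np.allclose(aux[..., 0], aux[..., 1])
```

Because the grayscale image shares structure with both the visible and infrared views, it can serve as an intermediate bridge during progressive training.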
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.