Hybrid Attention Fusion in Dense Crowd Counting
DOI: https://doi.org/10.54097/fcis.v2i1.2707
Keywords: Crowd counting, Attention fusion, SoftMax algorithm, Density map
Abstract
Attentional supervision is one of the more appealing approaches to guiding deep parameter optimization: it inspires intelligence in complex networks at a fraction of the cost. There is still room for improvement, however. First, in real dense scenes, human heads vary in scale and are unevenly distributed, so the density map cannot represent them clearly. Second, heavily occluded areas closely resemble complex backgrounds, which further aggravates the counting error. We therefore propose a dual-track attention network that distinguishes between global and local information, which handle the target overlap and background confusion problems, respectively. The two attention tracks are then merged and normalized with the feature map, transforming the multi-channel attention map into a single-channel density map. In addition, a heterogeneous pyramid design alleviates the effects of scale variation and density dissimilarity. Experiments on several official datasets demonstrate that the scheme enhances key information and overcomes confounding factors.
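The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names (softmax, fuse_attention), the additive merge of the global and local attention tracks, the softmax taken over the channel axis, and the random inputs. The sketch only shows how a multi-channel attention map can be normalized and combined with a feature map to yield a single-channel density map, whose sum is read as the estimated count.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_attention(features, global_att, local_att):
    """Collapse (C, H, W) features into an (H, W) density map.

    Assumption: the two attention tracks are merged by addition and
    normalized with a channel-wise softmax; the paper may differ.
    """
    logits = global_att + local_att        # hypothetical merge of the two tracks
    weights = softmax(logits, axis=0)      # weights sum to 1 over channels per pixel
    return (weights * features).sum(axis=0)


# Illustrative random inputs; real inputs would come from the network.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
features = rng.random((C, H, W))
global_att = rng.standard_normal((C, H, W))
local_att = rng.standard_normal((C, H, W))

density = fuse_attention(features, global_att, local_att)
estimated_count = density.sum()            # count = integral of the density map
```

Because the softmax weights form a convex combination at each pixel, the fused value always lies between the smallest and largest channel response at that location, which is one reason a normalization of this kind is a natural way to go from many channels to one.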