DFENet: Double Feature Enhanced Class Agnostic Counting Methods

Authors

  • Jiakang Liu
  • Hua Huo

DOI:

https://doi.org/10.54097/fcis.v6i1.14

Keywords:

Object Counting, Class Agnostic Counting, Conditional Random Field, Similarity Measure

Abstract

Object counting is a fundamental computer vision task that estimates the number of objects of each class in an image and thus provides valuable information. In dense scenes, individual targets vary greatly in scale, and this scale variation lowers counting accuracy. In addition, most existing object counting datasets require extensive manual collection and annotation for every new class, which raises the cost and difficulty of building datasets and limits their ease of use and portability. To address these problems, this paper proposes Double Feature Enhancement Net (DFENet), a class-agnostic counting method built on an improved Bilinear Matching Network+ (BMNet+). By introducing a feature enhancement module based on the principle of conditional random fields and an adaptively spatial feature fusion module, and combining them with the bilinear matching network's feature similarity measurement strategy, the method effectively extracts target features at different scales, strengthens adaptability to targets with large scale variation, and improves the network's counting performance. Experiments on the FSC-147 dataset show that the proposed model further improves counting accuracy: on the validation set the MAE and MSE are 15.03 and 54.53 respectively, while on the test set the MAE reaches 13.65 and the MSE reaches 89.54, placing the counting performance at an advanced level in the field.
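For readers unfamiliar with the bilinear matching formulation that BMNet-style counters build on, the sketch below illustrates the general idea in PyTorch: image features are compared against pooled exemplar features through a learned bilinear form, the resulting similarity map stands in for a density map, and its spatial sum gives the predicted count, which is evaluated with the MAE and MSE (root-mean-square error) metrics quoted above. This is an illustrative approximation under stated assumptions, not the authors' implementation; the module names (BilinearSimilarity, count_errors) and all dimensions are assumptions.

# Minimal sketch (not the authors' code) of bilinear similarity matching for
# class-agnostic counting. All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearSimilarity(nn.Module):
    """Compares pooled exemplar features to every spatial location of the image features."""
    def __init__(self, channels: int = 256):
        super().__init__()
        # Two learned projections play the role of the bilinear weight matrix W:
        # sim(x, z) = <P_q x, P_k z>.
        self.proj_query = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_key = nn.Linear(channels, channels)

    def forward(self, feat_map: torch.Tensor, exemplar_vec: torch.Tensor) -> torch.Tensor:
        # feat_map: (B, C, H, W) image features; exemplar_vec: (B, C) pooled exemplar features.
        q = self.proj_query(feat_map)                 # (B, C, H, W)
        k = self.proj_key(exemplar_vec)               # (B, C)
        sim = torch.einsum("bchw,bc->bhw", q, k)      # bilinear similarity map (B, H, W)
        return sim.unsqueeze(1)                       # (B, 1, H, W)

def count_errors(pred_density: torch.Tensor, gt_counts: torch.Tensor):
    """MAE and MSE (root-mean-square error) over predicted counts, as reported on FSC-147."""
    pred_counts = pred_density.sum(dim=(1, 2, 3))     # the density map integrates to the count
    err = pred_counts - gt_counts
    mae = err.abs().mean().item()
    mse = err.pow(2).mean().sqrt().item()             # the "MSE" metric in counting papers is the RMSE
    return mae, mse

if __name__ == "__main__":
    sim_head = BilinearSimilarity(channels=256)
    feats = torch.randn(2, 256, 48, 64)               # backbone features for two images
    exemplars = torch.randn(2, 256)                   # pooled features of the exemplar boxes
    density = F.relu(sim_head(feats, exemplars))      # stand-in for the full decoding head
    print(count_errors(density, gt_counts=torch.tensor([37.0, 112.0])))

In the full method, this similarity map is further refined by the feature enhancement and fusion modules before the final density regression; the sketch only shows the matching and evaluation steps.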


References

Wu B, Nevatia R. Detection and tracking of multiple, partially occluded humans by Bayesian combination of edgelet based part detectors[J]. International Journal of Computer Vision, 2007, 75(2): 247-266.

Lin S F, Chen J Y, Chao H X. Estimation of number of people in crowded scenes using perspective transformation[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 2001, 31: 645-654.

Min L, Zhang Z, Huang K, et al. Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection[C]//2008 19th International Conference on Pattern Recognition, 2009.

Xu T, Chen X, Wei G, et al. Crowd counting using accumulated HOG[C]//2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2016: 1877-1881.

Chan A B, Liang Z S J, Vasconcelos N. Privacy preserving crowd monitoring: counting people without people models or tracking[C]//2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008: 1-7.

Ryan D, Denman S, Fookes C, et al. Crowd counting using multiple local features[C]//2009 Digital Image Computing: Techniques and Applications, 2009: 81-88.

Chen K, Loy C C, Gong S, et al. Feature mining for localised crowd counting[C]//British Machine Vision Conference, 2012.

Paragios N, Ramesh V. A MRF-based approach for real-time subway monitoring[C]//Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001.

McDonald G C. Ridge regression[J]. Wiley Interdisciplinary Reviews: Computational Statistics, 2009, 1(1): 93-100.

Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005: 886-893. DOI: 10.1109/CVPR.2005.177.

Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587. DOI: 10.1109/CVPR.2014.81.

Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 779-788.

Lu E, Xie W, Zisserman A. Class-agnostic counting[C]//Asian Conference on Computer Vision (ACCV), 2018. arXiv:1811.00472.

Ranjan V, Sharma U, Nguyen T, et al. Learning to count everything[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. arXiv:2104.08391.

Shi M, Lu H, Feng C, et al. Represent, compare, and learn: A similarity-aware framework for class-agnostic counting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. arXiv:2203.08354.

Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks[C]//Proceedings of the IEEE international conference on computer vision. 2015: 1529-1537.

Xu D, Ouyang W, Alameda-Pineda X, et al. Learning deep structured multi-scale features using attention-gated CRFs for contour prediction[C]//Advances in Neural Information Processing Systems (NIPS), 2017: 3961-3970.

Wang D, Ouyang W, Li W, et al. Dividing and aggregating network for multi-view action recognition[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 451-467.

Liu L, Qiu Z, Li G, et al. Crowd counting with deep structured scale integration network[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 1774-1783.

Liu S, Huang D, Wang Y. Learning spatial fusion for single-shot object detection[J]. arXiv preprint arXiv:1911.09516, 2019.

Yang S D, Su H T, Hsu W H, et al. Class-agnostic few-shot object counting[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2021: 870-878.

Musgrave K, Belongie S, Lim S N. A metric learning reality check [C]//Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16. Springer International Publishing, 2020: 681-699.

Atallah M J. Faster image template matching in the sum of the absolute value of differences measure[J]. IEEE Transactions on image processing, 2001, 10(4): 659-663.

Lewis J P. Fast template matching[C]//Vision interface. 1995, 95(120123): 15-19.

Sun Y, Cheng C, Zhang Y, et al. Circle loss: A unified perspective of pair similarity optimization[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 6398-6407.

Wang H, Wang Y, Zhou Z, et al. Cosface: Large margin cosine loss for deep face recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 5265-5274.

Hadsell R, Chopra S, LeCun Y. Dimensionality reduction by learning an invariant mapping[C]//2006 IEEE computer society conference on computer vision and pattern recognition (CVPR'06). IEEE, 2006, 2: 1735-1742.

Weinberger K Q, Saul L K. Distance metric learning for large margin nearest neighbor classification[J]. Journal of machine learning research, 2009, 10(2).

Sohn K. Improved deep metric learning with multi-class n-pair loss objective[J]. Advances in neural information processing systems, 2016, 29.

Oord A, Li Y, Vinyals O. Representation learning with contrastive predictive coding[J]. arXiv preprint arXiv:1807.03748, 2018.

Yuan T, Deng W, Tang J, et al. Signal-to-noise ratio: A robust distance metric for deep metric learning[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 4815-4824.

Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7132-7141.

Published

01-12-2023

Issue

Vol. 6 No. 1 (2023)

Section

Articles

How to Cite

Liu, J., & Huo, H. (2023). DFENet: Double Feature Enhanced Class Agnostic Counting Methods. Frontiers in Computing and Intelligent Systems, 6(1), 70-76. https://doi.org/10.54097/fcis.v6i1.14