Multi-scale image recognition strategy based on convolutional neural network

Authors

  • Huajun Zhang
  • Su Diao
  • Yining Yang
  • Jiachen Zhong
  • Yafeng Yan

DOI:

https://doi.org/10.54097/ro4puyx5

Keywords:

Convolutional neural networks, Multi-scale feature, Image recognition strategies, Computer vision

Abstract

Accurately recognizing and interpreting visual information at multiple scales is a central problem in computer vision. This study proposes a multi-scale image recognition strategy based on a Convolutional Neural Network (CNN) with multi-level, multi-resolution receptive fields. The strategy uses multi-level convolution operations so that the model learns multi-scale feature representations of an image, from fine texture to global structure and from simple shallow features to deep semantic abstractions. The core of the approach is a network module that combines a depthwise separable convolutional architecture with pyramid pooling. This modular design keeps the model computationally efficient while improving its ability to extract and integrate multi-scale image features. In experiments on several widely used, large-scale image datasets, the proposed approach shows marked improvements in recognition performance and stability over conventional single-scale image recognition methods. The work contributes to the theoretical framework of image recognition and offers an efficient solution for complex multi-scale image recognition problems in practical applications.
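
The abstract pairs a separable convolution with pyramid pooling but gives no layer details. As a rough illustration, assuming these refer to the standard depthwise separable convolution and spatial pyramid pooling, the sketch below compares weight counts for the two convolution types and shows how pyramid pooling yields a fixed-length vector from feature maps of any size. All function names, the pyramid levels (1, 2, 4), and the choice of max pooling are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: generic depthwise separable convolution arithmetic
# and spatial pyramid pooling, not the architecture from the paper.

def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def adaptive_max_pool(fmap, bins):
    """Max-pool a 2D map (list of equal-length rows) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0 = (i * h) // bins
        r1 = ((i + 1) * h + bins - 1) // bins   # ceiling division
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            pooled.append(
                max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
            )
    return pooled


def pyramid_pool(fmap, levels=(1, 2, 4)):
    """Concatenate pooled grids from every pyramid level into one vector."""
    features = []
    for bins in levels:
        features.extend(adaptive_max_pool(fmap, bins))
    return features
```

For a 3 x 3 layer with 64 input and 128 output channels, the two counts are 73,728 and 8,768 weights, roughly an 8.4x reduction. With the default levels, `pyramid_pool` returns 21 values (1 + 4 + 16) per channel whether the input map is 5 x 7 or 32 x 32, which is what lets a fixed-size classifier follow feature maps of varying resolution.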

Published

30-04-2024

Section

Articles

How to Cite

Zhang, H., Diao, S., Yang, Y., Zhong, J., & Yan, Y. (2024). Multi-scale image recognition strategy based on convolutional neural network. Journal of Computing and Electronic Information Management, 12(3), 107-113. https://doi.org/10.54097/ro4puyx5