Research on Chinese Cuisine Image Recognition Using Semi-Supervised Convolutional Neural Networks

Authors

  • Yuyang Lei

DOI:

https://doi.org/10.54097/h678ja35

Keywords:

Chinese Food Image Recognition, Lightweight Networks, Self-Attention, Semi-supervised Learning

Abstract

With the rapid development of image recognition and deep learning, food image classification has achieved remarkable results. Chinese cuisine image recognition remains challenging because dishes are highly diverse and many share similar ingredients. This paper constructs a dataset containing both labeled and unlabeled Chinese food images. To address the slow training and inference of traditional CNNs, we adopt lightweight networks: compared with ResNet-34, MobileNetV3-small reduces training time by 41.1% and improves inference speed by 21%. To better capture correlated features, we propose MobileNetV3-small-sa, which adds a self-attention mechanism and improves accuracy by 2.2% over the base model. Because labeling is costly, we apply semi-supervised learning with MixMatch, which trains on both labeled and unlabeled data. With only 1/7 of the labeled data plus 10,952 unlabeled images, MixMatch achieves 75.9% accuracy, whereas supervised learning on the full labeled set yields 80.1%. When labeled data is scarce, MixMatch improves accuracy by over 10%. Experiments show that the self-attention model outperforms traditional networks, achieving 3.8% higher accuracy than VGGNet-16 and 3.2% higher than ResNet-34.
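The abstract does not spell out how MixMatch turns unlabeled images into training signal, so the following is only a minimal NumPy sketch of three steps from the general MixMatch recipe of Berthelot et al.: label guessing (averaging predictions over several augmentations), sharpening, and mixup. The function names, the temperature of 0.5, the Beta parameter of 0.75, and the toy arrays are illustrative assumptions, not values reported in this paper.

```python
import numpy as np


def sharpen(probs, temperature=0.5):
    """Lower the entropy of each row of class probabilities.

    probs: array of shape (batch, classes); each row sums to 1.
    temperature: assumed value; smaller values give peakier outputs.
    """
    powered = probs ** (1.0 / temperature)
    return powered / powered.sum(axis=1, keepdims=True)


def guess_labels(predictions_per_augmentation, temperature=0.5):
    """Average predictions over K augmentations, then sharpen.

    predictions_per_augmentation: array of shape (K, batch, classes).
    """
    mean_probs = np.mean(predictions_per_augmentation, axis=0)
    return sharpen(mean_probs, temperature)


def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    """Convexly combine two examples and their (soft) labels.

    The mixing weight is drawn from Beta(alpha, alpha) and clamped to be
    at least 0.5 so the result stays closer to the first example.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    mixed_x = lam * x1 + (1.0 - lam) * x2
    mixed_y = lam * y1 + (1.0 - lam) * y2
    return mixed_x, mixed_y


# Toy usage: two augmentations of one unlabeled image, three classes.
toy_predictions = np.array([
    [[0.5, 0.3, 0.2]],
    [[0.7, 0.2, 0.1]],
])
guessed = guess_labels(toy_predictions)

# Mix a labeled example (all-zero "image") with the unlabeled one.
labeled_x, labeled_y = np.zeros(4), np.array([0.0, 1.0, 0.0])
unlabeled_x = np.ones(4)
mixed_x, mixed_y = mixup(labeled_x, labeled_y, unlabeled_x, guessed[0],
                         rng=np.random.default_rng(0))
```

In a full training loop the mixed batches would feed a supervised cross-entropy term and an unsupervised consistency term; those losses, the augmentation pipeline, and the network itself are omitted here.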


Published

30-05-2026

Section

Articles

How to Cite

Lei, Y. (2026). Research on Chinese Cuisine Image Recognition Using Semi-Supervised Convolutional Neural Networks. Frontiers in Computing and Intelligent Systems, 16(2), 193-199. https://doi.org/10.54097/h678ja35