CoT-Integrated Lightweight Cross-Modal Model for Scrap Steel Recognition

Sitong Liu; Ruijie Xu; Fanhui Kong; Yuchen Xiao; Yingchao Liu

doi:10.54097/pyrb6374

Keywords:

Cross-Modal Learning, Knowledge Distillation, Chain-of-Thought, Model Lightweighting

Abstract

Under the strategic "Dual Carbon" goals, the green transformation of the iron and steel industry hinges on the efficient recycling of scrap steel. Existing scrap steel recognition methods suffer from low efficiency, reliance on manual experience, limited unimodal analysis capability, and difficulty of deployment. To address these limitations, this study proposes a novel large cross-modal recognition model that integrates chain-of-thought (CoT) reasoning with lightweight knowledge distillation. First, a cross-modal recognition framework based on CLIP and SAM is constructed to establish a "shape-image-composition" semantic mapping, enabling fine-grained segmentation and compositional association of scrap steel. Second, a model compression strategy that combines multi-dimensional knowledge transfer with chain-of-thought guidance is designed; it adapts the large model's capabilities to edge computing devices while preserving high accuracy and keeping the decision-making process interpretable. Finally, the model is integrated into a closed-loop intelligent decision-making system spanning "composition prediction - charge optimization - process calibration". Experimental results show that the optimized lightweight model achieves an inference speed of 35 FPS on edge devices with a mean Average Precision (mAP) of 92.1%. System-level simulation shows an 8.2 percentage point increase in the scrap steel utilization rate and a significant improvement in process stability. This research provides an accurate, real-time, and reliable solution for the intelligent upgrading of short-process steelmaking.
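
For readers unfamiliar with knowledge distillation, the general idea behind transferring a large model's behavior to a lightweight one can be illustrated with the classic temperature-scaled soft-target loss. This is only a minimal sketch of the generic technique, not the multi-dimensional, CoT-guided strategy proposed in the paper, whose details the abstract does not give; the function names, the temperature of 4.0, and the weighting factor `alpha` are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Generic soft-target distillation loss (illustrative, hypothetical helper).

    Combines KL(teacher || student) on temperature-softened outputs with the
    ordinary cross-entropy of the student against the hard label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between softened teacher and student distributions
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy of the student (temperature 1) against the true class
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-target gradient scale comparable across T
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.2]   # hypothetical teacher logits for 3 classes
    student = [2.5, 1.2, 0.3]   # hypothetical student logits
    print(distillation_loss(student, teacher, label=0))
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the hard-label cross-entropy remains, which is why `alpha` controls how strongly the student is pulled toward the teacher rather than toward the labels.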
