Improved YOLOv5l-based Detection of Surface Defects in Hot Rolled Steel Strips

Abstract: Detecting surface defects on hot-rolled strip is difficult because of complex backgrounds, widely varying defect sizes, and frequent missed and false detections. To address these problems, an improved YOLOv5l-based detection method is proposed. First, the SimAM attention module is added to the feature aggregation network so that important information receives high weights, which improves the recall of the original algorithm. Second, all C3 modules in the YOLOv5l structure are replaced with C2F modules to obtain a richer gradient information flow, which improves the precision of the original algorithm. Experimental results show that, compared with the original network, the improved YOLOv5l raises the mean average precision (mAP) by 5.3% and the precision by 8.3%. It therefore achieves higher detection accuracy with fewer false and missed detections, meeting the requirements of hot-rolled strip inspection in industrial manufacturing.


Introduction
Hot-rolled strip [1] is an economical 'green steel' commonly used in industry. Hot rolling is carried out above the recrystallisation temperature, and hot-rolled strip is widely used in mechanical applications because of its low energy consumption, low cost, good vibration resistance and high production efficiency. In actual manufacturing, however, the hot-rolling process produces a variety of defects on the strip surface. The six most common are Crazing, Rolled-in Scale, Scratches, Inclusion, Patches and Pitted Surface, and they seriously reduce the qualification rate of hot-rolled products [2]. Traditional strip inspection methods include manual sampling, magnetic flux leakage testing, eddy current testing and infrared detection [3]. Manual sampling relies on the naked eye to identify defects. It wastes manpower and also suffers from missed detections and low accuracy. Magnetic flux leakage testing [4] uses magnetic sensors to detect surface defects, but it cannot detect closed cracks and can only find a limited range of defect types. Eddy current testing [5] uses the principle of electromagnetic induction to detect metal surface defects, but it requires professional analysis and judgement as well as customized solutions, and its detection costs are high. Infrared detection [6] locates defects from the surface temperature of the material. However, its sensitivity depends on thermal emissivity and is affected by time, temperature, location and size, and it cannot accurately distinguish between defect types.
With the rapid development of computer vision and deep learning [7], target detection algorithms based on deep neural networks [8] have been widely applied to defect detection. Current target detection algorithms [9] fall into two categories according to whether they generate candidate regions. The first category is two-stage target detection algorithms, represented by RCNN [10], SPPNet [11], Fast RCNN [12] and Faster RCNN [13]. The second is single-stage target detection algorithms [14], represented by SSD [15], the YOLO series [16] and RetinaNet [17].
Deep learning-based surface defect detection for hot-rolled strip steel has advanced rapidly. It improves detection accuracy, saves considerable labour costs, and is widely used in practical production. The literature [18] proposes an improved YOLOv3 model that uses a weighted K-means clustering algorithm to improve the matching between the prior (anchor) boxes and the feature layers, which raises the detection accuracy of the algorithm. Wang Daolei et al [19] proposed an improved algorithm based on YOLOv4-tiny that combines multi-scale detection with an attention mechanism to improve the accuracy of lightweight target detection. Strip steel surface defects are often small, have blurred features and are easily missed. To address these problems, Zhou Jinwei et al [20] proposed an improved YOLOv5 algorithm. It uses a new feature extraction module and modifies the confidence loss function to make convergence more stable. Liu Jinchuan et al [21] added a small-target detection layer to reduce missed and false detections of small targets. They also introduced a Transformer encoder block and the Convolutional Block Attention Module (CBAM) to handle crossing and overlapping targets in the image, which improves detection in complex backgrounds. Pan Meng et al [22] reconstructed the convolution by introducing 1x1 convolutional side branches, which improves the feature extraction capability of the network. They also added a channel attention mechanism that retains more spatial information, and switched to a weighted bidirectional feature pyramid network to improve small target detection.
Wang Bo et al [23] enhanced the fusion of image information by combining the Transformer layer with the BiFPN network structure; replaced the convolutional layer in the backbone network with a lightweight network, RepVGG, to enhance the feature extraction capability of the backbone network; and added a prediction layer to improve the multiscale target detection capability.
Strip surface defects vary in size and are unevenly distributed, there are many defect types, and the backgrounds are complex. To address these problems, this paper proposes an improved YOLOv5l-based algorithm for detecting strip surface defects. The SimAM attention module is added at the Head end and the C3 modules are replaced with C2F modules. These changes improve the detection performance of the model while keeping the number of parameters and the computational complexity essentially unchanged, which meets the needs of industrial deployment.

Introduction to the YOLOv5l Algorithm
The YOLO series of algorithms are deep learning-based regression methods that use a single convolutional neural network (CNN) to directly predict the classes and locations of targets. YOLOv5 improves on the network structure of YOLOv4 and comes in four sizes: YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x. The network of YOLOv5l version 6.0 is divided into four modules: Input, Backbone, Neck and Head. Compared with YOLOv4, it improves detection accuracy and learning speed [24]. The specific structure of the YOLOv5l 6.0 network is shown in Figure 1.

Input
Like YOLOv4, YOLOv5l applies Mosaic data augmentation at the input. This method stitches four training images together with random scaling, cropping and arrangement, which improves small target detection accuracy. YOLOv5l also uses adaptive anchor boxes, which compute the best anchor box values for each training set. Finally, it uses adaptive image scaling, which adds only the minimum border to the original image when resizing it. This reduces information redundancy and computational effort and improves inference speed.
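The adaptive image scaling step can be illustrated with a short arithmetic sketch. The helper `letterbox_padding` below is hypothetical. It is written in the spirit of YOLOv5's letterbox preprocessing rather than copied from it, and it assumes a 640-pixel target size and a stride of 32 (the largest output stride of the network). The image is scaled by the largest factor that fits, and each side is then padded only up to the next multiple of the stride rather than up to a full square.

```python
from typing import Tuple


def letterbox_padding(h: int, w: int, new_size: int = 640, stride: int = 32,
                      auto: bool = True) -> Tuple[Tuple[int, int], Tuple[float, float]]:
    """Return the resized (height, width) and the (horizontal, vertical) padding per side.

    Hypothetical helper illustrating minimal-border letterboxing.
    """
    # Scale so the longer side fits new_size while keeping the aspect ratio.
    r = min(new_size / h, new_size / w)
    new_h, new_w = round(h * r), round(w * r)
    # Total padding needed to reach a new_size x new_size square.
    dw, dh = new_size - new_w, new_size - new_h
    if auto:
        # Adaptive scaling: pad only up to the next multiple of the stride.
        dw, dh = dw % stride, dh % stride
    # Split the padding evenly between the two opposite sides.
    return (new_h, new_w), (dw / 2, dh / 2)


# A 720x1280 frame is scaled to 360x640 and needs only 12 px at the top and bottom
# (384x640 in total) instead of 140 px each to reach 640x640.
print(letterbox_padding(720, 1280))  # ((360, 640), (0.0, 12.0))
# NEU-DET images are square (200x200), so they are upscaled with no border.
print(letterbox_padding(200, 200))   # ((640, 640), (0.0, 0.0))
```

Padding to a stride multiple, rather than to a fixed square, is what saves computation on wide or tall images. Square inputs are unaffected.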

Backbone
The Backbone of YOLOv5l version 6.0 consists mainly of Conv modules, CSPDarknet53 and the SPPF module. The Conv module replaces the Focus module of earlier versions, which improves model efficiency and makes the model easier to export. CSPNet splits the gradient flow. This reduces computation, speeds up inference and yields richer combinations of gradient information, without degrading detection and recognition accuracy. The SPPF module replaces the parallel large pooling kernels of the SPP module with a cascade of small pooling kernels, which speeds up computation. It also merges feature maps with different receptive fields, which enriches the expressive power of the features.
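The SPPF cascade can replace SPP's parallel large kernels, and this can be checked numerically. The sketch below is pure Python and works in 1D for brevity. A square max-pooling window is separable into a row pass and a column pass, so the same argument applies along each axis of a 2D feature map. The sketch shows that applying a stride-1, 5-wide max pool twice gives the same result as a single 9-wide pool, and applying it three times gives a 13-wide pool. The kernel sizes 5, 9 and 13 are the usual YOLOv5 settings; they are assumed here because the text does not list them.

```python
import random
from typing import List


def max_pool_1d(x: List[float], k: int) -> List[float]:
    """Stride-1 max pooling with 'same' output length (k odd).

    Windows are clipped at the borders, which is equivalent to padding with -inf.
    """
    r = k // 2
    n = len(x)
    return [max(x[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def spp(x: List[float]) -> List[List[float]]:
    # SPP: three parallel pools with kernels 5, 9 and 13, concatenated with the input.
    return [x] + [max_pool_1d(x, k) for k in (5, 9, 13)]


def sppf(x: List[float]) -> List[List[float]]:
    # SPPF: one 5-wide pool applied three times in series; every stage is kept.
    y1 = max_pool_1d(x, 5)
    y2 = max_pool_1d(y1, 5)
    y3 = max_pool_1d(y2, 5)
    return [x, y1, y2, y3]


rng = random.Random(0)
feature = [rng.uniform(-1.0, 1.0) for _ in range(40)]
print(spp(feature) == sppf(feature))  # True
```

Both modules produce identical outputs. SPPF reuses each intermediate result, so it never evaluates the 9- and 13-wide windows directly, and this is where the speed-up comes from.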

Neck
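The Neck modules of both YOLOv4 and YOLOv5l use an FPN+PAN structure for multi-scale fusion of strip surface defect features (see Figure 2). The FPN passes strong semantic features from top to bottom, and the PAN passes strong localization features from bottom to top.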
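Each of the three fused feature maps produced by the Neck feeds one detection layer. The arithmetic below makes these scales concrete under several stated assumptions: YOLOv5's usual output strides of 8, 16 and 32, a 640x640 network input (the text does not state the input size), three anchors per grid cell, and the six NEU-DET classes. Each prediction carries four box values, one objectness score and one score per class. The function name `detection_layer_shapes` is hypothetical.

```python
from typing import Dict, List, Tuple


def detection_layer_shapes(img_size: int = 640,
                           strides: Tuple[int, ...] = (8, 16, 32),
                           anchors_per_cell: int = 3,
                           num_classes: int = 6) -> List[Dict[str, int]]:
    """Grid size, output channels and prediction count for each detection layer."""
    layers = []
    for s in strides:
        grid = img_size // s                             # feature-map side length
        channels = anchors_per_cell * (5 + num_classes)  # (x, y, w, h, obj) + classes
        predictions = grid * grid * anchors_per_cell
        layers.append({"stride": s, "grid": grid, "channels": channels,
                       "predictions": predictions})
    return layers


for layer in detection_layer_shapes():
    print(layer)
# {'stride': 8, 'grid': 80, 'channels': 33, 'predictions': 19200}
# {'stride': 16, 'grid': 40, 'channels': 33, 'predictions': 4800}
# {'stride': 32, 'grid': 20, 'channels': 33, 'predictions': 1200}
print(sum(l["predictions"] for l in detection_layer_shapes()))  # 25200
```

The stride-8 map has the finest grid and handles the smallest targets, while the stride-32 map handles the largest ones.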

Head
The Head of YOLOv5l contains three detection layers, one for each of the three feature map sizes produced by the Neck. Three anchors with different aspect ratios are preset for every grid cell of each feature map to predict and regress targets. CIoU_Loss is used as the bounding box loss. To filter multiple candidate boxes, a weighted NMS based on DIoU is used, which improves detection accuracy for occluded and overlapping targets.
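The CIoU loss adds two penalties to 1 − IoU. The first is the squared distance between the box centres, normalised by the squared diagonal of the smallest enclosing box: ρ²/c². The second is an aspect-ratio consistency term αv, with v = (4/π²)(arctan(w_gt/h_gt) − arctan(w/h))² and α = v / ((1 − IoU) + v). The function below is a minimal pure-Python sketch of this definition for a single pair of boxes in (x1, y1, x2, y2) form. The small `eps` guard against division by zero is our own choice, and a training implementation would operate on batched tensors instead.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def ciou_loss(pred: Box, gt: Box, eps: float = 1e-7) -> float:
    """CIoU loss = 1 - IoU + rho^2 / c^2 + alpha * v for one box pair."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Squared distance between the two box centres.
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - (iou - rho2 / c2 - alpha * v)


# Two non-overlapping unit boxes: IoU = 0, rho^2 / c^2 = 4 / 10, v = 0.
print(round(ciou_loss((0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0)), 4))  # 1.4
```

Unlike plain IoU loss, this loss still provides a useful signal when the boxes do not overlap, because the centre-distance term keeps pulling the prediction toward the target.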

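Introduction of the SimAM Attention Mechanism Module
Adding an attention mechanism can effectively enhance the model's ability to extract features from images. The SimAM attention module [25] is a simple and highly effective attention module for convolutional neural networks, grounded in neuroscience theory. Unlike existing channel or spatial attention modules, it derives 3D attention weights for a feature map without adding any parameters to the network. A schematic of the 3D attention weight assignment is shown in Figure 3. The module identifies important neurons by optimizing an energy function that measures the linear separability between a target neuron and the other neurons. The energy function defined for each neuron is

$$e_t(w_t,b_t,\mathbf{y},x_i)=(y_t-\hat{t})^2+\frac{1}{M-1}\sum_{i=1}^{M-1}(y_o-\hat{x}_i)^2 \tag{1}$$

Here $\hat{t}=w_t t+b_t$ and $\hat{x}_i=w_t x_i+b_t$ are linear transforms of the target neuron $t$ and the other neurons $x_i$ in a single channel of the input feature $X\in\mathbb{R}^{C\times H\times W}$, $i$ indexes the spatial positions, and $M=H\times W$ is the number of neurons in that channel. Eq. (1) attains its minimum when $\hat{t}=y_t$ and $\hat{x}_i=y_o$ for all other neurons. Minimizing it is therefore equivalent to finding the linear separability between the target neuron $t$ and the other neurons in the same channel. Using binary labels ($y_t=1$, $y_o=-1$) and adding a regularization term gives the final energy function:

$$e_t(w_t,b_t,\mathbf{y},x_i)=\frac{1}{M-1}\sum_{i=1}^{M-1}\bigl(-1-(w_t x_i+b_t)\bigr)^2+\bigl(1-(w_t t+b_t)\bigr)^2+\lambda w_t^2 \tag{2}$$

Here $\lambda$ is the regularization coefficient, and $w_t$ and $b_t$ are the weight and bias of the linear transform. Eq. (2) has the closed-form solution

$$w_t=-\frac{2(t-\mu_t)}{(t-\mu_t)^2+2\sigma_t^2+2\lambda},\qquad b_t=-\frac{1}{2}(t+\mu_t)\,w_t \tag{3}$$

where $\mu_t=\frac{1}{M-1}\sum_{i=1}^{M-1}x_i$ and $\sigma_t^2=\frac{1}{M-1}\sum_{i=1}^{M-1}(x_i-\mu_t)^2$ are the mean and variance of all neurons in the channel except $t$. All neurons in a channel can be assumed to follow the same distribution. The mean and variance can therefore be computed once over all neurons of the channel and reused at every position, which greatly reduces the computational cost. The minimum energy $e_t^*$ at each position is then

$$e_t^*=\frac{4(\hat{\sigma}^2+\lambda)}{(t-\hat{\mu})^2+2\hat{\sigma}^2+2\lambda} \tag{4}$$

where $\hat{\mu}=\frac{1}{M}\sum_{i=1}^{M}x_i$ and $\hat{\sigma}^2=\frac{1}{M}\sum_{i=1}^{M}(x_i-\hat{\mu})^2$. The lower the energy $e_t^*$, the more neuron $t$ differs from the surrounding neurons and the more important it is. The importance of each neuron is therefore given by $1/e_t^*$.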
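The minimum energy e_t* = 4(σ̂² + λ) / ((t − μ̂)² + 2σ̂² + 2λ) makes SimAM cheap to compute. Each channel needs only one mean and one variance, and the module introduces no learnable parameters. The sketch below applies this to a single flattened channel in pure Python. Following the SimAM paper, the importance 1/e_t* is passed through a sigmoid and used to scale the input; this refinement step is not spelled out in the text above. The function name `simam_channel` and the default λ = 1e-4 are our assumptions. A real module would run the same arithmetic on C×H×W tensors, one channel at a time.

```python
import math
from typing import List, Tuple


def simam_channel(x: List[float], lam: float = 1e-4) -> Tuple[List[float], List[float]]:
    """SimAM-style refinement of one flattened channel (M = H*W values).

    Returns (energies, refined): energies[i] is the minimum energy e_t* of
    neuron i, and refined[i] = x[i] * sigmoid(1 / e_t*).
    """
    m = len(x)
    mu = sum(x) / m                          # channel mean, shared by every position
    var = sum((v - mu) ** 2 for v in x) / m  # channel variance, shared as well
    energies, refined = [], []
    for t in x:
        e = 4 * (var + lam) / ((t - mu) ** 2 + 2 * var + 2 * lam)
        energies.append(e)
        importance = 1 / e                   # lower energy -> more distinctive neuron
        refined.append(t * (1 / (1 + math.exp(-importance))))
    return energies, refined


energies, refined = simam_channel([0.0, 0.0, 0.0, 10.0])
print([round(e, 3) for e in energies])  # the outlier (last value) has the lowest energy
print([round(r, 3) for r in refined])
```

Since 1/e_t* = (t − μ̂)² / (4(σ̂² + λ)) + 1/2, a neuron equal to the channel mean always gets weight sigmoid(0.5). Neurons that stand out from their channel are weighted more heavily.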

Introduction of the C2F Module
Drawing on CSPNet's split-branch design and residual structure, YOLOv5l uses a C3 module composed of three standard convolutional layers (Conv+BN+SiLU) and n Bottleneck modules. It is the main module for learning residual features. The C3 module has two branches. One branch passes the input through a standard convolution followed by a stack of n Bottleneck modules, and the other passes it through a single standard convolution only. The outputs of the two branches are concatenated and fused by the third standard convolution. In this paper, the YOLOv5l model is improved by replacing the C3 modules with C2F modules. As a result, the improved model obtains richer gradient flow information and higher detection accuracy while remaining lightweight. A comparison of the C3 and C2F module structures is shown in Figure 4.
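The difference in gradient paths between the two modules is easiest to see by listing what each one concatenates before its final 1x1 convolution. The sketch below traces the data flow symbolically, using strings rather than tensors. It follows the C3 module of YOLOv5 and the C2f module of the Ultralytics code base as we understand them, with a hidden width of half the output channels (e = 0.5). All function names are hypothetical.

```python
from typing import List


def c3_concat_inputs(n: int) -> List[str]:
    """Tensors concatenated by C3: only the end of the Bottleneck chain survives."""
    branch = "cv1(x)"
    for i in range(1, n + 1):
        branch = f"B{i}({branch})"        # n Bottlenecks stacked in series
    return [branch, "cv2(x)"]             # second branch: a single standard conv


def c2f_concat_inputs(n: int) -> List[str]:
    """Tensors concatenated by C2F: cv1 is split in two and every Bottleneck output is kept."""
    outputs = ["cv1(x)[:half]", "cv1(x)[half:]"]
    for i in range(1, n + 1):
        outputs.append(f"B{i}({outputs[-1]})")  # each Bottleneck consumes the previous output
    return outputs


def concat_channels(block: str, c_out: int, n: int, e: float = 0.5) -> int:
    """Channel width entering the final 1x1 conv (cv3 in C3, cv2 in C2F)."""
    hidden = int(c_out * e)
    return 2 * hidden if block == "C3" else (2 + n) * hidden


print(c3_concat_inputs(2))   # ['B2(B1(cv1(x)))', 'cv2(x)']
print(c2f_concat_inputs(2))  # ['cv1(x)[:half]', 'cv1(x)[half:]', 'B1(cv1(x)[half:])', 'B2(B1(cv1(x)[half:]))']
print(concat_channels("C3", 256, 3), concat_channels("C2F", 256, 3))  # 256 640
```

With n = 3, C2F feeds five feature maps into its output convolution instead of two, which gives five gradient paths; this is the richer gradient flow referred to above. The cost is a wider input to that final convolution.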

Experimental Environment Setup
The experiments were run on Windows 10 with an Intel Core(TM) i7-9700K CPU, 32 GB of memory and an NVIDIA GeForce RTX 2080 Ti GPU. The software environment is PyTorch (1.

Dataset
The dataset is the public NEU-DET strip steel surface defect dataset from Northeastern University. It contains six types of defects (Crazing, Rolled-in Scale, Scratches, Inclusion, Patches and Pitted Surface) in a total of 1800 grey-scale images of 200x200 pixels. After screening, 1400 images were retained and randomly divided into training, validation and test sets in the ratio 6:2:2. Examples from the dataset are shown in Figure 5.
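A 6:2:2 split of the 1400 retained images gives 840 training, 280 validation and 280 test images. The sketch below shows one way to draw such a split reproducibly. The file names and the fixed random seed are hypothetical, since the text does not say how the random selection was made.

```python
import random
from typing import List, Tuple


def split_dataset(items: List[str], ratios: Tuple[int, int, int] = (6, 2, 2),
                  seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle once with a fixed seed, then cut into train/val/test by ratio."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # any remainder goes to the test set
    return train, val, test


images = [f"img_{i:04d}.jpg" for i in range(1400)]  # hypothetical file names
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 840 280 280
```

Splitting each of the six defect classes 6:2:2 separately would keep the class balance identical across the three subsets. The text does not say whether this was done.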

Parameter Setting and Evaluation Index
In this paper, we set epoch = 200, batch size = 4 and conf_thres = 0.5, with an initial learning rate of 0.01, a momentum of 0.937 and a weight decay coefficient of 0.005, and train with the SGD algorithm. Precision P, recall R and mean average precision mAP are used as the model performance indices, defined as follows:

$$P=\frac{TP}{TP+FP},\qquad R=\frac{TP}{TP+FN}$$

$$AP=\int_0^1 P(R)\,dR,\qquad mAP=\frac{1}{N}\sum_{i=1}^{N}AP_i$$

Here TP (True Positive) denotes a positive sample predicted as positive, FP (False Positive) a negative sample predicted as positive, and FN (False Negative) a positive sample predicted as negative. AP is the average precision of a single target category, and mAP is the mean of the AP values over all N categories. The P-R curve generated by the improved YOLOv5l is shown in Figure 6, with P on the vertical axis and R on the horizontal axis. The area enclosed by the P-R curve and the coordinate axes is the AP.

Figure 6. Analysis of the P-R curves of the improved model
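The metric definitions above can be sketched in a few lines. The functions below compute P and R from TP/FP/FN counts. They also compute AP for one class as the area under the precision envelope of the P-R curve, using all-point interpolation. The detection scores and match flags in the example are made up, and the sketch assumes that detections have already been matched to ground truth at an IoU threshold. YOLOv5's own evaluation code interpolates the curve slightly differently, so its numbers may not match this sketch exactly.

```python
from typing import List, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r


def average_precision(scores: List[float], is_tp: List[bool], n_gt: int) -> float:
    """All-point interpolated AP for one class (n_gt = number of ground-truth boxes, > 0)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])  # highest score first
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing from right to left.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the rectangle areas wherever recall increases.
    return sum((recalls[k] - recalls[k - 1]) * precisions[k] for k in range(1, len(recalls)))


def mean_ap(ap_per_class: List[float]) -> float:
    return sum(ap_per_class) / len(ap_per_class)


print(precision_recall(tp=8, fp=2, fn=2))  # (0.8, 0.8)
ap_a = average_precision([0.9, 0.8, 0.7], [True, False, True], n_gt=2)
print(round(ap_a, 4))  # 0.8333
```

For NEU-DET, `mean_ap` would average six per-class AP values, one for each defect type.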

Analysis of Experimental Results
In this paper, the improved YOLOv5l model is compared with the original YOLOv5l model. Both models are trained on the same dataset for the same number of epochs (200), and the overall evaluation results are shown in Table 1. The results show that introducing the C2F module and the SimAM attention mechanism into the original YOLOv5l network significantly improves detection accuracy. The improved model therefore detects strip surface defects more effectively than the original YOLOv5l. A comparison of the detection results is shown in Figure 7.

Summary
Detecting defects on the surface of hot-rolled strip steel is hampered by irregular target shapes, varying scales, complex backgrounds, and frequent false and missed detections. To address these problems, this paper designs a new detection algorithm based on YOLOv5l. First, the SimAM attention module is added at the Head to improve the recall of the original algorithm without increasing the model's parameters or computational complexity. Second, all C3 modules in the Backbone and Head of the original YOLOv5l model are replaced with C2F modules. As a result, the improved model obtains richer gradient flow information and identifies defects more accurately while remaining lightweight. Finally, experiments with the fused SimAM and C2F modules on the NEU-DET dataset show that the model improves the mean average precision (mAP) of strip surface defect detection by 5.3% compared with the original YOLOv5l network.