Subway Tunnel Crack Identification based on YOLOv5

Abstract: Because of the complex tunnel environment and the uneven lighting of the acquisition system, captured lining images contain shadows and have low contrast. To address this, this paper proposes an automatic color equalization method combined with a Laplacian pyramid (the LP-ACE algorithm for short). The computational complexity is reduced from the original O(N^4) to O(N log N), which significantly reduces the amount of image computation and improves working efficiency. Crack inspection in key areas of subway tunnels must fit within short skylight periods, and manual inspection is slow, inaccurate, and difficult. To address these problems, an improved detection algorithm based on YOLOv5, SD-YOLO, is proposed. The Ghost module replaces the traditional convolutional module to reduce the model parameters and improve the detection accuracy. Fusing the CBAM attention mechanism module enhances feature learning and feature extraction for crack-region images while weakening the influence of the background on detection results. A bidirectional feature pyramid network is used for multi-scale feature fusion to reduce redundant calculation and improve the ability of the algorithm to detect small targets. The proposed SD-YOLO algorithm performs well on real samples, with an average precision (mAP) of 93.1%, 11.3 percentage points higher than the original model, and with significantly fewer parameters than the original model. Compared with YOLOv5s, the proposed method improves both inference speed and detection accuracy while using fewer parameters, and it can be effectively applied to tunnel inspection.

With the development of China's urbanization, the number of cars is increasing and ground transportation is becoming more and more crowded. The subway can ease the pressure on ground transportation [1]. With its convenience, comfort, and safety, it has become an important part of the urban public transportation system and the preferred mode of urban transportation. As of January 2023, 29 urban rail transit lines are in operation in 54 cities, with a length of 9609.9 km [2]. With the development of subway tunnels, the demand for tunnel detection technology is also increasing. During tunnel construction and operation, cracks, platform water accumulation, segment leakage, underground water pipe rupture, and other defects (often termed tunnel diseases) caused by deformation of underground structures [3] bring great safety risks. Of these, cracks are the most common. If not discovered and repaired in time, cracks spread and expand, damaging the tunnel structure and potentially causing heavy casualties and economic losses. The detection of tunnel cracks is therefore the top priority of tunnel inspection.
At present, most tunnel defect detection in practice still relies on traditional manual inspection. However, manual inspection is time-consuming and inefficient, and its effectiveness depends on the work experience and familiarity of the staff [4]. In addition, manual inspection requires a lot of manpower, and the subway skylight (maintenance window) available for inspection is short. It is difficult for traditional methods to meet the current requirements for tunnel crack detection. In recent years, with the rapid development of big data and artificial intelligence, deep neural networks, as discriminative models, have shown excellent performance in image classification and target detection and are suitable for data-heavy scenarios such as tunnel crack detection. Compared with traditional digital image processing technology, deep learning does not require the features to be extracted to be set manually in advance. The network model adaptively learns and extracts image features, and its classification performance is stronger.
Li et al. [5] used the ResNet18 network to detect cracks, with an accuracy of 70%. Huang et al. [6] proposed an image recognition algorithm based on hierarchical feature extraction with a fully convolutional network (FCN) for semantic segmentation of cracks and leakage defects in subway shield tunnels. Kim and Cho [7] proposed a method to detect cracks using Mask R-CNN, relying on morphological operations to quantify cracks. Xu et al. [8] used Faster R-CNN to identify and locate various types of earthquake damage such as concrete cracks. Xue Yadong [9] proposed a deep learning method based on a convolutional neural network to learn features of the target; the method improved the GoogLeNet model, optimized the convolution kernel, and improved the inception module to achieve automatic classification and identification of defects in tunnel lining images. Gao News [10] proposed a crack detection network based on the densely connected convolutional network (DenseNet), which filters non-crack areas of the original image to reduce the interference of background information. At the same time, it uses a multi-dimensional classifier based on the segmentation algorithm to eliminate incorrectly identified crack areas. Fu-Chen Chen et al. [11] proposed a deep learning framework based on a convolutional neural network (CNN) and a Naive Bayes data fusion scheme (NB-CNN) that analyzes individual video frames for crack detection, together with a new data fusion scheme that aggregates the information extracted from each video frame to enhance the overall performance and robustness of the system. Ren Song et al. [12] used the Inception V2 network as the feature extraction network in the SS network to detect tunnel lining cracks and water leakage. Hui Yao [13]

LP-ACE Algorithm Enhances Images
Due to the complex environment in the tunnel and the uneven illumination of the acquisition system, the captured lining images have shadows and low contrast. This article uses an improved automatic color equalization algorithm to remove image shadows. The automatic color equalization (ACE) algorithm corrects the final pixel value by calculating the relative lightness of the target pixel and its surrounding pixels. It thereby adjusts the contrast of the image, produces a balance of color constancy and brightness constancy similar to that of the human retina, and gives a good image enhancement effect.
However, for an image with N pixels, the amount of calculation increases rapidly as the image size increases. Because ACE has high complexity and a large amount of calculation, automatic color equalization combined with a Laplacian pyramid (the LP-ACE algorithm for short) is used. The computational complexity is reduced from the original O(N^4) to O(N log N), which significantly reduces the amount of image calculation and improves work efficiency.
The ACE algorithm includes two steps. First, the color and spatial domains of the image are adjusted to complete the chromatic aberration correction and obtain the spatially reconstructed image, as in Equation (1). Second, the corrected image is dynamically expanded, as in Equation (2). The computational complexity of the ACE algorithm is O(N^4). The calculation formulas of the ACE algorithm are as follows:

$$R_c(p) = \sum_{j \in \text{subset},\, j \neq p} \frac{r\big(I_c(p) - I_c(j)\big)}{d(p, j)} \quad (1)$$

$$O_c(p) = \mathrm{round}\big[127.5 + s_c R_c(p)\big] \quad (2)$$

In the formulas, $R_c(p)$ is the difference between the pixel $p$ to be adjusted and the surrounding pixels, subset is the pixel subset participating in the operation, $I_c(p) - I_c(j)$ is the brightness difference between the pixel to be adjusted and a surrounding pixel $j$, $d$ represents the distance measurement function, and $r$ is the brightness performance function. In Equation (2), $O_c(p)$ is the output pixel value and $s_c$ is the slope of the linear expansion.
The Laplacian pyramid decomposition is given by Equations (3) and (4):

$$L_l = G_l - G_{l+1}^{*}, \quad 0 \le l < N \quad (3)$$

$$L_N = G_N, \quad l = N \quad (4)$$

Among them, $N$ is the number of layers of the Laplacian pyramid, $L_l$ is the image of the $l$th layer of the pyramid, and $G_{l+1}^{*}$ is obtained by interpolation amplification of the Gaussian pyramid layer $G_{l+1}$.
Each layer of the sub-pyramid is enhanced separately with the improved ACE algorithm, so that the reconstructed image, obtained with a specific fusion model such as Equation (5), shows the different details on the different decomposition layers more clearly.

$$F = \sum_{l \in [0, N]} \omega_l * L_l \quad (5)$$

Among them, $\omega_l$ is the weight coefficient, indicating the proportion of the sub-image of the $l$th layer, and $L_l$ is the sub-pyramid image of the $l$th layer.
The LP-ACE algorithm requires N operations on each sub-layer of the pyramid, and the number of layers is log N, so overall the complexity of the improved algorithm is O(N log N). In the same experimental environment, for the same picture with a pixel size of 640×640, the running time of the LP-ACE algorithm is 5.34 ms and that of the ACE algorithm is 26.12 ms. Without affecting image detail, LP-ACE is about 4.9 times faster (26.12 / 5.34 ≈ 4.9). It enhances the contrast of the image and improves the lightness and darkness of the original image while maintaining its authenticity.
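The decompose, enhance, and fuse structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the 2×2 block average and nearest-neighbour repetition stand in for Gaussian filtering and interpolation, the image side lengths are assumed divisible by 2^levels, and the per-layer weights are hypothetical placeholders for the per-layer ACE enhancement and the fusion weights.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging 2x2 blocks (stand-in for blur + decimation)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition (stand-in for interpolation)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """Return [L_0, ..., L_{levels-1}, G_levels]: detail layers plus the coarsest image."""
    layers, current = [], img.astype(np.float64)
    for _ in range(levels):
        smaller = downsample(current)
        layers.append(current - upsample(smaller))  # detail lost by downsampling
        current = smaller
    layers.append(current)
    return layers

def reconstruct(layers, weights=None):
    """Collapse the pyramid; each layer is scaled by its weight before being added back."""
    if weights is None:
        weights = [1.0] * len(layers)
    current = weights[-1] * layers[-1]
    for layer, weight in zip(reversed(layers[:-1]), reversed(weights[:-1])):
        current = upsample(current) + weight * layer
    return current

# Hypothetical usage: boost the finest detail layer of a random 64x64 "image".
rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(64, 64))
pyramid = laplacian_pyramid(image, levels=3)
enhanced = reconstruct(pyramid, weights=[1.5, 1.0, 1.0, 1.0])
```

With all weights equal to 1 the pyramid reconstructs the input exactly, which is why per-layer processing can change contrast without discarding image detail.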

YOLOv5 Network Structure
As a single-stage detection algorithm, YOLO combines region proposal extraction with target recognition, significantly improving calculation speed. YOLOv5 achieves a good balance of speed and accuracy by combining a large number of techniques from previous research. The YOLOv5 model is small in size and fast in batch inference, which meets the requirements of crack detection. YOLOv5 has four versions: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. This article uses YOLOv5s (Figure 2) as the basic model for research. The other models have basically the same structure and differ only in model depth and model width.

Improved YOLOv5 Network Structure
Tunnel images contain many interference factors, crack targets are small and difficult to detect, and for long tunnels the YOLO network model requires a large amount of calculation. To solve these problems, this paper proposes an improved YOLOv5s detection method, called the SD-YOLO model, that fuses the convolutional block attention module (CBAM) and the bidirectional feature pyramid network (BiFPN) and uses the Ghost module to replace traditional convolution. The network structure is shown in Figure 3. The model has the following improvements:
(1) The Ghost module replaces the traditional convolution module to eliminate redundant features, reduce the number of parameters, and obtain a more lightweight network model.
(2) CBAM is integrated into the backbone feature extraction network to increase the network's attention to foreground information, suppress irrelevant background information, and enhance the feature extraction capabilities of the network.
(3) The BiFPN feature fusion network is used to achieve bidirectional cross-scale feature fusion. It improves the detection ability of the algorithm for small targets and greatly reduces the calculation amount of the model.
To improve the calculation speed of the model, and because cracks are mainly small targets, this paper uses 3×3 kernels throughout the Ghost model so as to extract more small-target feature information, and optimizes the YOLOv5s network structure with the Ghost Bottleneck module built on the Ghost model framework.
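To make the parameter saving of improvement (1) concrete, the following sketch counts the weights of a standard convolution and of a Ghost module with the same input and output channels. It assumes the usual Ghost formulation (a primary convolution produces 1/ratio of the output maps and cheap depthwise operations generate the rest); the values ratio=2 and cheap_k=3 and the example channel sizes are illustrative assumptions, not settings reported in this paper, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weight count of a Ghost module (bias ignored).

    A primary k x k convolution produces c_out / ratio intrinsic feature maps;
    depthwise cheap_k x cheap_k operations generate the remaining "ghost" maps.
    """
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap

# Hypothetical layer: 128 input channels, 256 output channels, 3x3 kernels.
standard = conv_params(128, 256, 3)
ghost = ghost_params(128, 256, 3)
print(standard, ghost, round(standard / ghost, 2))
```

For this hypothetical layer the Ghost module needs roughly half the weights of the standard convolution; the reduction for a full network depends on how many layers are replaced.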

CBAM Attention Mechanism Module
The tunnel environment is relatively complex and contains many interference factors, so cracks are located inaccurately; for long tunnels, there are also many pictures and a large amount of calculation. To improve the accuracy of target detection and classification while limiting computational overhead and parameter size, the CBAM (Convolutional Block Attention Module) attention mechanism is introduced to improve network performance. CBAM (Figure 5) consists of two modules, the Channel Attention Module (CAM) and the Spatial Attention Module (SAM). CAM (Figure 6) makes the network focus on the foreground of the image and pay more attention to meaningful areas, and SAM (Figure 7) allows the network to focus on locations rich in contextual information in the entire image. In this way, the two branches learn "what" and "where" to focus on along the channel axis and the spatial axis, respectively. Since CBAM is a lightweight general-purpose module, its overhead is negligible, and it can be seamlessly integrated into the YOLOv5 architecture and trained with it end-to-end. By learning which information should be emphasized or suppressed, the CBAM module effectively identifies and locates cracks.
The overall flow of CBAM is as follows. The input feature map passes through the channel attention mechanism, and the resulting weights are multiplied with the input feature map before being sent to the spatial attention mechanism. The normalized spatial weights are then multiplied with the input feature map of the spatial attention mechanism to obtain the final feature map.
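The flow just described (channel attention first, then spatial attention) can be sketched in NumPy for a single feature map. This is an illustration under assumptions rather than the trained module: the weights w1, w2, and kernel are random placeholders, the reduction from 8 to 2 channels and the 7×7 kernel are illustrative choices, and the explicit loops favor readability over speed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (C, H, W). A shared two-layer MLP scores avg- and max-pooled channel descriptors."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU between the two layers
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    return sigmoid(mlp(avg) + mlp(mx))       # one weight per channel, shape (C,)

def spatial_attention(x, kernel):
    """x: (C, H, W). Convolve stacked channel-wise avg/max maps with a (2, k, k) kernel."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])         # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)                                        # one weight per location

def cbam(x, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a (C, H, W) feature map."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]

# Hypothetical usage with random placeholder weights.
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 6, 6))
w1, w2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 2))
kernel = rng.normal(size=(2, 7, 7))
refined = cbam(feature_map, w1, w2, kernel)
```

Because both attention maps lie between 0 and 1, the module only rescales the input features; it never changes the shape of the feature map, which is what allows it to be dropped into an existing backbone.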

Bidirectional Pyramid Feature Fusion Network
With the continuous innovation of the YOLO architecture, the network gradually deepens, the model becomes more and more complex, and the features extracted by the model become more and more abstract, which causes a certain loss of features. In the model, shallow layers have higher resolution and carry more accurate location information than deep layers. Deep layers have a larger receptive field than shallow layers and carry more high-dimensional semantic information, which is beneficial to the classification of detection targets. Therefore, using better feature fusion methods to fuse feature information of different scales is very important for improving the performance of target detection models.
The YOLOv5s algorithm continues to use the PANet (Path Aggregation Network) from YOLOv4 for feature fusion in the neck. PANet is based on the idea of the FPN feature pyramid. It not only performs feature fusion from top to bottom, but also adds feature fusion from bottom to top, thereby reducing information loss and achieving good detection results, but it increases the number of parameters in network training. For tunnel crack detection, targets appear small at a distance, so many small targets appear, resulting in low model detection accuracy. BiFPN enhances the information extraction capability of the network and better combines low-level location information with high-level semantic information, thereby further improving the network's target detection performance. The PANet structure of the original network only concatenates features along the channel dimension, while BiFPN takes weight information into account and achieves bidirectional cross-scale feature fusion at the same time. It can not only improve the detection ability of the algorithm for small targets, but also greatly reduce the calculation amount of the model. This is of great practical significance for equipment with low computing power in industrial fields, for actual on-site deployment, and for subsequent model updates. Figure 9.
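The weighted fusion mentioned above can be illustrated with the fast normalized fusion rule commonly associated with BiFPN, in which each input feature map receives a learnable non-negative weight. The text does not state which weighting rule SD-YOLO uses, so this is an assumed variant; the function name, the eps value, and the example weights are hypothetical, and the inputs are assumed to have already been resized to a common shape.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature maps as sum_i(w_i * F_i) / (eps + sum_j w_j), with w_i >= 0."""
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # ReLU keeps weights non-negative
    w = w / (w.sum() + eps)                                     # normalize without a softmax
    return sum(wi * f for wi, f in zip(w, features))

# Hypothetical usage: a shallow (location-rich) map and a deep (semantic) map of equal shape.
shallow = np.ones((4, 4))
deep = 3.0 * np.ones((4, 4))
fused = fast_normalized_fusion([shallow, deep], weights=[1.0, 1.0])
```

In contrast to plain concatenation, the weights let the network learn how much each scale should contribute to the fused feature.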

Dataset
The subway tunnel was photographed by a rail car equipped with an industrial camera, and a large amount of data was collected in the form of pictures and videos. More than 100 pictures were obtained through data cleaning and then expanded to 3,000 tunnel crack images through data augmentation and other methods to construct the data set. A Python script randomly divides all images into training, validation, and test sets in proportions of 70%, 20%, and 10%. Some example images from the data set are shown in Figure 10.
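A random 70/20/10 split of this kind might look like the sketch below. The original script is not given, so the function name, the fixed seed, and the file names are hypothetical.

```python
import random

def split_dataset(paths, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle paths reproducibly and split them into train/validation/test lists."""
    items = sorted(paths)                    # fixed starting order for reproducibility
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]           # the remainder goes to the test set
    return train, val, test

# Hypothetical file names standing in for the 3,000 crack images.
image_paths = [f"crack_{i:04d}.jpg" for i in range(3000)]
train_set, val_set, test_set = split_dataset(image_paths)
print(len(train_set), len(val_set), len(test_set))
```

Fixing the seed keeps the split reproducible across runs, so that models trained at different times are evaluated on the same test images.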

Training Parameter Settings
This section details the establishment of the data set and the selection of specific hyperparameters in network training. The training and testing tasks in this article were completed with Anaconda + CUDA 11.4 + cuDNN 11.4 + PyTorch 1.11 on a computer with an NVIDIA RTX 3080 Ti 12GB GPU. For model training, the number of training epochs is 300 and the mini-batch size is 4. We used Adam with a momentum of 0.937,

Evaluation Indicators
To evaluate the test results, precision (P), recall (R), and average precision (mAP) are selected as indicators. For tunnel crack detection, the safety of a given tunnel is evaluated from the number, location, and severity of its cracks. For the output of the target detection model, a low false detection rate and a low missed detection rate are more conducive to tunnel defect detection, so the evaluation focuses on the precision and recall of the model. Precision is the ratio of correctly detected samples of a class to all samples predicted as that class. Precision reflects the false detections of the model: the higher the precision, the lower the false alarm rate. Recall is the ratio of correctly detected samples to the actual number of samples in the category. Recall reflects the degree of missed detection: the greater the recall, the fewer the missed detections. mAP is a commonly used indicator of overall model quality. Adding mAP to the evaluation improves the evaluation system and makes the results more comprehensive and accurate.
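The indicators above can be computed as in the following sketch. It assumes that detections have already been matched to ground-truth cracks (for example by an IoU threshold), so each detection is simply labeled as a true or false positive; the all-point interpolation used for AP is one common convention, not necessarily the one used in this paper. With a single crack class, mAP equals the AP of that class.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(detections, num_gt):
    """Area under the precision-recall curve (all-point interpolation).

    detections: list of (confidence, is_true_positive) pairs.
    num_gt: number of ground-truth objects of the class.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Make precision non-increasing as recall grows (interpolation step).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

# Hypothetical example: three detections evaluated against two ground-truth cracks.
example_ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
```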

Parameter Comparison
This article uses the Ghost module to reduce the number of parameters, optimize the model, and speed up inference. The parameter comparison shows that the YOLOv5s network using the Ghost module has about 21.2% fewer parameters and lower FLOPs, while the detection accuracy increases by 3.2 percentage points and the number of parameters and FLOPs remain at a low level. Although the accuracy of YOLOv5x is about 1% higher than that of the model in this article, its inference time and model weight are too large for practical use. Therefore, the proposed method makes the network more lightweight without affecting the detection accuracy. All models are tested on the test set to check the performance of the models trained in the previous section. The test results are shown in Figure 11. As can be seen from Figure 12, all improved models perform better than YOLOv5s. Comparing YOLOv5s with the detection method that adds the Ghost module, mAP increased by 2.2%, so the model with the Ghost module has an advantage in detection accuracy. Comparing YOLOv5s with the detection methods that add the CBAM module and BiFPN on top of the Ghost module, mAP increased by 0.6% and 5.5%, respectively. Adding BiFPN significantly improves detection accuracy. Among them, the SD-YOLO model performs best. Compared with YOLOv5s, the mAP of the SD-YOLO model is increased by nearly 12%, which shows that the attention module improves the performance of the target detection model.

Conclusion
Image preprocessing with the LP-ACE algorithm significantly shortens the computing time. It enhances the contrast of the image and improves the lightness and darkness of the original image while maintaining its authenticity. An improved SD-YOLO tunnel crack detection algorithm based on YOLOv5s is proposed. Introducing the Ghost module eliminates redundant features, yields a more lightweight network model, and improves the inference speed of the network. Adding the CBAM attention mechanism module weakens the influence of the background on detection results and strengthens the learning of crack-region features, so that the model focuses more on extracting crack features. A bidirectional feature pyramid network is used for multi-scale feature fusion to reduce redundant calculations and improve the algorithm's ability to detect small targets. Ablation experiments demonstrated the effectiveness of the SD-YOLO model, with a significant increase in mAP. Comparative experiments showed that the SD-YOLO model outperforms the other detection methods and meets the accuracy and real-time requirements for crack detection.

Figure 3. SD-YOLO model

Ghost Module
In convolutional neural networks, feature extraction often leads to redundant feature maps. Deeper networks stack a large number of convolutional layers, producing a large number of redundant feature maps. As the number of parameters increases, the consumption of calculations also increases. In order to reduce the amount of network calculations, the Ghost module divides the traditional

Figure 12. Ablation experimental data

Comparative Experiment
SSD, Faster R-CNN, YOLOv3, YOLOv4, and YOLOv5 are commonly used for target detection. On this data set, SSD, RetinaNet, Faster R-CNN, YOLOv3, YOLOv4, and YOLOv5 are compared with SD-YOLO. It can be seen from the comparative experimental data chart that the improved SD-YOLO network surpasses other detection methods in terms of accuracy. The detection mAP of the improved SD-YOLO

Table 1. Comparison of model parameters