Research on aluminum defect classification algorithm based on deep learning with attention mechanism

Abstract: Product quality is a key concern in industrial manufacturing. Surface defects on aluminum profiles inevitably arise during production because of factors such as the environment and the equipment, and these defects seriously affect the quality of the profiles. The focus and difficulty of research have therefore shifted to identifying and classifying surface defects in aluminum profiles quickly and accurately. To address this issue, this paper proposes an aluminum defect classification algorithm that combines an attention mechanism with the Inception V4 deep learning image classification network to accurately identify and classify aluminum defect areas. Experiments and comparative analysis are performed on the aluminum defect recognition dataset from the AliCloud Tianchi platform, and the results show that adding the attention mechanism improves accuracy by 1.24% over the original model.


Introduction
With the development of industrial manufacturing worldwide, China has become a major manufacturing country that produces a large number of industrial products every day. As demand increases, the quality requirements for industrial products are becoming more stringent; however, surface defects cannot be entirely avoided during either production or transportation [1]. For example, owing to the complexity of the production environment and the limitations of the processing equipment, the surface of aluminum profiles, a basic material in industrial manufacturing, is prone to cracks, abrasions, peeling, pits, scratches, discoloration, dirty spots, and other defects during production and transportation, which can seriously affect the quality of the profiles [2]. Therefore, it is important to automate the identification and classification of defects in aluminum profiles.
The existing defect recognition and classification methods fall into seven categories: manual recognition, single-mechanism recognition, infrared recognition, magnetic particle recognition, eddy current recognition, magnetic flux leakage recognition, and machine vision recognition [3][4][5][6][7]. The first six suffer from low efficiency and accuracy because of the limitations of their underlying principles. Among machine vision methods, traditional machine learning approaches such as support vector machines and decision trees rely on manually designed feature extraction to classify defects. They can learn only low-level features and cannot learn abstract high-level features [8], so their accuracy is low when recognizing and classifying subtle defects.
With the development of deep learning, convolutional neural networks (CNNs) have become widely used in image classification, target detection [9][10], speech recognition, intelligent robotics [11], and other fields. Unlike manual feature extraction, CNNs learn features directly from the data, which gives better results and greater robustness. Shanshan Xu et al. [12] implemented wood defect recognition and classification with a convolutional neural network; the method does not require complex image pre-processing and can recognize many kinds of wood defects with high accuracy and efficiency. Liu, Meng Ke et al. [13] classified track surface defects with a convolutional neural network, which removed the reliance of traditional machine vision recognition on manual experience. Zhiyang Wu et al. [14] proposed a convolutional neural network-based monochrome fabric defect recognition algorithm to address the high miss rate and low efficiency of manual fabric defect inspection in fabric production enterprises, and their experimental results showed that the method achieves high accuracy and speed. Tian Wang et al. [15] proposed a convolutional neural network that automatically extracts features to distinguish between defect-free and defective images for product quality control. Gui Zhong Fu et al. [16] proposed a deep learning-based approach that uses a compact and effective convolutional neural network model, emphasizes the training of low-level features, and incorporates multiple receptive fields.
However, surface defects on aluminum profiles are unevenly distributed and are typically small targets, so classification that uses only the deep convolutional neural network methods above tends to ignore fine detail. Small defects in particular are easily misclassified, which leads to poor results.
Therefore, this paper addresses the practical problem of classifying aluminum surface defects by taking the classical deep learning classification model Inception V4 and fusing the scSE attention module into it. This makes it easier for the network to capture key local information along the channel and spatial dimensions of the feature maps and to suppress irrelevant information, which further improves the accuracy of the model in classifying aluminum defects.

Inception V4 Network
For convolutional neural networks, an effective way to improve network performance is to increase the width and depth of the network, but a blind increase leads to a dramatic growth in the number of network parameters, which can easily cause overfitting [17]. GoogLeNet [18] (Inception V1) was proposed by the Google team in 2014 and won first place in the classification task of that year's ImageNet competition. It was the first network to propose the Inception structure, and its name pays homage to LeNet-5. The core of Inception is to use convolutional kernels of different sizes in parallel, which yields receptive fields of different sizes, and then to fuse the resulting multi-scale features by concatenation. The effect is to increase the depth and width of the network while also reducing the number of parameters. Successive versions, Inception V2 to V4 [19][20][21], have emerged from this network through continuous optimization. In this paper, we use the Inception V4 network as the main architecture, and its principle is described below. The network is composed of a Stem module, Inception-A, Inception-B, and Inception-C modules, and Reduction-A and Reduction-B modules. Each module is highly tunable. This design increases the width and depth of the network without a corresponding increase in the number of parameters, which improves accuracy while limiting overfitting. The network structure is shown in Figure 1.
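The parameter saving mentioned above can be illustrated with a small calculation. The sketch below assumes the 1×1 channel-reduction trick used in the GoogLeNet family, which the text does not spell out, and the channel sizes (192, 16, 32) are hypothetical values chosen only for illustration; they are not taken from the Inception V4 configuration used in this paper.

```python
def conv_weights(kernel, in_ch, out_ch):
    """Number of weights in a kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * in_ch * out_ch


# Hypothetical channel sizes, for illustration only.
IN_CH, MID_CH, OUT_CH = 192, 16, 32

# A 5x5 convolution applied directly to the input feature map.
direct = conv_weights(5, IN_CH, OUT_CH)

# The same 5x5 convolution preceded by a 1x1 convolution that first
# reduces the number of channels from IN_CH to MID_CH.
reduced = conv_weights(1, IN_CH, MID_CH) + conv_weights(5, MID_CH, OUT_CH)

print(direct, reduced)  # 153600 15872
```

Under these assumed sizes the branch with the 1×1 reduction needs roughly a tenth of the weights of the direct 5×5 convolution, which is why several such branches can be run side by side without the parameter count growing sharply.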

scSE Attention Mechanism
The attention mechanism is inspired by studies of human visual perception: the brain receives a large amount of information from the outside world at any time and place, quickly locates the relatively important information, and ignores the irrelevant information [22]. Introducing the attention mechanism into deep learning networks is therefore valuable for achieving a similar function in defect recognition and classification.
The scSE module evolved from the SE module in SENet [23]. Roy A. G. et al. [24] proposed three variants of the SE module: the cSE module (spatial squeeze and channel excitation), the sSE module (channel squeeze and spatial excitation), and the scSE module formed by combining the cSE and sSE modules in parallel. They demonstrated experimentally that such modules can enhance meaningful features and suppress useless ones.
The main idea of the cSE module is shown in Figure 2. First, the input feature map is written as $U = [u_1, u_2, \ldots, u_C]$, where each channel $u_i \in \mathbb{R}^{H \times W}$. Passing $U$ through a global average pooling (GAP) layer yields the vector $z \in \mathbb{R}^{1 \times 1 \times C}$, whose value at the $k$-th channel can be expressed as:

$$z_k = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_k(i, j) \qquad (1)$$

where $H$ and $W$ are the height and width of the feature map, $C$ is the number of channels, and $(i, j)$ is a coordinate on the feature map. The vector $z$ is then passed through two fully connected layers, where $W_1$ and $W_2$ are the weights of the fully connected layers and a ReLU function $\delta(\cdot)$ between them enhances the independence between channels. This gives $\hat{z}$, which can be expressed as:

$$\hat{z} = W_1(\delta(W_2 z)) \qquad (2)$$

Sigmoid normalization then yields $\sigma(\hat{z})$. Finally, each value $\sigma(\hat{z}_k)$ is multiplied with the unprocessed feature information of the corresponding original channel to obtain the calibrated feature map. In this way, the information in unimportant channels is reduced and suppressed, while the information in important channels remains almost unchanged and is thereby enhanced in relative terms. The whole process can be expressed as:

$$\hat{U}_{cSE} = F_{cSE}(U) = [\sigma(\hat{z}_1) u_1, \sigma(\hat{z}_2) u_2, \ldots, \sigma(\hat{z}_C) u_C] \qquad (3)$$

The main idea of the sSE module is shown in Figure 3. sSE is a variant of cSE: whereas cSE compresses spatial information to strengthen the network's use of important channel features, sSE conversely compresses channel information to strengthen the network's use of important spatial features.

For the input feature map $U = [u^{1,1}, u^{1,2}, \ldots, u^{i,j}, \ldots, u^{H,W}]$ with $u^{i,j} \in \mathbb{R}^{1 \times 1 \times C}$, where $H$ and $W$ are the dimensions of the feature map and $(i, j)$ is a spatial location on it, the spatial feature information is squeezed through a $1 \times 1$ convolution with a single output channel to obtain the map $q \in \mathbb{R}^{H \times W}$, which can be expressed as:

$$q = W_{sq} \star U \qquad (4)$$

where $W_{sq} \in \mathbb{R}^{1 \times 1 \times C \times 1}$. The map $q$ is then normalized by the sigmoid function to obtain $\sigma(q_{i,j})$, which corresponds to the spatial feature information of pixel $(i, j)$ in the feature map, and is finally used to weight the original input feature map. The whole process can be expressed as:

$$\hat{U}_{sSE} = F_{sSE}(U) = [\sigma(q_{1,1}) u^{1,1}, \ldots, \sigma(q_{i,j}) u^{i,j}, \ldots, \sigma(q_{H,W}) u^{H,W}] \qquad (5)$$

The scSE module is shown in Figure 4. It combines the cSE and sSE modules in parallel: the feature map is recalibrated separately along its channel and spatial dimensions, and the two recalibrated feature maps are then summed. The resulting feature information is more specific and targeted, so the features finally extracted are also more focused. Its formula can be expressed as:

$$\hat{U}_{scSE} = \hat{U}_{cSE} + \hat{U}_{sSE} \qquad (6)$$
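The three recalibration steps described in this section can be traced in a few lines of code. The following is a minimal forward-pass sketch, not the implementation used in this paper: it uses plain Python lists instead of PyTorch tensors so that the arithmetic stays visible, omits biases, and assumes that the two branches are combined by plain element-wise addition. The function names and the toy weights are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cse(U, W1, W2):
    """cSE branch: squeeze space, excite channels.

    U  : feature map as nested lists with shape [C][H][W]
    W2 : weights of the first fully connected layer, shape [C_r][C]
    W1 : weights of the second fully connected layer, shape [C][C_r]
    """
    H, W = len(U[0]), len(U[0][0])
    # Global average pooling: one value z_k per channel.
    z = [sum(map(sum, channel)) / (H * W) for channel in U]
    # Two fully connected layers with a ReLU in between.
    hidden = [max(0.0, sum(w * zk for w, zk in zip(row, z))) for row in W2]
    z_hat = [sum(w * h for w, h in zip(row, hidden)) for row in W1]
    # Rescale every channel k by sigmoid(z_hat_k).
    return [[[sigmoid(zk) * v for v in row] for row in channel]
            for channel, zk in zip(U, z_hat)]


def sse(U, w_sq):
    """sSE branch: squeeze channels, excite spatial locations.

    w_sq : weights of the 1x1 convolution, one per input channel, shape [C]
    """
    C, H, W = len(U), len(U[0]), len(U[0][0])
    # A 1x1 convolution with one output channel is a weighted sum
    # over the channels at each pixel.
    q = [[sum(w_sq[k] * U[k][i][j] for k in range(C)) for j in range(W)]
         for i in range(H)]
    # Rescale every pixel (i, j) by sigmoid(q_ij).
    return [[[sigmoid(q[i][j]) * U[k][i][j] for j in range(W)]
             for i in range(H)] for k in range(C)]


def scse(U, W1, W2, w_sq):
    """scSE: element-wise sum of the cSE and sSE outputs (assumed form)."""
    a, b = cse(U, W1, W2), sse(U, w_sq)
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


# Toy input with C = 2 channels of 2x2 pixels. With all-zero weights both
# branches scale the input by sigmoid(0) = 0.5, so their sum equals U.
U = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
W2 = [[0.0, 0.0]]      # one hidden unit (C_r = 1)
W1 = [[0.0], [0.0]]
w_sq = [0.0, 0.0]
out = scse(U, W1, W2, w_sq)
```

In a trained network the weights are learned, so important channels and pixels receive gate values close to 1 while unimportant ones are pushed towards 0. In a PyTorch model the same computation would normally be expressed with pooling, linear, and convolution layers on tensors.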

Method
The difficulty of classifying aluminum defects lies in the inconsistent shapes and varying sizes of the surface defects. Because many surface defects on aluminum, such as fine scratches, are very small, this paper incorporates the scSE attention module into the modules of the Inception V4 network so that the model pays more attention to these regions of interest, extracts more important feature information for learning, and thereby improves classification accuracy. The structure of the improved algorithm is shown in Figure 5. Inception V4 is built from two kinds of modules, Inception modules and Reduction modules. The scSE attention module is fused into each of the Inception-A, Inception-B, and Inception-C modules; the additional parameters this introduces do not affect the network training time or the real-time requirements of aluminum defect classification. The Inception modules increase the width and depth of the network to obtain more feature information, and the Reduction modules reduce the computational effort.
During network training, the algorithm in this paper uses the cross-entropy loss function commonly used in classification problems. Its formula can be expressed as:
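As a minimal sketch, assuming the standard multi-class form of the loss, cross-entropy for a single sample is $L = -\sum_{i=1}^{K} y_i \log p_i$, where $K$ is the number of classes, $y$ is the one-hot label, and $p$ is the softmax output of the network. These symbols are assumptions introduced here for illustration, not notation taken from the paper. With a one-hot label the sum reduces to the negative log-probability of the true class, which the code below computes; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw network outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy for one sample whose true class index is `target`."""
    return -math.log(softmax(logits)[target])


# With 11 classes (defect-free plus ten defect categories) and no
# preference for any class, the loss equals log(11).
loss_uniform = cross_entropy([0.0] * 11, target=3)
```

The loss is small when the network assigns a high probability to the true class and grows without bound as that probability approaches zero.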

Datasets
This paper uses the Aluminum Profile Surface Defect Recognition dataset from the 2018 Guangdong Industrial Smart Manufacturing Big Data Innovation Competition, which is hosted on AliCloud Tianchi. The dataset contains defect-free samples and ten categories of defect samples: non-conductive, scuffed, cross-strip pressure dent, orange peel, bottom leakage, bruise, pit, convex powder, cracked coating, and dirty spot. Sample images of the ten defect categories are shown in Figure 6.

Evaluation Criteria
The experiments in this paper are a classification task, so the common classification criteria of accuracy, precision, and recall are used to evaluate the performance of the algorithm. These criteria are defined in terms of TP, TN, FP, and FN. TP (true positive) is a sample predicted as positive that is actually positive; TN (true negative) is a sample predicted as negative that is actually negative; FP (false positive) is an actual negative sample incorrectly predicted as positive; and FN (false negative) is an actual positive sample incorrectly predicted as negative.
1) Accuracy. Accuracy is the most commonly used evaluation metric in classification. It is the proportion of correctly predicted samples (both positive and negative classes) among all samples.
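The three criteria can be computed directly from the four counts defined above. The sketch below assumes the standard definitions of precision, TP / (TP + FP), and recall, TP / (TP + FN), and treats one defect category as the positive class; the counts are hypothetical and are not results from this paper. Returning 0.0 when a denominator is zero is a convention chosen here for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all samples that were predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that were predicted as positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for one defect category treated as "positive".
tp, tn, fp, fn = 45, 940, 5, 10

acc = accuracy(tp, tn, fp, fn)   # 985 / 1000
prec = precision(tp, fp)         # 45 / 50
rec = recall(tp, fn)             # 45 / 55
```

For a multi-class problem such as this one, precision and recall are usually computed per category in this way and then averaged over the categories.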

Results and Analysis
The experiments are run on the Windows operating system and trained with the PyTorch deep learning framework on a GeForce GTX 1650 Ti GPU. All images are resized to 400×400, the number of training epochs is set to 50, the stochastic gradient descent algorithm is used as the optimizer, and the initial learning rate is 0.001.
To verify the performance of the proposed algorithm, the improved model Inception V4-scSE is compared with the original Inception V4 model and with two commonly used deep learning models, Xception [25] and ShuffleNet V2 [26], on the aluminum defect recognition dataset using three criteria: accuracy, precision, and recall. Figure 7 and Figure 8 compare the validation loss and accuracy curves of the proposed model with those of the other network models. The graphs show that the lowest loss value of the proposed Inception V4-scSE is around 0.1, with an accuracy of around 98%. The next lowest loss is that of Inception V4, at around 0.15, with an accuracy of around 96%. The figures also show that the loss values and accuracy rates of all four algorithms gradually level off after the 25th epoch. As can be seen from Table 2, the proposed Inception V4-scSE model achieves 1.24% higher accuracy, 5.8% higher precision, and 1.6% higher recall than the original Inception V4 model. Compared with Xception and ShuffleNet V2, its accuracy, precision, and recall are all significantly better.

Conclusion
In this paper, we propose a deep learning aluminum defect classification algorithm incorporating an attention mechanism to address the practical problem of identifying and classifying the defects that various factors produce on aluminum during production. Adding an attention module to the deep neural network model Inception V4 enhances the sensitivity of the network to small defects and thereby improves the accuracy of aluminum defect classification. Experiments on the aluminum defect recognition dataset from AliCloud Tianchi show that the attention mechanism improves accuracy by 1.24% over the original model. However, this algorithm classifies only a single defect per image, whereas in practice a single image often contains multiple defects. Future work will therefore study this multi-label defect classification task in depth, in order to solve the problems that arise in practice and to improve the practicality and robustness of the algorithm.