A Face Recognition Algorithm Based on Improved Resnet

Abstract: To address the problem that increasing the number of layers of a CNN (convolutional neural network) leads to a decline in accuracy, an improved loss-function algorithm based on the Resnet-50 model is proposed. The Softmax loss function lacks constraints on the distances within the same class and between different classes. Replacing the Softmax layer with an improved Arcface loss enables the neural network to learn more discriminative features. Experiments on the LFW and AgeDB data sets show that the algorithm not only learns deep face features but also improves the accuracy of face recognition compared with an ordinary CNN. The improved Resnet also achieves a higher recognition rate under variations in occlusion, illumination, expression, and age.


Introduction
In recent decades, traditional face recognition methods such as LBP [1] and SVM [2] have been used to extract features and classify them. With a small number of samples, such methods give good results, but as the amount of face data grows, traditional methods fall far short of the requirements. With the vigorous development of computer vision and deep learning in recent years, especially of CNNs, a powerful classification method commonly used to recognize and verify images [3], face recognition technology [4] has also developed rapidly and has gradually entered everyday life. Compared with traditional methods, face recognition algorithms based on deep learning achieve better recognition accuracy; it can thus be seen that deep learning plays a great role in face recognition [5]. In view of the convenience, uniqueness, and non-repeatability of human faces, face recognition technology is widely applied in many areas of society, such as security, finance, and scientific research. Face recognition has become a future development direction with many potential applications [6], and over the past five years it has made a qualitative leap [7]. Since Resnet was put forward in 2015, more and more excellent Resnet-based algorithms have been proposed and have achieved good results in the field of face recognition. As it becomes difficult to further optimize the network structure itself, researchers have gradually turned their attention to loss functions and attention networks [8]. Starting from an improved loss function, this paper uses a residual network different from the original algorithm for face recognition. The experimental results show that the improved loss function increases the accuracy of face recognition.

Residual Principle
Resnet is a network model proposed by Kaiming He [9] in 2015. Directly increasing network depth to improve accuracy leads to two problems: (1) vanishing and exploding gradients, and (2) a decline in accuracy. The latter is not caused by over-fitting; rather, accuracy saturates and then degrades as depth increases. For problem 1, Batch Normalization [10] is adopted. For problem 2, the residual learning mechanism is used to solve the performance degradation caused by stacking additional convolutional layers. If H(x) is regarded as the desired underlying mapping, that is, the mapping that the stacked multi-layer nonlinear network is to fit, then instead of approximating H(x) directly, the layers approximate the residual function F(x) = H(x) - x, where x denotes the input of the first layer of the block. The actual mapping can then be expressed as:

H(x) = F(x) + x    (1)

The advantage of the residual block is that it introduces no extra parameters and does not increase computational complexity. The following figure shows the most basic residual learning unit, whose idea is to assume an identity mapping within the model. Because it is difficult to learn H(x) directly, the residual unit passes the input along a short-cut connection; Formula (1) can be realized by adding short-cut connections to a feed-forward neural network. In other words, the output of each block is the superposition of the learned mapping and the input, instead of the direct mapping of a traditional neural network.
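As an illustration (not the authors' code), a minimal PyTorch sketch of the basic residual unit described above, assuming the input and output of the block have the same shape so that the identity shortcut can be added directly:

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Basic residual unit: output = ReLU(F(x) + x), where F(x) is two
    3x3 convolutions with batch normalization and an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return self.relu(residual + x)  # H(x) = F(x) + x, formula (1)
```

The shortcut adds no parameters; all learning capacity lives in the two convolutions that fit the residual F(x).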

Batch Normalization
Batch Normalization, also known as the BN layer, is a data pre-processing method. It normalizes the input of each layer to zero mean and unit variance, followed by a learnable scale and shift. The BN layer reduces distribution differences between batches, lessens the interference of useless variation, and speeds up network convergence.
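A minimal sketch of the batch-normalization computation for a 2-D batch (samples × features); `gamma` and `beta` stand in for the learnable scale and shift:

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature to zero mean and unit variance over the
    batch, then apply a learnable scale (gamma) and shift (beta)."""
    mean = x.mean(dim=0)
    var = x.var(dim=0, unbiased=False)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta
```

With gamma = 1 and beta = 0 the output has zero mean and unit variance per feature; the learnable parameters let the network undo the normalization where that helps.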

Attention mechanism
The SE module, proposed in 2017, addresses the shortcomings of traditional convolution only from the channel perspective, not the spatial perspective. In 2018, the sSE attention mechanism was proposed on the basis of SE to address these shortcomings from the spatial perspective. The two mechanisms differ in how they reduce dimensionality: the SE attention mechanism uses global average pooling, while the sSE attention mechanism uses a 1x1 convolution. Their similarity is that both use Sigmoid activation to compress the weights to between 0 and 1, which makes them convenient to multiply with the original features.
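A sketch of both mechanisms as described above; the class names and the reduction ratio of 4 are illustrative assumptions, not from the paper:

```python
import torch
import torch.nn as nn

class ChannelSE(nn.Module):
    """Channel SE: global average pooling squeezes each channel to one
    value; two FC layers and a sigmoid produce per-channel weights in (0, 1)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze via global average pooling
        return x * w.view(n, c, 1, 1)     # reweight each channel

class SpatialSE(nn.Module):
    """Spatial sSE: a 1x1 convolution collapses the channel axis, and the
    sigmoid yields one weight in (0, 1) per spatial position."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        return x * torch.sigmoid(self.conv(x))  # reweight each pixel
```

Both modules return a tensor of the same shape as their input, so they can be dropped into a residual block and stacked freely.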

Softmax function
The Sigmoid [13] function, also called the Logistic function, is used for binary classification; it fails to meet the requirements of multi-class classification of face images. The Softmax [14] function is usually used as the last classifier in a CNN for multi-classification tasks. The Softmax loss function ensures good separability between classes. However, the within-class distances of the learned features are scattered over a large range, in-class features are not compact enough, and the distance between some in-class features can even exceed the distance between classes. The function is expressed as follows:

L_{softmax} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{f_{y_i}}}{\sum_{j=1}^{n} e^{f_j}}    (2)

where f_j denotes the class-j output of the last fully connected layer and y_i is the label of sample i. Generally speaking, Softmax is poor at distinguishing similar faces. Wen Yandong [15] found through experiments that traditional Softmax still leaves a large in-class distance; in other words, improving the loss function by adding constraints on the in-class distance can improve network performance.
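The softmax cross-entropy of formula (2) can be sketched directly from its definition (a didactic version; in practice one would call the library's cross-entropy):

```python
import torch
import torch.nn.functional as F

def softmax_loss(logits, labels):
    """Formula (2): negative log of the softmax probability assigned to
    the correct class, averaged over the batch."""
    log_probs = F.log_softmax(logits, dim=1)
    return -log_probs[torch.arange(len(labels)), labels].mean()
```

This matches `F.cross_entropy` with its default mean reduction; note the loss only rewards correct ranking of logits and imposes no explicit distance constraint on the features.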

Improved Softmax Function
Because Softmax mainly considers whether samples are classified correctly but lacks distance constraints within and between classes, this paper adopts ArcFace [16]. Arcface improves the recognition capability of the trained model by reducing the in-class distance and increasing the inter-class distance. Earlier, by observing the relationship between the weights and the class centers, the influential Sphereface [17] introduced the important concept of the angular margin; however, the approximate calculations it requires lead to training instability. Arcface instead uses the arccosine function to compute the angle between the current feature and the weight, and adds an extra angular margin m to the target angle. To reduce computational complexity, the bias is set to b_j = 0, so the inner product of the weight and the input feature is:

W_j^T x_i = \lVert W_j \rVert \, \lVert x_i \rVert \cos\theta_j    (3)

Applying L2 regularization to the weight and the feature fixes \lVert W_j \rVert = 1 and rescales \lVert x_i \rVert to s, so each logit becomes s\cos\theta_j, and the Arcface loss is:

L = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)} + \sum_{j\neq y_i} e^{s\cos\theta_j}}    (4)

Formula (4) enhances in-class compactness and expands inter-class differences almost simultaneously. The improved Softmax loss function directly maximizes the classification boundary in the angular space. Combining the cosine margin of Cosface [18] with the angular margin m of Arcface, this paper proposes a modified Arcface loss.
The details are as follows: s is the radius of the hypersphere, m is the angular margin, and c is the cosine margin.

L = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{s(\cos(\theta_{y_i}+m)-c)}}{e^{s(\cos(\theta_{y_i}+m)-c)} + \sum_{j\neq y_i} e^{s\cos\theta_j}}    (5)

\cos(\theta_1 + m) - c = \cos\theta_2    (6)

Formula (5) is the final loss function and Formula (6) is the corresponding classification boundary, with c a cosine margin of value 0.3. Adding the extra margin c further compresses the intra-class distance, and compressing the intra-class distance is equivalent to expanding the inter-class distance. If cos(θ_1 + m) - c > cos θ_2, the input face belongs to category 1; otherwise, it belongs to category 2.
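One consistent reading of the modified margin is sketched below: the angular margin m and the cosine margin c are applied only to the ground-truth logit, s(cos(θ_{y_i} + m) - c). This is illustrative, not the authors' implementation; s = 64 and m = 0.5 are assumed typical values, and c = 0.3 follows the text:

```python
import torch

def modified_margin_logits(cosine, labels, s=64.0, m=0.5, c=0.3):
    """Apply the combined margin only to the ground-truth class:
    target logit -> s*(cos(theta_yi + m) - c); others stay s*cos(theta_j)."""
    theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
    out = cosine.clone()
    idx = torch.arange(len(labels))
    out[idx, labels] = torch.cos(theta[idx, labels] + m) - c
    return s * out
```

During training these logits would be fed to standard cross-entropy; with c = 0 the head reduces to plain Arcface, and with m = 0 it reduces to a Cosface-style cosine margin.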

Introduction to Data Set
The LFW data set includes a total of 13,000 face images of about 5,000 people. The CASIA-WebFace [19] data set is a large-scale face data set published in 2014; its face images, collected from the Internet, comprise 494,414 images of 10,575 people and are used as the training set for Resnet in this experiment. The AR face database contains 4,000 face images of 100 people. It contains not only four basic expressions (neutral, joy, anger, and surprise) but also face images under various illumination conditions, as well as partially occluded face images with sunglasses or scarves. Normal images and partially occluded images with sunglasses or scarves of 100 people are selected from the AR database. AgeDB [20] is a cross-age dataset containing about 16,000 face images with ages ranging from 1 to 101.

Experimental Results
Based on the PyTorch deep-learning framework, the experiment is carried out in a Windows environment with an RTX 2060 GPU and 6 GB of RAM. The batch size is 64, the learning rate is 1/e, and SGD is used as the optimizer. Accuracy is defined as:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (7)

The symbols in formula (7) are as follows: TP: True Positive; FN: False Negative; FP: False Positive; TN: True Negative.

Table 1 compares the accuracy of various algorithms on LFW:

Face recognition algorithm    Accuracy rate (%)
Deepface [21]                 97.35
Facenet [22]                  98.87
Softmax + center loss [23]    98.78
Arcface [16]                  99.53

Table 2 shows the comparative experiments of this paper and the accuracy of the improved loss-function algorithm on four data sets. The comparison shows that the improved loss function slightly improves accuracy relative to the Arcface loss function, which demonstrates the effectiveness of the improvement.
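Formula (7) is straightforward to compute from the four counts; the example numbers below are illustrative, not taken from the paper:

```python
def accuracy(tp, tn, fp, fn):
    """Formula (7): fraction of all verification decisions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts: 90 true positives, 85 true negatives,
# 10 false positives, 15 false negatives out of 200 pairs.
```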

Conclusion
Owing to the lack of distance constraints within the same class and between different classes, Softmax loss fails to improve accuracy no matter how good the trained model is. The algorithm proposed in this paper uses the improved Arcface loss function to remedy the unsatisfactory classification of the Softmax layer on top of Resnet and has achieved good results. A large number of experiments are still needed to set the hyper-parameter m reasonably: as the angular margin m increases, the model becomes harder to train, so how to select m requires further study. Last but not least, the relatively low accuracy on the AgeDB data set also remains to be improved.

Here f_{y_i} = W_{y_i}^T x_i + b_{y_i} represents the output of the fully connected layer for the ground-truth class y_i. Increasing f_{y_i} reduces the loss, so that all faces belonging to this class fall within its decision boundary. Softmax mainly takes correct classification into consideration while lacking constraints on in-class and inter-class distances.

\theta_j is the angle between W_j and x_i; therefore, the learned embedding features are distributed on a hyper-sphere of radius s. Since the embedding features are distributed around each feature center on the hyper-sphere, an extra angular margin m is added between the weight and the feature. The Arcface loss function is shown as:

L = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)} + \sum_{j\neq y_i} e^{s\cos\theta_j}}

Table 1. Accuracy rate of various algorithms on LFW.