Research on group distribution leakage analysis method for RSM

: Side channel attack is a new and simple attack idea, and its proposal has created a significant threat to the security of cryptographic devices. With the continuous development of deep learning techniques, deep learning-based side channel attacks have enabled the attack techniques to reach new heights, but the attack efficiency and training time have become new problems. In order to improve the attack efficiency, researchers have proposed a large number of leakage analysis methods, which can effectively extract the intervals with significant leakage in the energy consumption curve and reduce the training parameters and data volume. In order to deal with side channel attacks, researchers have proposed various resistance strategies, among which masking protection is the simplest and widely used way. RSM rotating S-box masking scheme is a particularly important masking protection scheme, which uses a first-order masking scheme to randomize the leakage energy consumption of sensitive information using random masks, thus making the algorithm resistant to first-order side channel attacks. For this type of masking algorithm, this paper proposes a new leakage analysis algorithm, namely "Group Distribution Difference" (GDD). This algorithm is based on the distribution difference of energy consumption, and the energy consumption frequency in the group is calculated instead of the probability distribution. The KL distance is used to calculate the difference between groups and find out the leakage interval, so as to achieve an efficient attack.


Introduction
In 1996, Kocher [1] first introduced the concept of sidechannel attacks, and then a large number of studies on sidechannel attacks against cryptographic devices followed.In 1999, Kocher [2] and others first introduced the concept of DPA attack, which exploits the connection between the energy change generated by a cryptographic device during the encryption operation and its corresponding intermediate value to restore the key.In 2003, Chari [3] et al. proposed the template attack (TA).This attack first mathematically models the power consumption curves of all keys completing the encryption process to obtain the energy consumption templates generated by all elements performing encryption, and then performs template matching on the energy consumption curves collected by the device to be attacked to find the most suspicious key based on statistical principles.Compared with the non-modeling class of side channel attacks, the template attack has a more accurate portrayal of the leakage and a more efficient attack, but its huge computation is still a problem.In 2015, Gilmore [4] proposed a neural network-based template attack that accelerates the construction of templates and makes the construction of templates more accurate.
With the continuous development of side-channel attacks, the security of cryptographic devices has been greatly tested, and a series of countermeasures have been proposed to resist side-channel attacks, of which masking technique is one of the most effective and widely used.In 1999, Chari [5] et al. proposed an S-box mask protection scheme for power analysis attacks.In 2001, Akkar [6] et al. proposed a multiplication-based masking scheme to add a multiplicative mask to the S-box inverse of the AES algorithm.In the same year, Itoh [7] et al. proposed a fixed-value mask scheme with the same random mask for each round of encryption, but the scheme is not resistant to higher-sorder attacks.In 2012, Nassar [8] et al. proposed the RSM (Rotating S-box Mask) low-entropy mask scheme, which is the focus of this paper, and a more detailed introduction can be found in Section 2.1.
At present, the attacks generally used against cryptographic algorithms implemented by add-masking are higher-order DPA attacks.After kocher et al. first proposed the concept of DPA, in 2000, for the masked cryptographic algorithm, Messages [9] refined the definition of nth-order DPA from various aspects and first proposed the concept of second-order DPA and its implementation method, taking the masked DES algorithm as an example, and confirmed the feasibility of the method through experiments.Based on these theories, more practical schemes and optimization methods have been proposed for masking strategies.2004, Jason [10] et al. proposed their attack model and corresponding algorithm for second-order DPA, which can achieve better results when the energy traces are short and the correlation coefficients are large.2013, Belgarric [11] et al. proposed a preprocessing method for time-frequency analysis of energy traces, which can effectively improve the attack efficiency.The method can effectively improve the attack efficiency and avoid the preprocessing of cross-combination of leaked energy sources.The method is applicable to soft encryption implementations with first-order protection.
Template attack is an energy analysis attack with learning that is also used to attack cryptographic algorithms with masking.Template attack uses known information about the training device to build an accurate noise model of the leaked information in the learning phase, thus greatly improving the efficiency of the attack.in 2006, Oswald et al [12][13] proposed the first template-based DPA attack by combining the template attack.in 2007, Lemke-Rust et al [14] proposed to use the template attack directly to mask method for encryption algorithms.This method uses a training device to learn a comprehensive template about the leaked intermediate value and the mask, and attacks both the mask and the key of the device during the attack.In 2015, Lerman et al [15] proposed to use support vector machines to build templates about the mask and the intermediate value after demasking.In the attack using the template of the mask first attack the mask and after the demasking then attack the key.In the same year, Gilmore et al [16] proposed the same idea of attack but with a template using a neural network.The methods proposed in these studies all require the attacker to learn the random mask used by the device in each encryption during the learning phase this requirement is very demanding and is not available in all attacks.

Common Leakage Analysis Techniques
One of the safety-critical devices that need to be certified before going to market is the test side channel attack.This attack exploits unintentional leakage of the device, such as power consumption, electromagnetic radiation or timing, to pose a threat to the physical implementation.The certification process is expensive and time-consuming, especially when the desired level of security is increased.For example, the ISO draft standard 17825 requires resistance to side-channel analysis with 10,000 traces (level 3) and 100,000 traces (level 4).To speed up the evaluation process, a method should be used to compress a large number of traces into a small set of correlation points.
Before the template attack, feature extraction of the target energy traces is required to select the points with higher correlation for the attack, which helps to improve the efficiency of template construction as well as the accuracy of the attack.Seven main feature selection methods are described below.

SOD (Sum of Pairwise Differences)
Chair [3] proposed the SOD method in 2003 as a basic feature point selection method, which is simpler to calculate.The calculation of the SOD method is shown in Equation ( 1): t denotes the energy trace curve in the data set, N is the number of energy traces (not described subsequently), and s is the sum of the difference curves.The principle of this method is to obtain the difference curve by making a difference between two of the collected energy traces.Then, the data corresponding to the same moments on the difference curve are summed up, and the moments corresponding to the points with larger values are the locations of the feature points with higher correlation to be extracted.However, this scheme has a problem that there are positive and negative differences in the collected energy traces, which may cause some feature points to be ignored in the calculation and thus have a great impact on the selection of feature points.

SOSD (Sum of Squar-ed Pairwise Differences)
Christian et al. proposed an improved method called SOSD for the inability of SOD to better handle the positive and negative energy traces, i.e., the square operation is added to the SOD method, and the calculation is shown in Equation ( 2 As in the first method, the moment of the larger value point is selected as the location for feature point selection.

SOST (Sum of Squar-ed Pairwise T-Differences)
Gierlichs [17] et al. proposed a t-test-based feature selection method SOST, which optimized the SOSD method.The details of the calculation are shown in Eqs. ( 3) and (4): Equation ( 3) is the formula of T-test, and Equation ( 4) is the formula of SOST's feature point extraction, which is the combination of T-test and SOSD.m denotes the corresponding energy trace curve mean vector, and N denotes the number of corresponding curves.When the number of samples is small, the relative SOSD is more resistant to noise, and it is easier to find out the feature points with obvious leakage.

Pearson correlation coefficient
The Pearson correlation coefficient method can be used when the data are linearly correlated with each other and the data obey normal distribution, and the selection area of feature points can be determined by calculating the location of the points with higher correlation.The calculation is shown in Equation (5).
Where t denotes the vector composed of the corresponding points of i at a certain moment of the energy trace curve and v denotes the corresponding intermediate value.It can only represent the degree of correlation of the data and cannot be used for direct causal determination.

PCA (Principal Component Analysis)
Principal component analysis (PCA) is a common leakage analysis technique that is typically used to process large amounts of data to find the main patterns and features in the data.In side-channel attacks, PCA can be used to analyze the main features in energy consumption leaked data to infer the secret key.The basic idea of PCA is to transform highdimensional data into low-dimensional data while retaining most of the information of the data.PCA obtains the principal components of the data and the corresponding eigenvalues by calculating the covariance matrix of the data and then decomposing the covariance matrix into eigenvalues.Principal components are linear combinations of the data that explain most of the variance in the data.By using principal components, the data can be mapped from a high-dimensional space to a low-dimensional space, while retaining most of the information of the data.

SNR (Signal-To-Noise Ratio)
Signal-to-noise ratio (SNR) is a common leakage analysis technique used to analyze the ratio between signal and noise in energy consumption leakage data.In side-channel attacks, SNR can be used to evaluate the quality of energy consumption leakage data and extract from it the points with high signal-to-noise ratios as the characteristic points where leakage is evident.The calculation is shown in Equation ( 6).

NICV(Normalized Inter-Class
Variance) Shivam Bhasin [18] et al. in 2014 proposed a better analysis technique namely regularized inter-group variance NICV.A key advantage of NICV is that neither cloning of devices nor obtaining key information of encrypted devices in advance is required.As a univariate ANOVA F-test, NICV is relatively low computational effort, using only publicly available data such as plaintext input or ciphertext output to detect leaks, and calculating the variance as the ratio between the group mean variance and the total leak variance.The results show that NICV is related to two standard metrics, Pearson correlation coefficient and signal-to-noise ratio (SNR).NICV can be used to theoretically calculate the minimum number of tracks required for an attack to be realized.The theoretical basis of NICV is given, and some practical applications are performed on real cryptosystems.
The above common leakage analysis methods have differences and advantages and disadvantages among them, but they all basically utilize all linear leaks associated with variables to perform the analysis, and all of them are inhibited to some extent if the resistance strategy eliminates all linear leaks associated with X.We used a method based on the fitting of a Gaussian mixture distribution using the GMM-EM algorithm, using the distribution of the theoretical Hamming weight as the initial centroid of the fit, in order to achieve a fast and accurate fit.

Leak Analysis Solution For RSM
For existing first-order SCA attacks, Nassar et al. proposed an AES-RSM (rotating S-box) encryption scheme in 2012, which has almost the same performance and complexity as unprotected AES, and also has strong resistance to first-order DPA and CPA as well as hardware protection-based defense strategies.

AES-RSM encryption scheme
In this section, the AES-RSM masking strategy is briefly introduced theoretically, and its core lies in the construction of add-masked S-boxes, which are preconfigured and calculated as follows.
The S-box as a new construction should satisfy S'j (x') That is, Sin is masked with addition and the value Sout after the output S-box is protected by mj+1 (mod 16).
The specific encryption process is shown in the following figure.

First-order leakage of RSM
Based on the Hamming weight model, the RSM plus mask approach appears to have no first-order leakage, but according to our experiments, it actually has leakage under other models, which is the focus of this paper, based on the difference in energy consumption distribution.
Although the mean variance of a deterministic X with all masked heterogeneous post-band masked values Y is constant, so that the commonly used leakage analysis methods cannot build an effective leakage model, the distribution of Y values after different plaintexts are masked is different, and it is still possible to distinguish different Xs by the difference of

Experimental data
The experimental data are divided into real data and simulated energy traces, where the real data are used to test the effectiveness of the method.The simulated energy traces, on the other hand, have different signal-to-noise ratios and can be used to observe the tolerance of the method to noise.

Real Data
We measured the running process of AES128-RSM encryption algorithm executed on a smart card.The experimental equipment includes an oscilloscope, a smart encryption card, a card reader, and a computer.The computer is responsible for sending commands to the card reader to control the encryption algorithm running on the encryption card, and sending plaintext messages to the smart card via USB.At the same time, the oscilloscope is triggered to perform the energy consumption curve acquisition.The oscilloscope sends the collected energy consumption profiles to the computer, and then performs some pre-processing, resulting in the final data set.The raw energy traces include 100,000 sample points with a total of 40,000 curves.

Simulation of energy traces
The intermediate values to be leaked are calculated according to the encryption algorithm, and the energy consumption is summed by bit conversion bit.Finally, the leaked diverse instincts are superimposed on a base energy consumption curve afterwards.If the signal-to-noise ratio needs to be reduced, white noise with different variance values is added.40,000 simulated energy traces each with signal-to-noise ratios of 0.9 and 0.7 were generated, of which 30,000 were used for training and 10,000 for testing.

Experimental indicators
In this paper, the custom leakage analysis method GDD (Group Distribution Difference) is used as the experimental index.The frequencies of each curve at different voltages with the same intermediate values are counted and normalized, the frequency distribution p is used instead of the probability distribution, the mean value of the frequency distribution for all intermediate values is used as the average distribution q, and the KL distance of pq is found as the size of leakage.According to the above figure 6,7, we can see that the leakage analysis using GDD can find the leakage points of the energy consumption curve.However, when the intensity of the noise increases, the value of the leakage indicator will become smaller.When the noise reaches a certain level, the leakage indicator may become invalid.

Experiments on real energy traces
Leakage analysis of real energy traces using GDD results in the following figure.

Figure 8. GDD's analysis of real trace
As can be seen in Figure 8 above, there are significant spikes at certain locations in the curve, which means that there may be significant leakage at these locations.By using the leakage analysis metrics, we can use MLP to train on these points where there is significant leakage, so that we can quickly and efficiently complete the training of the model and implement the attack.The specific effect of the attack is as follows.From the figure 9, we can see that the noise still has some influence on the final attack effect, the larger the noise, the worse the attack effect, but still can train an effective model.The attack effect on real data is better, and only 4 energy traces are needed to achieve the attack.

Conclusion
In this study, we explored the use of energy consumption frequency distribution to implement a method for leakage analysis in side channel attacks and verified the effectiveness of the method through experiments.By using GDD, we were able to identify the location of the leaks implemented by the cryptographic algorithm, and then attack the algorithm and complete the training of the model.In addition, we also explored the effect of noise on the leakage analysis results, and although noise has some influence on the results, it still has good results in real data.The contribution of this study is to provide an effective method to identify and solve the leakage problem of cryptographic algorithm implementations in side channel attacks, which is important to protect the security of cryptographic algorithms.In the future, we can further explore how to combine GDD with other methods to further improve the accuracy and practicality of the attack algorithm.We believe that the results of this study will have a positive impact on the development of the information security field.

Figure 1 .
Figure 1.Encryption Flow distribution through the joint attack of multiple energy traces.

Figure 3 .
Figure 3. Gaussian mixture distribution The numerical calculation methods of leakage metrics include methods that directly use energy consumption frequencies in groupings instead of probability distributions, methods based on kernel density estimation, and methods based on fitting Gaussian mixture distributions.

Figure 9 .
Figure 9.The effect of the attack after the leakage of different data with GDD