Study on the Correlation between Drivers' Anger Language and Driving Anger

Abstract: To study the correlation between drivers' language and anger, a questionnaire survey was conducted to collect drivers' language habits and select the anger keywords that drivers use most frequently. The anger keywords were then scored quantitatively on three aspects (anger degree, occurrence frequency and impact on driving safety), and the critical ratio, correlation, reliability and validity of the questionnaire were tested. The results show that the weight of "death" and "stupid" is higher than 0.1, and the weight of "God" and "drive" is lower than 0.1. Different anger words therefore represent drivers' anger to different degrees. This method can serve as one basis for identifying drivers' anger; it provides a theoretical basis for multi-modal driver anger recognition and has practical significance for driving safety.


Introduction
In recent years, urbanization has been accelerating, traffic pressure continues to increase, and the state of the traffic environment and traffic safety is not optimistic. In the road traffic system, people are the main factor leading to traffic accidents. Affected by the driving environment, drivers may become angry and drive aggressively, for example by changing lanes at random, forcing their way past other vehicles, and running red lights. These behaviors are often accompanied by rude gestures, verbal insults, threats, rough behavior, and even malicious damage to others' property, which not only disrupt normal traffic but also seriously affect public security. Because drivers' angry behavior is often accompanied by angry language, this study analyzes the correlation between drivers' language and anger during driving, which is of great significance for automatically identifying drivers' anger by computer and ensuring traffic safety.
At present, research at home and abroad on drivers' angry language and anger recognition focuses on speech features rather than semantic features.
In terms of voice features, Li Shingling et al. analyzed the spectral characteristics of the driver's voice and built a recognition model using a probabilistic neural network (PNN) optimized by the Firefly Algorithm (FA), which has certain advantages over traditional methods. Wang Wenjing used Mel-frequency cepstral coefficients (MFCC) as features to build a speech-based road rage diagnosis system; in noisy environments, the anger recognition accuracy reached 96.27% and 97.87% on the RAVDESS and CASIA data sets, respectively. Zhang Huiyun summarized emotional speech recognition from the aspects of speech emotion corpora, emotion feature extraction and applications of speech emotion recognition, and listed common emotion classification methods. Nguye et al. used cascaded three-dimensional neural networks to extract voice and facial behavior features and combined the modalities; they used the DBN algorithm to achieve feature-level fusion, and the recognition accuracy reached 90.85%. Zhang et al. used a DBN fusion network to achieve model-level fusion of voice and face information; with a support vector machine as the classifier, the recognition accuracy was 85.97%.
In terms of language characteristics, Gao Chengji proposed a language emotion recognition method that describes English language sentiment using the two-dimensional valence-activation space model and uses a staged approach based on a Gaussian mixture model and a support vector machine to classify emotional valence and emotional activation, with a highest accuracy of 96.9%. Chen Xin proposed a metaphorical emotion identification method based on compositing emotional scenes (MET-ES). This method constructs emotional representations of sentences from the emotional scenes of words and depicts the emotional reasons by fusing the emotional representations of word pairs, which helps the model correctly identify the emotions of sentences. Hu Qianjing studied the effect of emotion on driving behavior, established specific driving indicators significantly related to emotion at the physiological and behavioral levels, and proposed feasible compensation strategies. Tim et al. conducted feature modeling in terms of language and acoustics: acoustically, statistics are generated from acoustic audio descriptors; linguistically, word and phrase models based on probability and entropy are analyzed for their application in anger classification. Gaonka et al. designed tag semantics to better encode semantic information related to emotion. To solve the multi-label problem of text emotion, Li et al. established a factor graph based on emotion labels and context instance dependencies, and designed a factor graph reasoning model. To address the shortage of annotated data in emotion recognition tasks and improve results for resource-poor Chinese emotion recognition, Zhang et al. used resource-rich English corpora to construct English text emotion recognition tasks that assist Chinese text emotion recognition.
To sum up, anger language research focuses on speech recognition, using typical speech feature parameters such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC) and Linear Predictive Cepstral Coefficients (LPCC) to build models for anger recognition; model performance is further improved by adding face recognition, which supports the correlation between voice and emotion recognition. Language research, by contrast, is not extensive domestically, and most of it consists of theoretical analysis of language texts. To further analyze the correlation between drivers' angry language and anger, this paper uses a questionnaire survey to select the typical anger language used in angry driving, establishes a rating system for anger words, explores the correlation between words and anger, and provides theoretical support for multimodal driver anger recognition.

Questionnaire investigation
This paper designs a questionnaire for two purposes. First, it is necessary to understand drivers' language habits and establish an angry language database, so that angry language can be selected and graded from the survey statistics. Second, it is necessary to determine which driving behaviors are most likely to irritate drivers, so that anger can be induced more effectively during the simulated driving experiment. The questionnaire therefore consists of an anger language collection questionnaire and an anger language rating questionnaire.

Angry language collection questionnaire
To establish an anger language database of sufficient sample size, this paper designs a subjective driver anger language collection table. It covers the driver's basic information (gender, age and driving age), whether the driver expresses anger in words during driving, and which angry words the driver uses.
The questionnaire was distributed electronically using a multi-stage stratified cluster sampling method to ensure the reliability of the data. Table 1 shows the statistical results of the drivers' basic information. The proportion of men and women in the sample is balanced, and the sample is mainly young, which is in line with the current trend toward younger drivers. The groups with different driving ages are relatively balanced, which supports the representativeness of the survey. A total of 743 angry expressions were collected in the questionnaire. Word frequency analysis of the text yielded 420 angry words, such as "angry death" and "sick". Among them, 10 words appeared more than 20 times, and the rest appeared fewer than 8 times. Because of this large gap in frequency, the 10 most frequent words are selected as anger language keywords. The frequency with which drivers use a keyword reflects, to some extent, their language habits when angry: the more frequently a word is used, the better it represents the driver's anger. This paper therefore uses these 10 keywords for the rating of anger language. The complete spelling and frequency of the main keywords are shown in Fig. 1.
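The keyword selection described above can be sketched in a few lines. This is a minimal illustration only: the function name `select_keywords`, the cutoff parameter and the toy responses are assumptions made for the example, not the survey data or the authors' actual tooling.

```python
from collections import Counter

def select_keywords(responses, min_count):
    """Count how often each reported anger word occurs and keep the frequent ones."""
    # Normalise case and whitespace so that "Stupid" and "stupid " are counted together.
    counts = Counter(word.strip().lower() for word in responses if word.strip())
    # most_common() returns (word, count) pairs in descending order of frequency.
    return [(word, n) for word, n in counts.most_common() if n >= min_count]

# Hypothetical toy responses, NOT the 743 expressions collected in the survey.
toy_responses = ["stupid", "sick", "Stupid", "annoying", "stupid", "sick", "waste"]
keywords = select_keywords(toy_responses, min_count=2)
print(keywords)
```

With the survey data, the cutoff would be chosen from the gap the text reports between the frequent words and the rest.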

Angry language rating questionnaire
This survey follows the 5-point Likert scoring method so that the keywords can be evaluated quantitatively. The 10 anger keywords are evaluated on three aspects: the degree of anger ("extremely light", "slightly light", "average", "slightly heavy", "extremely heavy"), the frequency of occurrence ("never", "occasionally", "generally", "often", "always"), and the impact on driving safety ("extremely light", "slightly light", "average", "slightly heavy", "extremely heavy"). The options are assigned one to five points in order to quantify the evaluation. A total of 642 questionnaires were distributed in the second stage, with an effective questionnaire rate of 96.1%. Table 2 shows some examples of questionnaire items.

Table 2. Some examples of questionnaires
1. If you say "death" in anger during driving, your anger level is
A. extremely light (1 point) B. slightly light (2 points) C. average (3 points) D. slightly heavy (4 points) E. extremely heavy (5 points)
2. During driving, you think the frequency of the angry word "death" is
A. never (1 point) B. occasionally (2 points) C. generally (3 points) D. often (4 points) E. always (5 points)
3. In the process of driving, you think the degree of influence of saying "death" in anger on driving safety is
A. extremely light (1 point) B. slightly light (2 points) C. average (3 points) D. slightly heavy (4 points) E. extremely heavy (5 points)

Critical ratio analysis
The total scores of the survey scale are arranged in descending order; the top 30% is taken as the high group and the bottom 30% as the low group. The difference in each item between the high and low groups is tested with an independent-samples t-test to obtain the critical ratio (CR) of the scale. The calculation formula is as follows:

$$CR = \frac{\left(\sum c\right)^2}{\left(\sum c\right)^2 + \sum \varepsilon} \tag{1}$$

In formula (1), $c$ is the factor load of each item and $\varepsilon$ is the residual variance of each item. The results showed that the difference between the two groups was significant, the CR value was less than 0.04, and the item discrimination ability was good.

The questionnaire items were classified by content into an anger degree group, a frequency group and an influence degree group, and the correlations among the three item groups and between each individual item and its group were explored. The correlation coefficients between each item and the item groups of the survey scale are shown in Table 3. The table shows that the correlations between the items of the scale are significant, especially between the frequency group and the impact degree group, which indicates that the internal consistency between the items is good and that the scale offers a satisfactory level of reliability.
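The high/low group comparison can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions: it computes Welch's unequal-variance t statistic for a single item, which is one common form of the independent-samples t-test (the source does not say which variant was used), and the function name `critical_ratio` and the toy scores are hypothetical.

```python
import math
from statistics import mean, variance

def critical_ratio(item_scores, total_scores, fraction=0.30):
    """Welch t statistic of one item between high- and low-scoring respondents."""
    # Rank respondents by total scale score, highest first.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    k = max(2, int(len(order) * fraction))  # group size; at least 2 so a variance exists
    high = [item_scores[i] for i in order[:k]]   # top 30% of respondents
    low = [item_scores[i] for i in order[-k:]]   # bottom 30% of respondents
    # Standard error of the difference between the two group means.
    se = math.sqrt(variance(high) / len(high) + variance(low) / len(low))
    if se == 0:
        raise ValueError("both groups have zero variance; t statistic is undefined")
    return (mean(high) - mean(low)) / se

# Hypothetical toy data for 10 respondents (not the survey data).
item = [5, 4, 5, 4, 3, 3, 2, 1, 2, 1]
totals = [50, 48, 47, 45, 40, 38, 30, 28, 25, 20]
cr = critical_ratio(item, totals)
print(round(cr, 3))
```

A large absolute value of the statistic indicates that the item separates high-scoring from low-scoring respondents well; the significance level would then be read from the t distribution.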

Reliability analysis
This paper uses Cronbach's α coefficient to measure the reliability of the questionnaire. The formula is [18]:

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_T^2}\right) \tag{2}$$

In formula (2), $k$ is the number of items in the questionnaire, $\sigma_i^2$ is the variance of the scores on item $i$, and $\sigma_T^2$ is the variance of the total score of all items. SPSS 21.0 was used to test the reliability of the anger language rating questionnaire. The questionnaire contains 30 items, and after invalid questionnaires were excluded, 617 rows of valid data remained. The overall Cronbach's α coefficient is 0.732, and the reliability of the two retests is 0.722 and 0.707, respectively, so the scale can be considered reliable. The Cronbach's α coefficients and retest reliabilities of the individual dimensions are also above 0.70, which indicates that the reliability of the scale is good. The test results are shown in Table 4.
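For readers without SPSS, the coefficient can be computed directly from its standard definition. The sketch below is illustrative only: the function name `cronbach_alpha` and the toy score matrix are assumptions, and sample variances (n - 1 denominator) are used for both the item variances and the total-score variance.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical toy data: 5 respondents x 3 Likert items (not the survey data).
toy_rows = [
    [4, 5, 4],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 2],
]
alpha = cronbach_alpha(toy_rows)
print(round(alpha, 3))
```

Applied to the 617 valid responses and 30 items, the same function would be expected to reproduce the overall coefficient reported by SPSS, up to rounding.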

Validity analysis
The validity of the survey scale is analyzed through structural validity analysis. The results showed that the χ² value of Bartlett's test of sphericity was 5721.099 (P = 0.000 < 0.01), indicating that the data are suitable for factor analysis; the KMO value is 0.799, indicating that factor analysis is effective. Principal component analysis with maximum variance (varimax) rotation was used to extract the pattern matrix, eigenvalues and explained variance, and three common factors with eigenvalues greater than 1 were selected, as shown in the simplified scree plot in Fig. 2. The abscissa is the 30 items, and the ordinate is the corresponding eigenvalue. The figure shows that the eigenvalues of three factors are clearly higher than 1, and the cumulative variance contribution rate of the three common factors is 70.85%. According to the contents of the items included in each factor, the three factors are identified as anger degree, frequency of occurrence and impact on driving safety. The factor loads of each item are shown in Table 5.

Fuzzy comprehensive evaluation
Since the judgment of anger is subjective, fuzzy and difficult to quantify, the fuzzy comprehensive evaluation method is selected for analysis; it is characterized by clear and systematic results. According to the membership theory of fuzzy mathematics, qualitative evaluation can be converted into quantitative evaluation, that is, fuzzy mathematics can be used to make an overall evaluation of things or objects restricted by multiple factors. The research ideas of fuzzy comprehensive evaluation are as follows:
(1) Determine the indicator set of the evaluation object, namely the factor set. Suppose there are n evaluation indicators, which can be expressed as:

$$U = \{u_1, u_2, \ldots, u_n\} \tag{3}$$

In formula (3), $u_i$ (i = 1, 2, …, n) is the evaluation index.
(2) Determine the evaluation result set of the evaluation object, that is, the evaluation set. Suppose there are m evaluation grades, which can be expressed as:

$$V = \{v_1, v_2, \ldots, v_m\} \tag{4}$$

(3) Determine the weight set of evaluation indicators, that is, the weight set. Suppose there are n evaluation indicators, which can be expressed as:

$$A = \{a_1, a_2, \ldots, a_n\}, \quad a_i > 0, \quad \sum_{i=1}^{n} a_i = 1 \tag{5}$$

(4) Determine the membership function, calculate the membership degrees and form the factor evaluation matrix R. The evaluation of the ith index $u_i$ is:

$$R_i = \{r_{i1}, r_{i2}, \ldots, r_{im}\} \tag{6}$$

$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1m} \\ r_{21} & r_{22} & \cdots & r_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nm} \end{pmatrix} \tag{7}$$

In formula (6), $R_i$ (i = 1, 2, …, n) is a single-factor fuzzy subset; in formula (7), R is the evaluation matrix on V.
(5) Multiply the factor evaluation matrix by the weight set to obtain the comprehensive evaluation model:

$$B = A \circ R = (b_1, b_2, \ldots, b_m) \tag{8}$$
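The composition in step (5) can be sketched as follows. The source does not state which fuzzy operator is used for the composition, so this example assumes the weighted-average operator (an ordinary vector-matrix product); the function name `fuzzy_evaluate` and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fuzzy_evaluate(weights, membership):
    """Compose the weight set A (length n) with the evaluation matrix R (n rows, m columns).

    Assumes the weighted-average operator, i.e. b_j = sum_i a_i * r_ij.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if len(weights) != len(membership):
        raise ValueError("need one row of membership degrees per indicator")
    m = len(membership[0])
    return [sum(a * row[j] for a, row in zip(weights, membership)) for j in range(m)]

# Hypothetical example with n = 3 indicators and m = 3 evaluation grades.
A = [0.5, 0.3, 0.2]
R = [
    [0.1, 0.2, 0.7],  # membership degrees of indicator 1 in each grade
    [0.3, 0.4, 0.3],  # indicator 2
    [0.6, 0.3, 0.1],  # indicator 3
]
B = fuzzy_evaluate(A, R)
print([round(b, 2) for b in B])
```

Because each row of R and the weight set both sum to 1 in this example, the components of B also sum to 1, which makes the result directly comparable across grades.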

Entropy weight method
The entropy weight method is often used to determine the weight set when there is no authoritative data standard. It uses information entropy to calculate the entropy weight of each index according to the degree of variation of that index, and obtains more objective index weights after correction. The research idea of the entropy weight method is as follows:
(1) Determine the indicator data matrix. If the indicators are of different types, they must first be transformed to a common direction, that is, all indicators are unified into smaller-is-better or larger-is-better indicators. Assuming that there are m projects to be evaluated and n evaluation indicators, an indicator data matrix can be formed:

$$X = (x_{ij})_{m \times n} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix} \tag{9}$$

(2) Data standardization. To unify the dimensions of the evaluation indexes, the data need to be standardized. The max-min standardization method is commonly used, and its calculation formula is:

$$x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} \tag{10}$$

(3) Calculate the information entropy. First, calculate the probability matrix P, that is, the proportion of the ith sample under index j:

$$p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{m} x'_{ij}} \tag{11}$$

Then calculate the information entropy corresponding to each index and normalize it to obtain the respective entropy weight:

$$e_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij} \tag{12}$$

$$d_j = 1 - e_j \tag{13}$$

$$w_j = \frac{d_j}{\sum_{j=1}^{n} d_j} \tag{14}$$

Formula (12) gives the information entropy: the greater the information entropy, the smaller the corresponding amount of information. In formula (13), $d_j$ is the information utility value. In formula (14), the entropy weight $w_j$ of each index is the normalized information utility value.
(4) Empowerment. Apply the weights to the data and calculate the final score:

$$s_i = \sum_{j=1}^{n} w_j x'_{ij} \tag{15}$$
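The four steps above can be sketched with the standard library. This is a minimal illustration of the usual entropy weight procedure under stated assumptions: all indicators are taken as larger-is-better, terms with zero proportion contribute nothing to the entropy (the convention 0 · ln 0 = 0), and the final score is the weighted sum of the standardized values. The function names and the toy matrix are hypothetical.

```python
import math

def min_max(matrix):
    """Step (2): scale every indicator column to the range [0, 1]."""
    scaled_cols = []
    for col in zip(*matrix):
        lo, hi = min(col), max(col)
        if hi == lo:
            raise ValueError("an indicator with no variation cannot be scaled")
        scaled_cols.append([(x - lo) / (hi - lo) for x in col])
    return [list(row) for row in zip(*scaled_cols)]

def entropy_weights(scaled):
    """Step (3): entropy, information utility and normalised weight per indicator."""
    m = len(scaled)  # number of evaluated projects
    utilities = []
    for col in zip(*scaled):
        total = sum(col)
        p = [x / total for x in col]  # proportion of each sample under this index
        # Zero proportions are skipped, using the convention 0 * ln(0) = 0.
        entropy = -sum(v * math.log(v) for v in p if v > 0) / math.log(m)
        utilities.append(1.0 - entropy)  # information utility value
    norm = sum(utilities)
    return [d / norm for d in utilities]

def weighted_scores(scaled, weights):
    """Step (4): final score of each project as a weighted sum."""
    return [sum(w * x for w, x in zip(weights, row)) for row in scaled]

# Hypothetical toy matrix: 4 projects x 2 indicators (not the survey data).
toy = [[1, 1], [2, 1], [3, 1], [4, 4]]
scaled = min_max(toy)
weights = entropy_weights(scaled)
final = weighted_scores(scaled, weights)
print([round(w, 3) for w in weights])
```

In the toy matrix the second indicator concentrates all of its variation in one project, so its entropy is lower and it receives the larger weight, which is the behavior the method is designed to produce.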

Result analysis
To rate the 10 anger keywords in the questionnaire quantitatively, the results were analyzed according to the steps of fuzzy comprehensive evaluation.
The evaluation indicators of the object include the degree of anger of the keywords, the frequency of occurrence and the degree of impact on driving safety, so the factor set is taken as:
U = {the degree of anger, the frequency of occurrence, the degree of impact on driving safety}
The evaluation result is the 10 angry keywords, so the evaluation set is taken as:
V = {dead, stupid, Dammit, fuck, driving, taking, cor, sick, annoying, waste}
The entropy weight method is used to determine the weights of the factor set, with m = 10 and n = 3, according to the entropy weight calculation formula (14). According to the model results, the weight distribution of the 10 keywords is shown in Figure 3. The figure shows that the weights of "dead", "stupid", "fuck", "sick" and "waste" are higher than 0.1, so these five keywords are listed as Level I angry words; the weights of "Dammit", "driving", "surrender", "cor" and "annoying" are less than 0.1, so they are listed as Level II angry words. In subsequent research, anger language classified in this way can be used as a feature vector for an angry driving recognition system.
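The two-level classification by the 0.1 threshold can be expressed as a small helper. This is a sketch only: the function name `classify_keywords`, the placeholder word names and the weights are hypothetical (the actual weights appear only in Figure 3), and assigning a weight of exactly 0.1 to Level II is an assumption, since the text does not cover that case.

```python
def classify_keywords(weights, threshold=0.1):
    """Map each keyword to Level I (weight above threshold) or Level II (otherwise)."""
    return {
        word: "Level I" if weight > threshold else "Level II"
        for word, weight in weights.items()
    }

# Hypothetical placeholder weights, NOT the values behind Figure 3.
toy_weights = {"word_a": 0.14, "word_b": 0.11, "word_c": 0.08, "word_d": 0.10}
levels = classify_keywords(toy_weights)
print(levels)
```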

Conclusion
This paper collects anger keywords commonly used by drivers through a questionnaire survey, scores the keywords quantitatively on three aspects (anger degree, frequency and impact on driving safety), determines the weights of the three indicators using the entropy weight method, and assigns weights to the anger keywords through fuzzy comprehensive evaluation. The conclusions are as follows:
(1) The anger language rating questionnaire designed around anger degree, frequency and impact on driving safety has good reliability and validity, and can be used as a reference for anger language collection.
(2) Five first-level anger words, such as "death" and "stupid", and five second-level anger words, such as "Dammit" and "driving", were selected. Different angry words and sentences thus represent drivers' anger to different degrees. This method can be used as one of the bases for automatically identifying drivers' anger by computer.