Research on Network Customer Classification Based on BP Neural Network Algorithm

Abstract: Classifying customers correctly and effectively according to their characteristics and behaviors provides the most important resource for the electronic marketing and online trading of network enterprises. Based on an analysis of customer characteristics and behaviors, this paper presents a new classification model for online trading customers. First, the paper designs 21 customer classification indicators, comprising customer characteristics type variables and customer behaviors type variables. Second, to address the shortcomings of the existing BP neural network algorithm used in data mining for customer classification, an immune genetic algorithm is used to correct the BP neural network and speed up the convergence of the model. Finally, experimental results verify that the new algorithm improves the effectiveness and validity of customer classification when applied to real online trading customers.


Introduction
Customer relationship management (CRM) is one of the core problems of modern enterprises. Its customer-oriented thinking requires a CRM system to obtain all kinds of customer information effectively, to identify the relations and transactions between customers and the enterprise, and to analyze customers' consuming behavior in depth so as to find their consumption characteristics, provide personalized service, and support enterprise decisions. The three basic problems CRM needs to solve are how to get customers, how to keep customers, and how to maximize customer value. Maximizing customer value is the ultimate purpose; getting and keeping customers are the means of realizing it. The core of all three problems is customer classification. "Getting customers" and "keeping customers" require the enterprise to ascertain which customers are attainable, which need to be kept, and which are to be kept for the long term or only for the short term, so customer classification is needed. The same holds for "maximizing customer value": because different customers have different values, the maximum value of each customer should be distinguished. Thus, the core problem for an enterprise implementing CRM correctly is to adopt an effective method to classify customers reasonably, find customer value, focus the enterprise's limited resources on high-value customers, provide better service for them, and retain them to prevent loss. Classification also allows the enterprise to establish a corresponding customer service system and carry out differentiated customer service management. Hence, customer classification has become an increasingly popular and difficult research topic and one of the urgent problems of CRM [1].

Selection of Customer Classification Indicators
Selecting reasonable classification variables, that is, establishing a scientific and reasonable system of classification indicators, is the basis of correct and effective customer classification. In view of the nature and characteristics of online trading, this paper adopts customer characteristics type variables and customer behaviors type variables when selecting customer classification variables [2].
(1) Selection of Customer Characteristics Type Variables. Customer characteristics type variables are mainly used to capture the basic attributes of customers. Indicators such as the geographical position, age, sex, and income of an individual customer play a key role in determining the members of a market segment. These variables come mainly from customers' registration information and from the basic customer information collected in the management system of banks. They mostly describe static data on customers' basic attributes. Their advantage is that most of them are easy to collect; their drawback is that some of these basic descriptive variables do not differ enough between customers.
Based on an analysis and summary of the existing literature, the customer characteristics type variables designed in this paper include: Customer No., Post Code, Date of Birth, Sex, Educational Background, Occupation, Monthly Income, Time of First Website Browsing, and Marital Status.
(2) Selection of Customer Behaviors Type Variables. Customer behaviors type variables are indicators related to customers' transaction behavior and their relation with banks. They are used to define the orientation an enterprise should strive for in a market segment and are the key factors for ascertaining the target market. They include records of customers buying services or products, records of service or product consumption, contact records between customers and the enterprise, and customers' consuming behaviors, preferences, lifestyle, and other relevant information.
Based on an analysis and summary of the existing literature, the customer behaviors type variables designed in this paper include Monthly

Simultaneous Analysis and Design
De Castro indicated that the relationship between the quality of weight initialization in a back-propagation neural network and the quality of the network output is similar to the relationship between the quality of the initial antibody repertoire in the immune system and the quality of the immune response. A simultaneous analysis and design (SAND) algorithm was proposed to solve the weight initialization problem in the back-propagation network [7,8]. In the SAND algorithm, each antibody corresponds to the weight vector of a given neuron in one of the layers of the neural network; its length is l, and an affinity is defined between pairs of antibodies. The SAND algorithm aims to reduce the similarity between antibodies and to produce an antibody repertoire that best covers the entire shape space, so the energy function is maximized. The energy function is shown in Formula 3.
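As a rough illustration of the quantity that SAND maximizes, the sketch below assumes a Euclidean shape space and takes the energy to be the sum of pairwise Euclidean distances between antibodies (weight vectors). Formula 3 is not reproduced in this text, so this definition and the helper names `euclidean` and `energy` are assumptions made only for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def energy(antibodies):
    """Assumed energy: sum of distances over all unordered antibody pairs.

    A larger value means the antibodies are less similar to each other,
    i.e. the repertoire covers the shape space more widely.
    """
    return sum(euclidean(a, b) for a, b in combinations(antibodies, 2))


# Hypothetical repertoires of three antibodies of length l = 2.
clustered = [[0.1, 0.1], [0.1, 0.2], [0.2, 0.1]]
spread = [[-1.0, -1.0], [1.0, -1.0], [0.0, 1.0]]

print(energy(clustered) < energy(spread))  # True
```

Under this assumed definition, a repertoire whose weight vectors are nearly identical has low energy, while a well-spread repertoire has high energy, which matches the stated goal of reducing the similarity between antibodies.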
In a Euclidean shape space, the energy function is not expressed as a percentage. To account for the diversity of the vectors, the SAND algorithm has to define a stop condition. Given vector

BP Neural Network Design Based on Immune Genetic Algorithm
In a practical application, once the numbers of input and output nodes and the input and output values of the BPNN have been determined, the activation function adopts the S-type (sigmoid) function. The following steps describe the BP neural network design based on the immune genetic algorithm.
(1) The weights of every layer of the BPNN are initialized separately by the SAND algorithm.
(2) Antibody coding. The initial weights derived by the SAND algorithm construct the structure of the BPNN. Each antibody corresponds to one BP neural network structure. The number of hidden nodes and the network weights are encoded together using mixed real-number coding. The antibody string is shown in Fig. 3.

Fig. 3. Antibody structure: N (number of hidden nodes), weight values corresponding to the first hidden node, weight values corresponding to the second hidden node, …, weight values corresponding to the N-th hidden node.
(4) Genetic operation. The model adopts the Gaussian mutation method for the genetic operation: each antibody is decoded into the corresponding network structure, and the network weight values are changed using a normally distributed random variable with zero mean and variance 1, where the parameter taking values in (−1, 1) is the individual variation rate. Formula 8 shows that the degree of variation is inversely related to the fitness: the lower the fitness (the smaller the fitness value of the objective function), the higher the individual variation rate, and vice versa. After the variation, all the hidden node and weight value components again constitute a new antibody. Formula 10 shows that when the antibody density is high, the probability of selecting an antibody with high fitness is low, and when the density is low, that probability is high. Therefore, excellent individuals are retained while the selection of similar antibodies is reduced, which guarantees individual diversity.
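The coding, variation, and selection steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the paper's formulas: Formulas 8 and 10 are not reproduced in this text, so the variation-rate rule (`1 - fitness / max_fitness`, which stays in [0, 1] rather than the interval given in the text), the density measure (share of antibodies within a distance threshold), and the selection score (fitness divided by density) are all hypothetical stand-ins that merely respect the stated qualitative behavior. The sketch also keeps the hidden-node count fixed during mutation and assumes positive fitness values and equal-length antibodies.

```python
import math
import random


def decode(antibody, weights_per_node):
    """Split a flat antibody [N, w..., w...] into one weight list per hidden node."""
    n_hidden = int(antibody[0])
    body = antibody[1:]
    if len(body) != n_hidden * weights_per_node:
        raise ValueError("antibody length does not match its hidden-node count")
    return [body[i * weights_per_node:(i + 1) * weights_per_node]
            for i in range(n_hidden)]


def variation_rate(fitness, max_fitness):
    """Assumed rule: the lower the fitness, the higher the variation rate."""
    return 1.0 - fitness / max_fitness


def mutate(antibody, fitness, max_fitness, rng):
    """Gaussian mutation of the weights; the hidden-node count is left unchanged."""
    rate = variation_rate(fitness, max_fitness)
    return antibody[:1] + [w + rate * rng.gauss(0.0, 1.0) for w in antibody[1:]]


def density(index, population, threshold):
    """Assumed density: share of antibodies within `threshold` of the given one."""
    target = population[index]
    close = sum(1 for other in population if math.dist(target, other) <= threshold)
    return close / len(population)


def selection_probabilities(fitnesses, densities):
    """Assumed score fitness / density, normalized to sum to one."""
    scores = [f / d for f, d in zip(fitnesses, densities)]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical population: N = 2 hidden nodes, two weights per hidden node.
population = [
    [2, 0.5, -0.3, 0.1, 0.8],
    [2, 0.5, -0.3, 0.1, 0.7],   # nearly identical to the first antibody
    [2, -0.9, 0.4, -0.6, 0.2],  # isolated antibody
]
fitnesses = [0.9, 0.9, 0.6]
rng = random.Random(0)

densities = [density(i, population, threshold=0.5) for i in range(len(population))]
probs = selection_probabilities(fitnesses, densities)
child = mutate(population[2], fitnesses[2], max(fitnesses), rng)

print(decode(population[0], weights_per_node=2))
print(probs)
```

In this toy population the two near-duplicate antibodies share a high density, so the isolated antibody receives the largest selection probability despite its lower fitness, and the fittest antibody is left unchanged by mutation because its assumed variation rate is zero.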

Object of Experimental Verification
The experiment conducts empirical research on the customer data of the B2C transactions of an enterprise website over the most recent three years (41351 customers in total; the 21 attributes in the data table, listed in the third part of the paper, include customer characteristics type variables and customer behaviors type variables). Statistics on attribute values such as annual transaction frequency, total amount, and product cost are compiled for each customer from the transaction records in the information base, forming an information table (in which the decision attribute set D is null) [4].

Process of Experimental Verification
The process of the experimental verification is as follows [5].
First, classification operates on numeric data, so character data must first be given a numeric coding. Second, if the number of distinct values of an attribute equals the number of samples, the attribute has little effect on classification and is removed first. Three attributes, Customer No., Post Code, and Date of Birth, are removed in this case.
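The two preprocessing steps above can be sketched as follows. The function name `encode_and_prune`, the record layout, and the sample values are hypothetical; the coding scheme (integer codes in order of first appearance) is an assumption, since the paper does not state which numeric coding it uses.

```python
def encode_and_prune(records):
    """Drop attributes whose distinct-value count equals the sample count,
    then replace string values by integer codes.

    `records` is a non-empty list of dicts sharing the same keys.
    Returns (encoded_rows, kept_attributes, codebooks).
    """
    n_samples = len(records)
    attributes = list(records[0].keys())

    # Step 2 in the text: an attribute with as many distinct values as
    # samples (for example an identifier) is removed.
    kept = [a for a in attributes
            if len({r[a] for r in records}) < n_samples]

    # Step 1 in the text: character data receives a numeric coding.
    codebooks = {}
    encoded = []
    for record in records:
        row = {}
        for a in kept:
            value = record[a]
            if isinstance(value, str):
                book = codebooks.setdefault(a, {})
                value = book.setdefault(value, len(book))
            row[a] = value
        encoded.append(row)
    return encoded, kept, codebooks


# Hypothetical sample of four customers.
customers = [
    {"customer_no": "C001", "sex": "F", "occupation": "teacher", "monthly_income": 5000},
    {"customer_no": "C002", "sex": "M", "occupation": "engineer", "monthly_income": 8000},
    {"customer_no": "C003", "sex": "F", "occupation": "engineer", "monthly_income": 5000},
    {"customer_no": "C004", "sex": "M", "occupation": "teacher", "monthly_income": 8000},
]

rows, kept, books = encode_and_prune(customers)
print(kept)
print(rows[1])
```

Note that this removal rule would also discard a continuous numeric attribute that happens to take a different value for every sample, so in practice it is safest to apply it to identifier-like attributes such as those named in the text.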
Third, a training sample set is established according to domain (prior) knowledge. The number of purchases and the total purchase amount of each customer are the two major factors in customer classification (this is the prior domain knowledge), so 400 typical records are selected from all the customers to form the training sample set. According to ABC management theory, they are divided into five types: Gold Customers, Silver Customers, Copper Customers, General Customers, and Negligible Customers.
Fourth, the customer classification algorithm described above is applied; the classification results are given in Table 1. For comparison, this paper also implements the ordinary K-means algorithm and the customer classification algorithm based on the standard BP neural network. The performance comparison of the three algorithms is given in Table 2. Table 1 shows that, in the autonomous learning of the algorithm of this paper, five factors have a relatively great influence on customer classification: educational background, income, occupation, number of purchases, and total purchase amount. According to the classification result in Table 1, Gold Customers make up 6.89% of the total number of customers but contribute 52.1% of the total profit. These customers play a significant role in the existence and development of the enterprise. Negligible Customers, however, account for 17.7%; they not only bring no profit to the enterprise but also cause it to lose money. These customers should be either further cultivated or eliminated according to the actual situation. Table 2 shows that the accuracy rate of the algorithm in this paper is the highest, reaching 99.7%, clearly higher than that of the ordinary K-means algorithm and the BP neural network algorithm. The squared error E values of the three algorithms on customer classification are 104.33, 159.81, and 119.96 respectively. The smaller the E value, the smaller the possibility of wrong classification. Thus the squared error E of the algorithm in this paper is far smaller than that of the ordinary K-means algorithm [4] and the BP neural network algorithm [6]. This shows that the improvement to the BP neural network algorithm in this paper is successful and yields reasonable classification results.
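The two measures used in the comparison above can be illustrated with a small sketch. The paper does not define E, so the code assumes the squared-error criterion commonly used for clustering: the sum of squared distances between each sample and the center of the class it was assigned to. The function names and the toy data are hypothetical and unrelated to the paper's data set.

```python
def accuracy(predicted, actual):
    """Share of samples whose predicted class equals the reference class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def squared_error(samples, labels):
    """Assumed E: sum of squared distances from each sample to its class center."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)

    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        center = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - center[d]) ** 2 for m in members for d in range(dim))
    return total


# Toy data: (number of purchases, total amount in arbitrary units).
samples = [[1, 1], [1, 3], [8, 8], [10, 8]]
good_labels = ["general", "general", "gold", "gold"]
poor_labels = ["general", "gold", "general", "gold"]

print(squared_error(samples, good_labels))  # 4.0
print(squared_error(samples, poor_labels))  # 102.0
print(accuracy(poor_labels, good_labels))   # 0.5
```

With this definition, a partition that groups similar customers together gives a much smaller E than one that mixes them, which is the sense in which a smaller E indicates a lower chance of wrong classification.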

Conclusion
Customer relationship management for online trading is still developing. Classifying online trading customers correctly and effectively is the critical issue for reforming the network marketing mode, improving customer management and service levels, and enhancing the competitiveness of network enterprises. To address the shortcomings of the typical BP neural network algorithm in data mining, this paper puts forward several improvement measures based on the immune genetic algorithm and applies them to the classification of online trading customers. The experimental results indicate that the improved algorithm achieves a higher accuracy rate and more reasonable results in online trading customer classification.