Discovery of Weibo New Words Based on Rules and SVM
DOI: https://doi.org/10.54097/cpl.v11i2.12815
Keywords: Weibo data; Discovering new words; Constraints; SVM.
Abstract
This article combines a set of proposed word features with a support vector machine (SVM) to recognize and extract new words from Weibo text. First, the segmentation dictionary is modified to simulate Weibo new words, the training and test corpora are segmented with that dictionary, and the proposed word features are counted. Next, the positive and negative samples extracted from the training corpus are vectorized using the word features, and SVMs are trained with different kernel functions to obtain the support vectors for Weibo new word classification. Slack variables are added to improve classification accuracy. In the test stage, the support vectors obtained from the training corpus are combined with the candidate vectors obtained from the test corpus to compute a decision value for each candidate Weibo new word, and the final recognition result is obtained by comparing that value with a threshold. Experiments show that combining word features with SVM can recognize and extract Weibo new words with relatively good results. The method can be extended to applications that rely on Weibo new word recognition.
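The test stage described above (combining trained support vectors with candidate vectors, then comparing the resulting value with a threshold) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two-dimensional feature vectors, the support vectors, the coefficients, the bias, the RBF kernel choice, the `gamma` value, and the threshold of 0 are all hypothetical placeholders, since the abstract does not give them. The sketch covers only scoring; training with slack variables is assumed to have happened already.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel exp(-gamma * ||u - v||^2); gamma is an assumed value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, dual_coefs, bias, kernel):
    """Standard SVM decision function: sum_i (alpha_i * y_i) * K(sv_i, x) + b."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


def classify_candidates(candidates, support_vectors, dual_coefs, bias,
                        kernel=rbf_kernel, threshold=0.0):
    """Score every candidate and flag those whose value exceeds the threshold."""
    results = {}
    for word, features in candidates.items():
        score = decision_value(features, support_vectors, dual_coefs, bias, kernel)
        results[word] = (score, score > threshold)
    return results


# Hypothetical toy model: one positive and one negative support vector.
SUPPORT_VECTORS = [(0.9, 0.8), (0.1, 0.2)]
DUAL_COEFS = [1.0, -1.0]  # alpha_i * y_i for each support vector (assumed)
BIAS = 0.0                # assumed

# Hypothetical candidates with placeholder two-dimensional word features.
CANDIDATES = {
    "cand_a": (0.85, 0.75),
    "cand_b": (0.15, 0.25),
}

results = classify_candidates(CANDIDATES, SUPPORT_VECTORS, DUAL_COEFS, BIAS)

if __name__ == "__main__":
    for word, (score, is_new_word) in results.items():
        print(word, round(score, 4), is_new_word)
```

In this toy setup, a candidate whose features lie near the positive support vector receives a positive value and is accepted, while one near the negative support vector is rejected. Raising the threshold would trade recall for precision.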
