Research on Weibo New Word Recognition based on Weibo Data and Statistical Information
DOI:
https://doi.org/10.54097/fcis.v5i2.13147
Keywords:
Weibo Data, Statistical Information, Mutual Information, Information Entropy
Abstract
Recognizing new words on Weibo is one of the key challenges in Chinese information processing, and it strongly affects machine translation and text classification. As Weibo has become the most used social platform among internet users, mining new vocabulary from Weibo data not only deepens understanding of the data itself but also supports personalized recommendation services for users. Although a large body of research addresses new word recognition, work dedicated to Weibo new words is still scarce. In this article, we propose a Weibo new word recognition strategy that combines Weibo content features with statistical information. First, we extract repeated strings from Weibo topic names as candidate words. We then filter out incorrect candidates using absolute frequency, relative frequency, mutual information, and information entropy. The experimental results show that, with appropriately chosen thresholds, incorrect candidates can be effectively filtered out, thereby improving recognition performance.
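The statistical filtering step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the character n-gram counting, the use of the minimum pointwise mutual information over all two-way splits, the left/right branching entropy, and every threshold value are assumptions chosen for illustration. The relative frequency filter is omitted because the abstract does not define it.

```python
import math
from collections import Counter


def ngram_counts(texts, max_len):
    """Count every character n-gram of length 1..max_len in the texts."""
    counts = Counter()
    for text in texts:
        for n in range(1, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def min_pmi(word, counts, total_chars):
    """Smallest pointwise mutual information over all two-way splits of word."""
    if counts[word] == 0:
        return -math.inf
    p_word = counts[word] / total_chars
    best = math.inf  # single-character words have no split and keep +inf
    for k in range(1, len(word)):
        p_left = counts[word[:k]] / total_chars
        p_right = counts[word[k:]] / total_chars
        best = min(best, math.log2(p_word / (p_left * p_right)))
    return best


def branching_entropy(word, texts, side):
    """Entropy of the characters adjacent to word on the given side."""
    neighbours = Counter()
    for text in texts:
        start = text.find(word)
        while start != -1:
            idx = start - 1 if side == "left" else start + len(word)
            if 0 <= idx < len(text):
                neighbours[text[idx]] += 1
            start = text.find(word, start + 1)
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in neighbours.values())


def keep_candidate(word, texts, counts, total_chars,
                   min_freq=2, min_mi=1.0, min_entropy=0.5):
    """Apply the three filters in turn; thresholds are illustrative only."""
    if counts[word] < min_freq:  # absolute frequency filter
        return False
    if min_pmi(word, counts, total_chars) < min_mi:  # cohesion filter
        return False
    entropy = min(branching_entropy(word, texts, "left"),
                  branching_entropy(word, texts, "right"))
    return entropy >= min_entropy  # boundary freedom filter
```

In this sketch a candidate survives only if it is frequent enough, its parts co-occur more often than chance, and it appears in varied left and right contexts. A fragment that is always followed by the same character has zero right entropy and is rejected.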


