News Short Text Classification Based on Bert Model and Fusion Model

Authors

  • Hongyang Cui
  • Chentao Wang
  • Yibo Yu

DOI:

https://doi.org/10.54097/hset.v34i.5482

Keywords:

Text Classification, Bert, Fusion model

Abstract

Text classification is one of the most fundamental tasks in NLP, and the classification of short news texts can serve as the basis for many other tasks. In this paper, we apply a fusion model that combines BERT and TextRNN, with some modified details, in pursuit of higher text classification accuracy. We use the THUCNews dataset, which consists of two columns: one for the news text and the other for numbers. The original dataset was separated into three parts: a training set, a validation set and a test set. We use the BERT model, which involves two pre-training tasks, and the TextRNN model, which refers to the use of an RNN to solve text classification problems. We train the two models in parallel; the optimal BERT and TextRNN models obtained through training and parameter tuning are then joined by a fully connected layer that produces the final result by weighting the contributions of BERT and TextRNN. The fusion model addresses the over-fitting and under-fitting of a single model and helps to obtain a model with better generalization performance. The experimental results show the sharp change in loss and accuracy as well as the final accuracy of the BERT model. Precision, recall and F1-score are also evaluated. The accuracy of the fusion model of BERT and TextRNN is higher than that of the single BERT model, with the gap reaching 1.76%.
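The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the layer shape, the weights, or the exact inputs to the fully connected layer, so everything here is an assumption: the two models are taken to emit class-probability vectors, the fusion is a single linear layer over their concatenation followed by a softmax, and the helper names (`fuse`, `weighted_identity`) and all numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(bert_probs, rnn_probs, weights, bias):
    """Concatenate both model outputs, apply one linear layer, then softmax."""
    features = list(bert_probs) + list(rnn_probs)
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def weighted_identity(num_classes, bert_weight, rnn_weight):
    """Hypothetical weight matrix: each class logit is a weighted sum of
    the two models' scores for that same class (assumed, not from the paper)."""
    return [
        [bert_weight if j == i else 0.0 for j in range(num_classes)]
        + [rnn_weight if j == i else 0.0 for j in range(num_classes)]
        for i in range(num_classes)
    ]


# Made-up outputs of the two separately trained models for one news title.
bert_probs = [0.6, 0.3, 0.1]
rnn_probs = [0.2, 0.7, 0.1]

weights = weighted_identity(3, bert_weight=0.7, rnn_weight=0.3)
fused = fuse(bert_probs, rnn_probs, weights, bias=[0.0, 0.0, 0.0])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

In a trained fusion model the weight matrix and bias would be learned on the validation or training data rather than set by hand; the fixed 0.7/0.3 split only shows how the layer can favour one model over the other.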

Published

28-02-2023

How to Cite

Cui, H., Wang, C., & Yu, Y. (2023). News Short Text Classification Based on Bert Model and Fusion Model. Highlights in Science, Engineering and Technology, 34, 262-268. https://doi.org/10.54097/hset.v34i.5482