Named Entity Recognition of Electronic Medical Records Based on Multi-Feature Fusion
DOI:
https://doi.org/10.54097/fcis.v3i3.7983
Keywords:
Chinese electronic medical records, Named entity recognition, Bert-Bi-LSTM-CRF model, Deep learning
Abstract
Named entity recognition (NER) is a fundamental task in natural language processing (NLP). This paper studies named entity recognition in Chinese electronic medical records and proposes a method based on the Bert-Bi-LSTM-CRF model, augmented with radical and dictionary features to improve recognition accuracy. The complexity of Chinese medical record entities, the ambiguity of language expression, and the lack of adequate labeled data make traditional rule-based and machine learning methods less effective. To address this problem, we adopt the Bert-Bi-LSTM-CRF model, which captures contextual information and semantic relationships. To further enhance the model's performance, we introduce two additional features. Radicals are an important component of Chinese characters and can help identify entities and improve the model's generalization ability. Medical dictionaries contain rich medical terminology that helps the model identify entities. The proposed method is evaluated on the public dataset CCKS2019, and the experimental results show that it outperforms traditional methods, improving the F1 score by nearly 7 percentage points.
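The abstract does not say how the radical and dictionary features are computed or fused, so the following is only a minimal sketch under stated assumptions. It assumes the dictionary feature is a per-character B/M/E/S/O tag obtained by forward maximum matching against a lexicon, and that the radical feature is a per-character table lookup. The names `MEDICAL_DICT`, `RADICALS`, `dictionary_tags`, `radical_features`, and `fuse_features` are hypothetical, and the tiny lexicon and radical table are toy data, not resources from the paper.

```python
from typing import Dict, List, Set

# Hypothetical toy resources (assumptions for illustration only).
# A real system would load a full medical lexicon and radical table.
MEDICAL_DICT: Set[str] = {"糖尿病", "高血压", "痰"}
RADICALS: Dict[str, str] = {
    "糖": "米", "尿": "尸", "病": "疒", "痰": "疒", "咳": "口",
}


def dictionary_tags(text: str, lexicon: Set[str], max_len: int = 6) -> List[str]:
    """Tag each character B/M/E/S if it lies inside a lexicon match, else O.

    Uses forward maximum matching: at each position, try the longest
    candidate substring first and accept the first one found in the lexicon.
    """
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        matched = 0
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in lexicon:
                matched = length
                break
        if matched == 0:
            i += 1                      # no lexicon entry starts here
        elif matched == 1:
            tags[i] = "S"               # single-character entry
            i += 1
        else:
            tags[i] = "B"
            for j in range(i + 1, i + matched - 1):
                tags[j] = "M"
            tags[i + matched - 1] = "E"
            i += matched
    return tags


def radical_features(text: str, table: Dict[str, str],
                     unknown: str = "<UNK>") -> List[str]:
    """Look up the radical of each character; unseen characters map to `unknown`."""
    return [table.get(ch, unknown) for ch in text]


def fuse_features(text: str) -> List[Dict[str, str]]:
    """Bundle the character, its dictionary tag, and its radical per position."""
    d_tags = dictionary_tags(text, MEDICAL_DICT)
    rads = radical_features(text, RADICALS)
    return [
        {"char": ch, "dict_tag": tag, "radical": rad}
        for ch, tag, rad in zip(text, d_tags, rads)
    ]


if __name__ == "__main__":
    for feature in fuse_features("患者糖尿病伴咳痰"):
        print(feature)
```

In a neural tagger, discrete features like these would commonly be mapped to embeddings and concatenated with the BERT character vectors before the Bi-LSTM layer. Whether the paper fuses them this way is not stated in the abstract, so treat that as one plausible design rather than the authors' method.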