A survey of research methods of automatic text summarization

Min Zhang; Cuiju Luan

doi:10.54097/63xnzx31

Keywords:

Automatic text summarization, extractive, abstractive, summarization approaches.

Abstract

Automatic text summarization is an information compression technique that uses a computer to convert a text or a collection of texts into a short summary. In recent years, research on automatically summarizing texts with different methods has developed rapidly. Drawing on the relevant literature at home and abroad, this paper reviews the techniques and methods used in existing automatic text summarization tasks, together with the commonly used evaluation metrics, summarizes the advantages and disadvantages of current approaches, and discusses future research trends.
