Open Chinese Internet Sarcasm Corpus Construction: An Approach
DOI:
https://doi.org/10.54097/fcis.v2i1.2484
Keywords:
Sarcasm, Open corpus, Chinese Internet
Abstract
Sarcasm is a common language phenomenon, particularly on the Internet, where it is often used to convey criticism or negative emotions. A suitable sarcasm corpus supports sarcasm study and detection, and can thereby contribute to linguistic research and assist sentiment analysis, but open Chinese sarcasm corpora remain scarce. In this paper, we draw on existing methods and data to construct a balanced, open Chinese Internet sarcasm corpus, using a new approach intended to improve efficiency and data quality. The corpus contains 2,000 labeled texts from multiple sources, selected from larger source datasets. Sarcastic and non-sarcastic texts are in a 1:1 ratio, as are longer and shorter texts.
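The balance described above can be illustrated with a minimal sketch. It is not the authors' selection procedure: the function name `build_balanced_corpus`, the character-count `length_threshold`, and the choice of equal-sized cells are assumptions made for illustration. Drawing the same number of texts from each combination of label and length class is one simple way to obtain both 1:1 ratios at once.

```python
import random
from collections import defaultdict


def build_balanced_corpus(records, per_cell, length_threshold, seed=0):
    """Sample an equal number of texts from each (label, length class) cell.

    records: iterable of (text, is_sarcastic) pairs (hypothetical input format).
    per_cell: number of texts to draw from each of the four cells.
    length_threshold: assumed cutoff; texts with at least this many
        characters count as "longer".
    """
    rng = random.Random(seed)

    # Group candidate texts by (sarcastic?, longer?).
    cells = defaultdict(list)
    for text, is_sarcastic in records:
        is_long = len(text) >= length_threshold
        cells[(bool(is_sarcastic), is_long)].append(text)

    corpus = []
    for label in (True, False):
        for is_long in (True, False):
            pool = cells[(label, is_long)]
            if len(pool) < per_cell:
                raise ValueError(
                    f"cell sarcastic={label}, long={is_long} has only "
                    f"{len(pool)} texts, need {per_cell}"
                )
            # Sample without replacement so no text is duplicated.
            for text in rng.sample(pool, per_cell):
                corpus.append({"text": text, "sarcastic": label, "long": is_long})

    rng.shuffle(corpus)
    return corpus


# Toy data: placeholder strings of length 1..80, labeled by parity of length.
toy = [("x" * n, n % 2 == 0) for n in range(1, 81)]
sample = build_balanced_corpus(toy, per_cell=5, length_threshold=40)
```

Under these assumptions, `per_cell=500` would yield a 2,000-text corpus with both ratios at 1:1.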


