Word-Level Interpretation of ChatGPT Detector Based on Classification Contribution
DOI: https://doi.org/10.54097/hset.v70i.12204

Keywords: ChatGPT Detector, Word-Level Interpretation, Classification Contribution

Abstract
Detecting ChatGPT-generated text is considered a necessary task for standardizing the use of ChatGPT. A common problem with LLM-based detectors is the difficulty of interpreting the detection process and its results. Most existing interpreters focus on attention visualization and rarely consider the classification process itself. This study presents a method that shows the contribution of individual words to the model's prediction. Specifically, it combines information from the classification weight vector, the semantic vectors, and the embedded input word vectors to give a more complete interpretation of the detector LLM. Three word-level attributes (word length, part of speech, and word meaning) are compared with conclusions from the existing literature to verify the method. Visual samples and the analysis process are available at https://github.com/salixc/WCC-DekunChen.
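The abstract describes scoring words by combining the classification weight vector with per-word semantic vectors. The paper's exact formulation is not given here, but the following minimal sketch illustrates the general idea under one common assumption: a detector that ends in a linear head over mean-pooled contextual vectors, so the logit decomposes exactly into one additive contribution per word. All names (`H`, `w`, `b`) are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

# Assumption: the detector computes logit = w . mean(H) + b, where H holds
# one contextual (semantic) vector per input word and w is the
# classification weight vector. Under mean pooling the logit splits into
# per-word terms (w . h_i) / n, which we read as word-level contributions.

rng = np.random.default_rng(0)
n_words, dim = 5, 8
H = rng.normal(size=(n_words, dim))   # stand-in per-word semantic vectors
w = rng.normal(size=dim)              # stand-in classification weight vector
b = 0.1                               # stand-in bias

contributions = H @ w / n_words       # one contribution score per word
logit = contributions.sum() + b       # identical to w . mean(H) + b

# Sanity check: per-word contributions reassemble the pooled logit exactly.
assert np.isclose(logit, w @ H.mean(axis=0) + b)
```

A positive contribution pushes the prediction toward one class (e.g. "machine-generated"), a negative one toward the other; ranking words by these scores gives a word-level view of the classification process rather than of attention alone.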
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.