Evaluating ChatGPT: Strengths and Limitations in NLP Problem Solving
DOI: https://doi.org/10.54097/z15ne349
Keywords: ChatGPT; GPT-3.5; natural language processing; zero-shot learning.
Abstract
This paper critically analyzes ChatGPT’s problem-solving performance on a range of natural language processing (NLP) tasks. Using a comparative methodology, it compares ChatGPT’s performance with that of its predecessor, GPT-3.5, in seven domains: summarization, named entity recognition, arithmetic, natural language inference, symbolic and logical reasoning, question answering, and conversation. The assessment is methodical and covers both quantitative and qualitative aspects of ChatGPT’s replies. The findings show that although ChatGPT performs very well on math and question-answering tasks, it struggles with summarization and commonsense reasoning. The discussion examines the nuances of these findings and considers their implications for the application and development of AI. The article concludes that although ChatGPT represents significant progress in natural language processing, its uneven problem-solving performance highlights the need for continued development and optimization of artificial intelligence models. This study contributes to understanding the current state and promise of AI-driven language models in challenging problem-solving situations.
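The abstract does not spell out how the per-domain comparison is scored. As a minimal sketch, assuming exact-match accuracy on labeled examples and an instruction-only (zero-shot) prompt with no solved examples, a per-task comparison of two models might look like the following. The prompt template, the `normalize` rule, the stub models, and the toy examples are all hypothetical stand-ins, not the paper's prompts, metrics, data, or results.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical record: (task name, input text, gold answer). Not data from the paper.
Example = Tuple[str, str, str]
Model = Callable[[str], str]


def zero_shot_prompt(task: str, text: str) -> str:
    """Assumed zero-shot template: an instruction plus the input, no worked examples."""
    return f"Task: {task}\nInput: {text}\nAnswer:"


def normalize(answer: str) -> str:
    """Lowercase and strip whitespace and trailing periods so 'Yes.' matches 'yes'."""
    return answer.strip().rstrip(".").lower()


def per_task_accuracy(model: Model, data: List[Example]) -> Dict[str, float]:
    """Exact-match accuracy of `model`, computed separately for each task."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for task, text, gold in data:
        prediction = model(zero_shot_prompt(task, text))
        total[task] += 1
        if normalize(prediction) == normalize(gold):
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}


def compare(model_a: Model, model_b: Model, data: List[Example]) -> Dict[str, float]:
    """Per-task accuracy difference (model_a minus model_b); positive favors model_a."""
    acc_a = per_task_accuracy(model_a, data)
    acc_b = per_task_accuracy(model_b, data)
    return {task: acc_a[task] - acc_b[task] for task in acc_a}


# Toy placeholder data and stub "models" (hypothetical; they call no real API).
toy_data: List[Example] = [
    ("arithmetic", "What is 12 + 30?", "42"),
    ("arithmetic", "What is 7 * 6?", "42"),
    ("natural language inference",
     "Premise: A dog runs. Hypothesis: An animal moves. Entailment?", "yes"),
]


def stub_model_a(prompt: str) -> str:
    return "42" if "arithmetic" in prompt else "Yes."


def stub_model_b(prompt: str) -> str:
    return "42" if "7 * 6" in prompt else "no"


if __name__ == "__main__":
    print(compare(stub_model_a, stub_model_b, toy_data))
```

In a real study each stub would wrap a call to the model under test. Exact match suits short-answer tasks such as arithmetic or yes/no inference; open-ended tasks such as summarization or conversation would need a different metric or human judgment.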
License
Copyright (c) 2024 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







