Evaluating ChatGPT: Strengths and Limitations in NLP Problem Solving

Authors

  • Yanyi Wu

DOI:

https://doi.org/10.54097/z15ne349

Keywords:

ChatGPT; GPT-3.5; natural language processing; zero-shot learning.

Abstract

This paper critically analyzes ChatGPT’s problem-solving performance on a range of natural language processing (NLP) tasks. Using a comparative methodology, it evaluates ChatGPT against its predecessor, GPT-3.5, in seven domains: summarization, named entity recognition, arithmetic, natural language inference, symbolic and logical reasoning, question answering, and conversation. The assessment systematically examines both the quantitative and the qualitative aspects of ChatGPT’s responses. The findings show that although ChatGPT performs very well on math and question-answering tasks, it struggles with summarization and commonsense reasoning. The discussion explores the nuances of these findings and considers their implications for the application and development of AI. The article concludes that although ChatGPT represents significant progress in natural language processing, its uneven problem-solving performance highlights the need for continued development and optimization of artificial intelligence models. This study contributes to understanding the current state and promise of AI-driven language models in challenging problem-solving situations.

Published

26-04-2024

How to Cite

Wu, Y. (2024). Evaluating ChatGPT: Strengths and Limitations in NLP Problem Solving. Highlights in Science, Engineering and Technology, 94, 319-325. https://doi.org/10.54097/z15ne349