Evaluating TF-IDF With Logistic Regression and BERT Fine-Tuning for Movie Review Sentiment Analysis
DOI: https://doi.org/10.54097/bs3z2870
Keywords: Sentiment Analysis; TF-IDF; Logistic Regression; BERT
Abstract
Understanding how people express opinions online is crucial for applications such as public sentiment analysis, social media monitoring, and opinion-based decision making, yet sentiment analysis still faces difficulties due to the flexible, diverse, and often ambiguous use of language in user-generated content. In this study, we compare two distinct approaches: a conventional TF-IDF model combined with Logistic Regression, and a neural approach that fine-tunes BERT using the AdamW optimiser and early stopping. Experiments were performed on the IMDb dataset of 50,000 movie reviews, evenly split between positive and negative samples. The fine-tuned BERT model achieved a slightly higher Macro-F1 (0.9146 ± 0.0041) than the TF-IDF baseline (0.9084), an improvement that appears to come from its better handling of subtle or implied sentiment. Still, the traditional model remains surprisingly strong on a balanced dataset, suggesting that simpler approaches can remain valuable for sentiment analysis tasks.
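The TF-IDF + Logistic Regression baseline described in the abstract can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the authors' exact configuration: the vectoriser settings, the toy reviews, and the labels below are assumptions for demonstration, whereas the paper trains on the full 50,000-review IMDb set and reports Macro-F1.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Toy stand-in for the IMDb reviews (1 = positive, 0 = negative).
train_texts = [
    "a wonderful, moving film",
    "brilliant acting and a great story",
    "absolutely loved every minute",
    "one of the best movies this year",
    "dull, boring, and far too long",
    "a terrible script and flat acting",
    "i hated this movie",
    "a complete waste of time",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF features feeding a Logistic Regression classifier;
# unigram+bigram range here is an illustrative choice.
pipe = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
pipe.fit(train_texts, train_labels)

test_texts = ["a truly great film", "boring and terrible"]
test_labels = [1, 0]
preds = pipe.predict(test_texts)

# Macro-F1 averages the per-class F1 scores with equal weight,
# which is the metric the paper reports for both models.
score = f1_score(test_labels, preds, average="macro")
print(score)
```

On the balanced IMDb split, the same pipeline scaled to the full dataset is the kind of baseline the paper reports at 0.9084 Macro-F1.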
License
Copyright (c) 2026 Frontiers in Computing and Intelligent Systems

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

