Exploring The Impact of Feature Engineering and Data Organization on Sentiment Analysis of Twitter Data Using Machine Learning Algorithms
DOI: https://doi.org/10.54097/j597br86
Keywords: Twitter Sentiment Analysis; Machine Learning; Feature Engineering.
Abstract
As one of the most popular micro-blogging platforms, Twitter generates millions of tweets daily, which makes manual sentiment analysis at that volume impractical. Performing sentiment analysis efficiently with machine learning algorithms has therefore become an important problem. This paper evaluates four machine learning algorithms, Logistic Regression (LR), Gaussian Naive Bayes (GNB), Decision Tree (DT), and Gradient Boosting Machine (GBM), on four datasets of varying sizes. The models were trained with four feature extractors: unigrams, bigrams, a combination of unigrams and bigrams, and the pre-trained Global Vectors (GloVe) word embedding model with feature dimensions of 100, 200, and 300. The study shows how dataset size and feature combinations affect the performance of these algorithms and identifies the most effective feature extraction methods. These findings clarify the relationship between data scale, feature representation, and algorithmic performance, and offer perspectives for future research on tweet-based sentiment analysis.
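The three n-gram feature extractors named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the lowercase whitespace tokenizer, the raw count weighting, and the function names `ngrams`, `extract_features`, and `vectorize` are all assumptions made for illustration, since the abstract does not describe preprocessing or weighting.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, mode="unigram"):
    """Count n-gram features for one tweet.

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace; the paper's actual preprocessing is not stated.
    """
    tokens = text.lower().split()
    if mode == "unigram":
        grams = ngrams(tokens, 1)
    elif mode == "bigram":
        grams = ngrams(tokens, 2)
    elif mode == "unigram+bigram":
        grams = ngrams(tokens, 1) + ngrams(tokens, 2)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return Counter(grams)


def vectorize(texts, mode="unigram"):
    """Build a sorted vocabulary and a dense count matrix (one row per text)."""
    counts = [extract_features(t, mode) for t in texts]
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for features absent from a given text.
    matrix = [[c[g] for g in vocab] for c in counts]
    return vocab, matrix


if __name__ == "__main__":
    tweets = ["I love this phone", "I hate this phone"]
    for mode in ("unigram", "bigram", "unigram+bigram"):
        vocab, X = vectorize(tweets, mode)
        print(mode, len(vocab), X)
```

Each row of the resulting matrix could then be passed to a classifier such as those listed in the abstract. The combined mode simply concatenates the unigram and bigram vocabularies, so its dimensionality is the sum of the two.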
Copyright (c) 2024 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







