Sentiment Analysis on Short Social Media Texts Using DistilBERT
DOI: https://doi.org/10.47709/cnahpc.v7i2.5836
Keywords: DistilBERT, Performance Evaluation, Sentiment Analysis, Short Informal Text, Social Media, Tweet
Abstract
Sentiment analysis of short social media texts, such as tweets, presents unique challenges because of their brevity and informal language. This study examines how effectively transformer-based models, particularly DistilBERT, perform sentiment analysis on short texts compared with traditional machine learning approaches, including Support Vector Machine, Logistic Regression, and Naive Bayes. The objective is to assess whether DistilBERT improves sentiment classification accuracy while remaining efficient enough for rapid social media analysis. The models were trained and evaluated on stratified samples of 10,000, 30,000, and 50,000 tweets drawn from the Sentiment140 dataset, preserving its original class distribution. The methodology comprised data collection and sampling, data splitting, data cleaning, feature extraction, model training, and evaluation using accuracy and F1-score. Experimental results showed that DistilBERT consistently outperformed the traditional models in both accuracy and F1-score and achieved results competitive with BERT while requiring significantly less training time: on average, DistilBERT trained approximately 1.8 times faster than BERT, highlighting its computational efficiency. The best result was achieved by DistilBERT trained on the 50k subset, with an accuracy of 85% and an F1-score of 84%. These findings suggest that lightweight transformer models such as DistilBERT are highly suitable for real-world sentiment analysis tasks where both speed and performance are critical.
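The stratified sampling and the two evaluation metrics named in the abstract can be illustrated with a short, standard-library-only sketch. It is not the authors' code, and several choices are assumptions: the helper names `stratified_sample`, `accuracy`, and `macro_f1` are hypothetical; per-class quotas use largest-remainder rounding so they sum exactly to the requested size (the abstract does not say how quotas were rounded); and F1 is macro-averaged over classes because the abstract does not state which average was reported. Toy strings stand in for Sentiment140 tweets.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(items, labels, n, seed=0):
    """Draw n (item, label) pairs so each label keeps its original share.

    Hypothetical helper: quotas use largest-remainder rounding (an assumption)
    so that they sum exactly to n.
    """
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    total = sum(len(group) for group in by_label.values())
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the number of items")

    # Integer floor of each class's exact share, plus its remainder.
    quota = {lab: n * len(g) // total for lab, g in by_label.items()}
    remainder = {lab: n * len(g) % total for lab, g in by_label.items()}
    # Give the leftover slots to the classes with the largest remainders.
    leftover = n - sum(quota.values())
    for lab in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        quota[lab] += 1

    rng = random.Random(seed)
    sample = [(item, lab) for lab, group in by_label.items()
              for item in rng.sample(group, quota[lab])]
    rng.shuffle(sample)  # avoid long runs of a single class in the output
    return sample


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean over classes of F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for lab in set(y_true) | set(y_pred):
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy corpus with a 60/40 class split standing in for labelled tweets.
    tweets = [f"tweet {i}" for i in range(100)]
    labels = ["neg"] * 60 + ["pos"] * 40
    subset = stratified_sample(tweets, labels, 10)
    print(Counter(lab for _, lab in subset))  # 6 neg, 4 pos

    gold = ["pos", "pos", "neg", "neg"]
    pred = ["pos", "neg", "neg", "neg"]
    print(accuracy(gold, pred), round(macro_f1(gold, pred), 3))  # 0.75 0.733
```

Integer arithmetic for the quotas avoids floating-point rounding at large sample sizes, and the fixed `seed` makes each subset reproducible, which matters when the same 10k, 30k, and 50k subsets must be reused across models.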
References
Abei, F., Sulaeman, A. A., & Suprapto, S. (2025). Twitter Sentiment Towards 2024 Jakarta Governor Candidates With Naïve Bayes Algorithm. Journal of Computer Networks, Architecture and High Performance Computing, 7(1), 265–277. https://doi.org/10.47709/cnahpc.v7i1.5358
Barreto, S., Moura, R., Carvalho, J., Paes, A., & Plastino, A. (2022). Sentiment analysis in tweets: an assessment study from classical to modern text representation models. Data Mining and Knowledge Discovery. http://arxiv.org/abs/2105.14373
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. http://arxiv.org/abs/1810.04805
Fransiscus, & Girsang, A. S. (2022). Sentiment Analysis of COVID-19 Public Activity Restriction (PPKM) Impact using BERT Method. International Journal of Engineering Trends and Technology, 70(12), 281–288. https://doi.org/10.14445/22315381/IJETT-V70I12P226
Jim, J. R., Talukder, M. A. R., Malakar, P., Kabir, M. M., Nur, K., & Mridha, M. F. (2024). Recent advancements and challenges of NLP-based sentiment analysis: A state-of-the-art review. Natural Language Processing Journal, 6, 100059. https://doi.org/10.1016/j.nlp.2024.100059
Jusli, D. T. A., & Kurniawan, R. (2024). Analysis Of Opinion Sentiment Towards Electric Vehicle Tax On Social Media X Using The Support Vector Machine (SVM) Method. Journal of Computer Networks, Architecture and High Performance Computing, 6(4), 1792–1808. https://doi.org/10.47709/cnahpc.v6i4.4739
Karl, F., & Scherp, A. (2023). Transformers are Short Text Classifiers: A Study of Inductive Short Text Classifiers on Benchmarks and Real-world Datasets. Machine Learning and Knowledge Extraction. https://doi.org/10.1007/978-3-031-40837-3_7
Kazanova. (2017). Sentiment140 dataset with 1.6 million tweets. Kaggle. https://www.kaggle.com/datasets/kazanova/sentiment140
Khan, J., Ahmad, K., Jagatheesaperumal, S. K., & Sohn, K. A. (2025). Textual variations in social media text processing applications: challenges, solutions, and trends. Artificial Intelligence Review, 58(3). https://doi.org/10.1007/s10462-024-11071-z
Knittel, J., Koch, S., Tang, T., Chen, W., Wu, Y., Liu, S., & Ertl, T. (2021). Real-Time Visual Analysis of High-Volume Social Media Posts. IEEE Transactions on Visualization and Computer Graphics, 879–889. https://doi.org/10.1109/TVCG.2021.3114800
Loureiro, D., Rezaee, K., Pilehvar, M. T., & Camacho-Collados, J. (2021). Analysis and Evaluation of Language Models for Word Sense Disambiguation. Computational Linguistics, 47(2). https://doi.org/10.1162/COLI
Mao, Y., Liu, Q., & Zhang, Y. (2024). Sentiment analysis methods, applications, and challenges: A systematic literature review. In Journal of King Saud University - Computer and Information Sciences (Vol. 36, Issue 4). King Saud bin Abdulaziz University. https://doi.org/10.1016/j.jksuci.2024.102048
Nguyen, T. H., Nguyen, H. H., Ahmadi, Z., Hoang, T. A., & Doan, T. N. (2021). On the Impact of Dataset Size: A Twitter Classification Case Study. ACM International Conference Proceeding Series, 210–217. https://doi.org/10.1145/3486622.3493960
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. EMC². http://arxiv.org/abs/1910.01108
Sarkar, S., Babar, M. F., Hassan, M. M., Hasan, M., & Santu, S. K. K. (2024). Processing Natural Language on Embedded Devices: How Well Do Transformer Models Perform? ICPE 2024 - Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering, 211–222. https://doi.org/10.1145/3629526.3645054
Setiawan, Y., Maulidevi, N. U., & Surendro, K. (2024). The Optimization of n-Gram Feature Extraction Based on Term Occurrence for Cyberbullying Classification. Data Science Journal, 23(1). https://doi.org/10.5334/dsj-2024-031
Siino, M., Tinnirello, I., & La Cascia, M. (2024). Is text preprocessing still worth the time? A comparative survey on the influence of popular preprocessing methods on Transformers and traditional classifiers. Information Systems, 121. https://doi.org/10.1016/j.is.2023.102342
Tan, K. L., Lee, C. P., & Lim, K. M. (2023). A Survey of Sentiment Analysis: Approaches, Datasets, and Future Research. In Applied Sciences (Switzerland) (Vol. 13, Issue 7). MDPI. https://doi.org/10.3390/app13074550
License
Copyright (c) 2025 Muhammad Sidik Asyaky, Muhammad Al-Husaini, Hen Hen Lukmana

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.