Sentiment Analysis on Short Social Media Texts Using DistilBERT

Authors

  • Muhammad Sidik Asyaky, Informatics, Siliwangi University, Indonesia
  • Muhammad Al-Husaini, Informatics, Siliwangi University, Indonesia
  • Hen Hen Lukmana, Informatics, Siliwangi University, Indonesia

DOI:

https://doi.org/10.47709/cnahpc.v7i2.5836

Keywords:

DistilBERT, Performance Evaluation, Sentiment Analysis, Short Informal Text, Social Media, Tweet

Abstract

Sentiment analysis on short texts from social media, such as tweets, presents unique challenges due to their brevity and informal language. This study explores the effectiveness of transformer-based models, particularly DistilBERT, in performing sentiment analysis on short texts compared to traditional machine learning approaches including Support Vector Machine, Logistic Regression, and Naive Bayes. The objective is to assess whether DistilBERT not only enhances sentiment classification accuracy but also remains efficient enough for quick social media analysis. The models used in this study were trained and evaluated on stratified samples of 10,000, 30,000, and 50,000 tweets, drawn from the Sentiment140 dataset while preserving the original class distribution. The methodology involved data collection and sampling, data splitting, data cleaning, feature extraction, model training, and evaluation using accuracy and F1-score. Experimental results showed that DistilBERT consistently outperformed traditional models in both accuracy and F1-score, and demonstrated competitive results against BERT while requiring significantly less training time. Specifically, DistilBERT trained approximately 1.8 times faster than BERT on average, highlighting its computational efficiency. The best result was achieved by DistilBERT trained on the 50k subset, reaching an accuracy of 85% and an F1-score of 84%. These findings suggest that lightweight transformer models like DistilBERT are highly suitable for real-world sentiment analysis tasks where both speed and performance are critical.
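For readers who want a concrete picture of the pipeline described in the abstract, the sketch below draws a stratified sample from Sentiment140, fine-tunes DistilBERT, and reports accuracy and F1-score. It is a minimal illustration, not the authors' code: it assumes the Hugging Face transformers and datasets libraries plus pandas and scikit-learn, and the file path, sample size, and hyperparameters are placeholders.

```python
# Illustrative pipeline sketch for the study described above: a stratified
# sample of Sentiment140 tweets, DistilBERT fine-tuning, and evaluation with
# accuracy and F1-score. Paths, sizes, and hyperparameters are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# The Kaggle Sentiment140 CSV has no header row; labels are 0 (negative) and 4 (positive).
cols = ["target", "ids", "date", "flag", "user", "text"]
df = pd.read_csv("sentiment140.csv", encoding="latin-1", names=cols)
df["label"] = (df["target"] == 4).astype(int)

# Stratified 50k sample that preserves the original class distribution,
# followed by a stratified train/test split.
sample, _ = train_test_split(df[["text", "label"]], train_size=50_000,
                             stratify=df["label"], random_state=42)
train_df, test_df = train_test_split(sample, test_size=0.2,
                                     stratify=sample["label"], random_state=42)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Tweets are short, so a small max_length keeps training fast.
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_ds = Dataset.from_pandas(train_df, preserve_index=False).map(tokenize, batched=True)
test_ds = Dataset.from_pandas(test_df, preserve_index=False).map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds)}

args = TrainingArguments(output_dir="distilbert-sentiment140",
                         num_train_epochs=2,
                         per_device_train_batch_size=32,
                         learning_rate=2e-5)

trainer = Trainer(model=model, args=args, tokenizer=tokenizer,
                  train_dataset=train_ds, eval_dataset=test_ds,
                  compute_metrics=compute_metrics)
trainer.train()
print(trainer.evaluate())  # reports eval_accuracy and eval_f1 on the held-out split
```

Swapping the checkpoint name for bert-base-uncased gives the BERT comparison point; the traditional baselines mentioned in the abstract would instead pair a feature extractor such as TF-IDF with scikit-learn's LinearSVC, LogisticRegression, or MultinomialNB.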

Published

2025-05-10

How to Cite

Asyaky, M. S., Al-Husaini, M., & Lukmana, H. H. (2025). Sentiment Analysis on Short Social Media Texts Using DistilBERT. Journal of Computer Networks, Architecture and High Performance Computing, 7(2), 524–533. https://doi.org/10.47709/cnahpc.v7i2.5836
