A Comparative Analysis of Machine Learning Models for Predicting Student Performance: Evaluating the Impact of Stacking and Traditional Methods
DOI:
10.47709/brilliance.v4i2.4669
Keywords:
Student performance prediction, Machine learning models, Ensemble learning, Stacking regressor, Educational data mining
Abstract
This study investigates the application of machine learning models to predict student performance using socio-economic, demographic, and academic factors. Various models were developed and evaluated, including Linear Regression, Random Forest, Gradient Boosting, XGBoost, LightGBM, Support Vector Regressor, and a Stacking Regressor. The models were assessed using key evaluation metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), R-squared (R²), Mean Squared Log Error (MSLE), and Mean Absolute Percentage Error (MAPE). The Support Vector Regressor demonstrated the best overall performance, with an MAE of 4.3091, RMSE of 5.4110, and an R² of 0.8685, surpassing even the more complex ensemble models. Similarly, Linear Regression achieved strong results, with an MAE of 4.3154 and R² of 0.8685. In contrast, the Stacking Regressor, while effective, did not significantly outperform its base models, achieving an MAE of 4.5340 and R² of 0.8563, highlighting that greater model complexity does not necessarily lead to better predictive power. The analysis also revealed that MAPE was highly sensitive to outliers in the dataset, indicating the need for robust data preprocessing to handle extreme values. These results suggest that, in educational data mining, simpler models can often match or exceed the performance of more complex methods. Future research should investigate advanced ensembling strategies and feature engineering techniques to further enhance the accuracy and reliability of student performance predictions.
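As a rough illustration of the five evaluation metrics named in the abstract, the following is a minimal sketch in plain Python. It is not the study's code: the function name `regression_metrics` and all score values are hypothetical, chosen only to show how the metrics are computed and how a single true value close to zero can inflate MAPE while MAE stays the same. Whether this is the mechanism behind the MAPE sensitivity reported in the study is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R2, MSLE and MAPE (in percent) for two equal-length sequences.

    Assumes non-negative values (required by MSLE) and non-zero true values
    (required by MAPE).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # R2 = 1 - (residual sum of squares) / (total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    # MSLE compares log(1 + y) values, so it penalises relative differences
    msle = sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / n

    # MAPE divides each absolute error by the true value
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n

    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MSLE": msle, "MAPE": mape}


# Hypothetical exam scores: every prediction is off by exactly 4 points.
y_true = [50.0, 60.0, 70.0, 80.0]
y_pred = [54.0, 56.0, 74.0, 76.0]
typical = regression_metrics(y_true, y_pred)

# Same absolute errors, but one true score is close to zero (an extreme value).
y_true_out = [2.0, 60.0, 70.0, 80.0]
y_pred_out = [6.0, 56.0, 74.0, 76.0]
with_outlier = regression_metrics(y_true_out, y_pred_out)

print("typical:     ", typical)
print("with outlier:", with_outlier)
```

In this toy case the MAE is identical in both runs, while the MAPE of the second run is several times larger, because the 4-point error is divided by a true score of 2. This is one reason to inspect or preprocess extreme target values before relying on MAPE.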
Copyright (c) 2024 Gregorius Airlangga
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.