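Predicting dropout among newly admitted students in the General Mathematics course using supervised learning algorithms
(Original title: Predicción del abandono en estudiantes de nuevo ingreso en el curso de Matemática general utilizando algoritmos de aprendizaje supervisado)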

Author: Zamora Araya, José Andrey
Dates: 2024-10-24, 2024-10-24, 2024-06-07
ISBN: 978-956-6224-39-6
URI: https://hdl.handle.net/11056/29262

Abstract. This work set out to determine which of three algorithms best predicts dropout among newly admitted students in the General Mathematics course at the Universidad Nacional (Costa Rica), and to identify the main variables associated with the prediction. The algorithms compared were Logistic Regression, Random Forest, and XGBoost, with the F1 score as the performance metric. The training set comprises enrollment records from 2017 and 2018, and the test set comprises enrollment records from 2019. Hyperparameters were tuned with 5-fold cross-validation. With the hyperparameters fixed, the mean F1 scores of the algorithms were compared using a repeated-measures ANOVA over 10-fold cross-validation. The most important predictors were the PAA score, the IDS, the student's age and sex, the instructor's age and academic degree, the educational stratum, and the degree program. No significant differences in performance were found among the algorithms on the selected metric (p = 0.118), so the least complex of the three, Logistic Regression, is recommended for interpreting the results.

Language: Spanish (spa)
Access: Open access
Keywords: School desertion; Higher education; Teaching mathematics; Universidad Nacional (Costa Rica); Mathematics; Students; Learning
Resource type: http://purl.org/coar/resource_type/c_8544
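
The abstract describes comparing per-fold F1 scores of the three algorithms with a repeated-measures ANOVA but gives no implementation details. The following is a minimal standard-library sketch of that comparison, assuming each cross-validation fold plays the role of a "subject" and each algorithm is a repeated condition. The function names `f1_score` and `repeated_measures_anova`, and the fold scores in the usage lines, are hypothetical illustrations and are not taken from the study.

```python
from statistics import mean


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def repeated_measures_anova(scores):
    """One-way repeated-measures ANOVA.

    scores[i][j] is the metric of algorithm j on fold i.
    Returns (F statistic, df_treatment, df_error).
    """
    n = len(scores)      # number of folds (the repeated "subjects")
    k = len(scores[0])   # number of algorithms (the conditions)
    grand = mean(v for row in scores for v in row)
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    row_means = [mean(row) for row in scores]

    ss_treat = n * sum((m - grand) ** 2 for m in col_means)  # between algorithms
    ss_subj = k * sum((m - grand) ** 2 for m in row_means)   # between folds
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_treat - ss_subj                 # residual

    df_treat = k - 1
    df_error = (k - 1) * (n - 1)
    if ss_error == 0:
        raise ValueError("zero residual variance: F statistic is undefined")
    f_stat = (ss_treat / df_treat) / (ss_error / df_error)
    return f_stat, df_treat, df_error


# Made-up fold scores (NOT the study's data): rows are folds, columns are
# Logistic Regression, Random Forest, XGBoost.
illustrative_scores = [
    [0.61, 0.63, 0.64],
    [0.58, 0.60, 0.59],
    [0.65, 0.62, 0.66],
    [0.60, 0.61, 0.63],
]
f_stat, df_treat, df_error = repeated_measures_anova(illustrative_scores)
print(f"F({df_treat}, {df_error}) = {f_stat:.3f}")
```

The p-value is the upper tail of the F distribution at the returned statistic and degrees of freedom; with SciPy installed it could be obtained as `scipy.stats.f.sf(f_stat, df_treat, df_error)`. The sketch does not cover sphericity checks or corrections, which a full analysis might also need.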