Evaluating open-ended questions is a common and time-consuming task in educational environments. With the continuous and rapid advances in Natural Language Processing (NLP), large language models (LLMs) trained on large datasets have become widely available. The objective of this study is to evaluate the use of these LLMs with retrieval-augmented generation (RAG) techniques for the numerical grading of open-ended questions with answers of approximately 250 words, assessing the improvement that the RAG technique brings to this task. For this purpose, we used a dataset composed of 351 questions evaluated by the teachers of two courses, together with the corresponding course materials. We found that, in general, using the RAG technique improves the numerical grades awarded by an LLM, achieving reductions in the MAE of up to 24%. We also observed that the LLM generally tends to award high grades. Our study concludes with practical guidelines for integrating RAG models into educational settings.