Miguel Arevalillo-Herráez


2024

Beyond the Hype: Identifying and Analyzing Math Word Problem-Solving Challenges for Large Language Models
Romina Soledad Albornoz-De Luise | David Arnau | Pablo Arnau-González | Miguel Arevalillo-Herráez
Proceedings of the 2nd Workshop on Practical LLM-assisted Data-to-Text Generation

Despite not being explicitly trained for this purpose, models like Mistral and LLaMA have demonstrated impressive results across numerous tasks, including generating solutions to Mathematical Word Problems (MWPs). Solving an MWP involves translating a textual description into a mathematical model or equation and then solving it. However, these models face challenges in accurately interpreting and utilizing the numerical information present in the MWP statements, which can lead to errors in the generated solutions. To better understand the limitations of LLMs, we analyzed the MWPs from the SVAMP dataset that the models failed to solve accurately. By categorizing these MWPs, we identify specific types of problems where the models are most prone to errors, providing insights into the underlying challenges faced by LLMs in problem-solving scenarios and opening new modeling opportunities. By understanding the expected errors, researchers can design strategies to model problems more effectively and choose the most suitable LLM for solving them, taking into account each model’s strengths and weaknesses.

Be My Mate: Simulating Virtual Students for collaboration using Large Language Models
Sergi Solera-Monforte | Pablo Arnau-González | Miguel Arevalillo-Herráez
Proceedings of the 17th International Natural Language Generation Conference: System Demonstrations

Advancements in machine learning, particularly Large Language Models (LLMs), offer new opportunities for enhancing education through personalized assistance. We introduce “Be My Mate,” an agent that leverages LLMs to simulate virtual peer students in online collaborative education. The system includes a subscription module for real-time updates and a conversational module for generating supportive interactions. Key challenges include creating temporally realistic interactions and credible error generation. The initial demonstration shows promise in enhancing student engagement and learning outcomes.