Francisco F. López-Ponce

Also published as: Francisco Fernando Lopez-Ponce, Francisco López - Ponce, Francisco Lopez-ponce, Francisco López-Ponce, F. López-Ponce, Francisco Lopez-Ponce

2026

GIL-Zaragoza at SemEval 2026 Task 11: Comparing Classification, Autoformalization, and Ontologies for Formal Reasoning Capabilities
Francisco Lopez-Ponce | Lucia Pitarch | Iván Saavedra Martínez | Ignacio Huitzil | Sergio Ojeda Trueba | Fernando Bobillo | Gemma Bel-Enguix
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper describes our participation in Task 11 of SemEval-2026, which evaluates the ability of models to determine the logical validity of syllogisms independent of real-world content. We develop and compare three approaches for Subtask 1: (1) an encoder-based classification baseline using both classical ML methods and fine-tuned BERT with debiasing strategies; (2) an autoformalization pipeline combining DPO-aligned models with first-order logic translation and formal inference via Prover9; and (3) a hybrid neuro-symbolic approach using GPT to generate OWL 2 ontologies evaluated with the HermiT reasoner. Our best result was achieved by the encoder-based classifier, obtaining 72.25% accuracy and a combined score of 20.37, placing 40th out of 45 participating teams. Analysis shows that classification methods exhibit lower content bias, autoformalization approaches suffer from translation inconsistencies and syntax incompatibilities, and ontology-based reasoning is hindered by prompt design limitations and verbose serialization formats. All our code can be found in the paper’s repository.

2025

GIL-IIMAS UNAM at SemEval-2025 Task 3: MeSSI: A Multilmodule System to detect hallucinated Segments in trivia-like Inquiries.
Francisco López-Ponce | Karla Salas-Jimenez | Adrián Juárez-Pérez | Diego Hernández-Bustamante | Gemma Bel-Enguix | Helena Gómez-Adorno
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

We present MeSSI, a multi-module system applied to SemEval 2025’s task 3: Mu-SHROOM. Our system tags questions in order to obtain semantically relevant terms that are used as information retrieval characteristics. These characteristics serve as extraction terms for Wikipedia pages, which are in turn processed to generate gold standard texts used in a hallucination evaluation system. A PoST-based entity comparison was implemented to contrast the test dataset sentences with the corresponding generated gold standards, which in turn was the main criterion for tagging hallucinations, partitioned into soft labels and hard labels. This method was tested in Spanish and English, finishing 18th and 19th respectively in the IoU-based ranking.
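
As a rough illustration of an IoU-based score over hallucinated spans, the following minimal sketch compares predicted and gold character spans. The function names, the half-open span format, and the convention of scoring 1.0 when both sides are empty are assumptions made for this example, not details taken from the abstract or from the task's official scorer.

```python
def spans_to_indices(spans):
    """Expand half-open (start, end) character spans into a set of indices."""
    indices = set()
    for start, end in spans:
        indices.update(range(start, end))
    return indices


def span_iou(predicted, gold):
    """Intersection over union of two sets of character indices."""
    pred_set, gold_set = set(predicted), set(gold)
    if not pred_set and not gold_set:
        # Assumed convention: nothing predicted and nothing to find counts as a match.
        return 1.0
    return len(pred_set & gold_set) / len(pred_set | gold_set)


# Hypothetical spans: two predicted segments against one gold segment.
pred = spans_to_indices([(0, 5), (10, 14)])
gold = spans_to_indices([(3, 12)])
score = span_iou(pred, gold)
print(f"IoU = {score:.3f}")
```

Working over character indices rather than whole spans gives partial credit to predictions that overlap the gold segment without matching its boundaries exactly.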

GIL-IIMAS UNAM at SemEval-2025 Task 4: LA-Min(E): LLM Unlearning Approaches Under Function Minimizing Evaluation Constraints
Karla Salas-Jimenez | Francisco López-Ponce | Diego Hernández-Bustamante | Gemma Bel-Enguix | Helena Gómez-Adorno
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper describes Gradient Ascent and Task Vectors as LLM unlearning methodologies applied to SemEval 2025’s task 4. This task focuses on LLM unlearning of specific information under the constraint of preserving the model’s advanced text generation capabilities, meaning that our implementations of these algorithms were constrained both in the information datasets and in the overall effect of each algorithm on the model’s general performance. Our implementation produced modified language models that ranked 7th out of 14 valid participants in the 7B parameter model, and 6th out of 24 in the 1B parameter model.
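
The general idea behind task-vector unlearning can be sketched as follows: take the difference between weights fine-tuned on the data to forget and the original weights, then subtract a scaled copy of that difference from the original model. This is a toy sketch of the generic technique using scalar "weights" in plain dictionaries; the function names, the `alpha` scaling factor, and the example values are hypothetical, and the abstract does not state how the authors implemented it.

```python
def task_vector(pretrained, finetuned):
    """Per-parameter difference: finetuned weights minus pretrained weights."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}


def apply_negated(pretrained, vector, alpha=1.0):
    """Move the pretrained weights away from the task direction by alpha."""
    return {name: pretrained[name] - alpha * vector[name] for name in pretrained}


# Toy scalar parameters standing in for full weight tensors (hypothetical values).
pretrained = {"w": 1.0, "b": 0.5}
forget_tuned = {"w": 1.5, "b": 0.25}  # weights after fine-tuning on the forget set

vector = task_vector(pretrained, forget_tuned)
edited = apply_negated(pretrained, vector, alpha=0.5)
print(edited)
```

A smaller `alpha` removes less of the targeted behavior but also perturbs the model less, which is the kind of trade-off a constraint on general performance would force.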

Into The Limits of Logic: Alignment Methods for Formal Logical Reasoning
Francisco F. López-Ponce | Gemma Bel-Enguix
Proceedings of The 3rd Workshop on Mathematical Natural Language Processing (MathNLP 2025)

We apply Large Language Model alignment algorithms to formal logic reasoning tasks involving natural-language (NL) to first-order logic (FOL) translation, formal logic inference, and premise retranslation. These methodologies were implemented using task-specific preference datasets created from the FOLIO datasets and LLM generations. Alignment was based on DPO; this algorithm was implemented and tested on off-the-shelf and pre-aligned models, showing promising results for higher quality NL-FOL parsing, as well as for general alignment strategies. In addition, we introduce a new similarity metric (LogicSim) between LLM-generated responses and gold standard values, which measures logic-relevant information such as premise count and overlap between answers, and expands the evaluation of NL-FOL translation pipelines. Our results show that LLMs still struggle with logical inference; however, alignment benefits semantic parsing and retranslation of results from formal logic to natural language.

2024

WikiBias as an Extrapolation Corpus for Bias Detection
Karla Salas-Jimenez | Francisco López-Ponce | Sergio-Luis Ojeda-Trueba | Gemma Bel-Enguix
Proceedings of the First Workshop on Advancing Natural Language Processing for Wikipedia

This paper explores whether it is possible to train a machine learning model using Wikipedia data to detect subjectivity in sentences and generalize effectively to other domains. To achieve this, we performed experiments with the WikiBias corpus, the BABE corpus, and the CheckThat! Dataset. Various classical ML models were tested, including Logistic Regression, SVC, and SVR, with characteristics such as Sentence Transformers similarity, probabilistic sentiment measures, and biased lexicons. Pre-trained models like DistilRoBERTa, as well as large language models like Gemma and GPT-4, were also tested for the same classification task.

GIL-IIMAS UNAM at SemEval-2024 Task 1: SAND: An In Depth Analysis of Semantic Relatedness Using Regression and Similarity Characteristics
F. López-Ponce | Ángel Cadena | K. Salas-Jimenez | D. Preciado Márquez | G. Bel-Enguix
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

The STR shared task aims at detecting the degree of semantic relatedness between sentence pairs in multiple languages. Semantic relatedness relies on elements such as topic similarity, point of view agreement, entailment, and even human intuition, making it a broader field than sentence similarity. The GIL-IIMAS UNAM team proposes a model based on the SAND characteristics composition (Sentence Transformers, AnglE Embeddings, N-grams, Sentence Length Difference coefficient) and classical regression algorithms. This model achieves a Spearman correlation score of 0.83 on the English test and 0.73 on the Spanish counterpart, finishing just above the SemEval baseline in English and in second place in Spanish.