2025
Beyond BLEU: Ethical Risks of Misleading Evaluation in Domain-Specific QA with LLMs
Ayoub Nainia | Régine Vignes-Lebbe | Hajar Mousannif | Jihad Zahir
Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models
Large Language Models (LLMs) are increasingly used in scientific question answering (QA), including high-stakes fields such as biodiversity informatics. However, standard evaluation metrics such as BLEU, ROUGE, Exact Match (EM), and BERTScore remain poorly aligned with the factual and domain-specific requirements of these tasks. In this work, we investigate the gap between automatic metrics and expert judgment in botanical QA by comparing metric scores with human ratings across five dimensions: accuracy, completeness, relevance, fluency, and terminology usage. Our results show that standard metrics often misrepresent response quality, particularly in the presence of paraphrasing, omission, or domain-specific language. Through both quantitative analysis and qualitative examples, we show that high-scoring responses may still exhibit critical factual errors or omissions. These findings highlight the need for domain-aware evaluation frameworks that incorporate expert feedback and raise important ethical concerns about the deployment of LLMs in scientific contexts.
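The abstract's central claim is that surface-overlap metrics can rank a factually wrong answer above a correct paraphrase. The minimal sketch below illustrates that effect with simplified SQuAD-style Exact Match and token-level F1. It is not the authors' evaluation code, and the botanical reference and candidate answers are invented for illustration. They are not taken from the paper's dataset.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    # Simplified normalization: lowercase, replace punctuation with spaces, split on whitespace
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def exact_match(pred: str, ref: str) -> float:
    # 1.0 only if the normalized token sequences are identical
    return float(normalize(pred) == normalize(ref))


def token_f1(pred: str, ref: str) -> float:
    # Harmonic mean of token precision and recall (bag-of-tokens overlap)
    p, r = normalize(pred), normalize(ref)
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example answers (not from the paper)
reference = "Leaves opposite, lanceolate, 3-5 cm long"
paraphrase = "The leaf blades are lanceolate and arranged oppositely, 3 to 5 cm in length"  # correct
wrong = "Leaves alternate, lanceolate, 3-5 cm long"  # factually wrong leaf arrangement

for name, answer in [("paraphrase", paraphrase), ("wrong", wrong)]:
    print(name, exact_match(answer, reference), round(token_f1(answer, reference), 3))
```

Here the wrong answer shares six of seven reference tokens (F1 about 0.857), while the correct paraphrase scores about 0.381 and both get an Exact Match of 0, which is the kind of misranking that expert ratings are meant to catch.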
F-LoRA-QA: Finetuning LLaMA Models with Low-Rank Adaptation for French Botanical Question Generation and Answering
Ayoub Nainia | Régine Vignes-Lebbe | Hajar Mousannif | Jihad Zahir
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Despite recent advances in large language models (LLMs), most question-answering (QA) systems remain English-centric and poorly suited to domain-specific scientific texts. This linguistic and domain bias poses a major challenge in botany, where a substantial portion of knowledge is documented in French. We introduce F-LoRA-QA, a fine-tuned LLaMA-based pipeline for French botanical QA that leverages Low-Rank Adaptation (LoRA) for efficient domain adaptation. We construct a specialized dataset of 16,962 question-answer pairs extracted from scientific flora descriptions and fine-tune LLaMA models to retrieve structured knowledge from unstructured botanical texts. Expert-based evaluation confirms the linguistic quality and domain relevance of the generated answers. Compared to baseline LLaMA models, F-LoRA-QA achieves a 300% increase in BLEU, a 70% gain in ROUGE-1 F1, a +16.8% gain in BERTScore F1, and an Exact Match improvement from 2.01% to 23.57%. These results demonstrate the effectiveness of adapting LLMs to low-resource scientific domains and highlight the potential of our approach for automated trait extraction and biodiversity data structuring.
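LoRA, as named in the abstract above, freezes a pretrained weight matrix W and trains only a low-rank update (alpha / r) · B · A. B is initialized to zero, so the adapted layer starts out identical to the base model. The plain-Python sketch below illustrates that general technique only. It is not the F-LoRA-QA training code, and the rank, scaling factor, and layer sizes used here are hypothetical values chosen for illustration.

```python
import random


def matmul(X, Y):
    # Plain-Python matrix product: X is n x m, Y is m x p
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r, alpha, seed=0):
        d_out, d_in = len(W), len(W[0])
        rng = random.Random(seed)
        self.W = W  # frozen pretrained weight, never updated
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in, random init
        self.B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero init so the update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        # x is a vector of length d_in; output is W x + scale * B (A x)
        xc = [[v] for v in x]
        base = matmul(self.W, xc)
        update = matmul(self.B, matmul(self.A, xc))
        return [base[i][0] + self.scale * update[i][0] for i in range(len(base))]

    def trainable_params(self):
        # Only A and B are trained
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_count(d_out, d_in, r):
    # Trainable parameters of one adapted d_out x d_in layer: r * (d_in + d_out)
    return r * (d_in + d_out)


# Hypothetical sizes: a 4096 x 4096 projection adapted with rank 8
full = 4096 * 4096
lora = lora_param_count(4096, 4096, 8)
print(full, lora, f"{lora / full:.2%}")
```

The zero-initialized B is the design choice that keeps fine-tuning stable: training begins exactly at the pretrained model. The r · (d_in + d_out) count, 65,536 versus 16,777,216 weights in this hypothetical layer, is what makes the adaptation "efficient".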
2023
Extracting Named Entities from Species Descriptions
Maya Sahraoui | Vincent Guigue | Régine Vignes-Lebbe | Marc Pignal
Proceedings of CORIA-TALN 2023. Proceedings of the 18th Conference on Information Retrieval and Applications (CORIA)
Species descriptions contain important information about the morphological characteristics of species, but extracting structured knowledge from these descriptions is often time-consuming. We propose a text-to-graph model tailored to species descriptions, based on weakly supervised named entity recognition (NER). After extracting the named entities, we reconstruct triples using dependency rules to build the graph. Our method makes it possible to compare different species on the basis of morphological characters and to link different data sources. The results of our study focus on our NER model and show that it outperforms the baseline models and constitutes a valuable tool for the ecology and biodiversity community.
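The abstract above describes a two-step pipeline: NER tags entities, then dependency rules link them into triples. The sketch below shows one plausible rule of that kind under stated assumptions: each trait entity is attached to the nearest structure entity found by walking up its syntactic heads. The entity labels (`STRUCTURE`, `TRAIT`), the relation name `has_trait`, and the hand-written parse are all hypothetical. The abstract does not specify the actual rules, and in practice a dependency parser would supply the heads.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    head: int    # index of the syntactic head; -1 marks the root
    entity: str  # hypothetical NER label: "STRUCTURE", "TRAIT", or "O"


def extract_triples(tokens):
    """Hypothetical rule: link each TRAIT to the closest STRUCTURE among its ancestors."""
    triples = []
    for tok in tokens:
        if tok.entity != "TRAIT":
            continue
        j, seen = tok.head, set()
        # Walk up the dependency tree; `seen` guards against malformed cyclic parses
        while j != -1 and j not in seen:
            seen.add(j)
            if tokens[j].entity == "STRUCTURE":
                triples.append((tokens[j].text, "has_trait", tok.text))
                break
            j = tokens[j].head
    return triples


# Hand-written parse of "Leaves opposite, lanceolate; flowers white" (illustrative only)
example = [
    Token("Leaves", -1, "STRUCTURE"),
    Token("opposite", 0, "TRAIT"),
    Token(",", 0, "O"),
    Token("lanceolate", 1, "TRAIT"),  # coordinated with "opposite", so reached via token 1
    Token(";", 0, "O"),
    Token("flowers", 0, "STRUCTURE"),
    Token("white", 5, "TRAIT"),
]
print(extract_triples(example))
```

Each resulting triple becomes an edge in a species graph, which is what allows species to be compared character by character.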