2024
Evaluation of Question Answer Generation for Portuguese: Insights and Datasets
Felipe Paula | Cassiana Roberta Lizzoni Michelin | Viviane Moreira
Findings of the Association for Computational Linguistics: EMNLP 2024
Automatic question generation is an increasingly important task that can be applied in different settings, including educational purposes, data augmentation for question-answering (QA), and conversational systems. More specifically, we focus on question answer generation (QAG), which produces question-answer pairs given an input context. We adapt and apply QAG approaches to generate question-answer pairs for different domains and assess their capacity to generate accurate, diverse, and abundant question-answer pairs. Our analyses combine both qualitative and quantitative evaluations that allow insights into the quality and types of errors made by QAG methods. We also look into strategies for error filtering and their effects. Our work concentrates on Portuguese, a widely spoken language that is underrepresented in natural language processing research. To address the pressing need for resources, we generate and make available human-curated extractive QA datasets in three diverse domains.
2018
Similarity Measures for the Detection of Clinical Conditions with Verbal Fluency Tasks
Felipe Paula | Rodrigo Wilkens | Marco Idiart | Aline Villavicencio
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Semantic Verbal Fluency tests have been used in the detection of certain clinical conditions, like Dementia. In particular, given a sequence of semantically related words, a large number of switches from one semantic class to another has been linked to clinical conditions. In this work, we investigate three similarity measures for automatically identifying switches in semantic chains: semantic similarity from a manually constructed resource, and word association strength and semantic relatedness, both calculated from corpora. This information is used for building classifiers to distinguish healthy controls from clinical cases with early stages of Alzheimer’s Disease and Mild Cognitive Deficits. The overall results indicate that for clinical conditions the classifiers that use these similarity measures outperform those that use a gold standard taxonomy.
2017
LexSubNC: A Dataset of Lexical Substitution for Nominal Compounds
Rodrigo Wilkens | Leonardo Zilio | Silvio Ricardo Cordeiro | Felipe Paula | Carlos Ramisch | Marco Idiart | Aline Villavicencio
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Short papers