2020
Building a Task-oriented Dialog System for Languages with no Training Data: the Case for Basque
Maddalen López de Lacalle, Xabier Saralegi, Iñaki San Vicente
Proceedings of the Twelfth Language Resources and Evaluation Conference
This paper presents an approach for developing a task-oriented dialog system for less-resourced languages in scenarios where training data is not available. Both intent classification and slot filling are tackled. We project the existing annotations in rich-resource languages by means of Neural Machine Translation (NMT) and subsequent word alignment. We then compare training on the projected monolingual data with direct model transfer alternatives. Intent classifiers and slot-filling sequence taggers are implemented using a BiLSTM architecture or by fine-tuning BERT transformer models. Models learnt exclusively from Basque projected data provide better accuracies for slot filling. For intent classification, combining Basque projected training data with rich-resource language data consistently outperforms models trained solely on projected data. Overall, we achieve competitive performance in both tasks, with accuracies of 81% for intent classification and 77% for slot filling.
2016
A Multilingual Predicate Matrix
Maddalen Lopez de Lacalle, Egoitz Laparra, Itziar Aldabe, German Rigau
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper presents the Predicate Matrix 1.3, a lexical resource resulting from the integration of multiple sources of predicate information including FrameNet, VerbNet, PropBank and WordNet. This new version of the Predicate Matrix has been extended to cover nominal predicates by adding mappings to NomBank. Similarly, we have integrated resources in Spanish, Catalan and Basque. As a result, the Predicate Matrix 1.3 provides a multilingual lexicon to allow interoperable semantic analysis in multiple languages.
2014
Predicate Matrix: extending SemLink through WordNet mappings
Maddalen Lopez de Lacalle, Egoitz Laparra, German Rigau
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper presents the Predicate Matrix v1.1, a new lexical resource resulting from the integration of multiple sources of predicate information including FrameNet, VerbNet, PropBank and WordNet. Starting from SemLink, we first use advanced graph-based algorithms to further extend its mapping coverage. Second, we exploit the current content of SemLink to infer new role mappings among the different predicate schemas. As a result, we have obtained a new version of the Predicate Matrix which largely extends the current coverage of SemLink and the previous version of the Predicate Matrix.
First steps towards a Predicate Matrix
Maddalen López de Lacalle, Egoitz Laparra, German Rigau
Proceedings of the Seventh Global Wordnet Conference
2010
Dictionary and Monolingual Corpus-based Query Translation for Basque-English CLIR
Xabier Saralegi, Maddalen Lopez de Lacalle
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper deals with the main problems that arise in the query translation process in dictionary-based Cross-lingual Information Retrieval (CLIR): translation selection, presence of Out-Of-Vocabulary (OOV) terms and translation of Multi-Word Expressions (MWE). We analyse to what extent each problem affects retrieval performance for the Basque-English language pair, and the improvement obtained when addressing them with methods that do not require parallel corpora. To tackle the translation selection problem, we provide novel extensions of an existing monolingual target co-occurrence-based method. Out-Of-Vocabulary terms are dealt with by means of a cognate detection-based method. Finally, for the Multi-Word Expression translation problem, a naïve matching technique is applied. The error analysis shows that the deterioration in performance, measured in terms of Mean Average Precision (MAP), differs significantly depending on the problem, with translation selection being the cause of most of the errors. The proposed combined strategy performs well in tackling the three above-mentioned problems.