2024
Incremental Learning for Knowledge-Grounded Dialogue Systems in Industrial Scenarios
Izaskun Fernandez | Cristina Aceta | Cristina Fernandez | Maria Ines Torres | Aitor Etxalar | Ariane Mendez | Maia Agirre | Manuel Torralbo | Arantza Del Pozo | Joseba Agirre | Egoitz Artetxe | Iker Altuna
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue
In today’s industrial landscape, seamless collaboration between humans and machines is essential and requires shared knowledge of the operational domain. In this framework, the technical knowledge for operator assistance has traditionally been derived from static sources such as technical documents. However, experienced operators hold invaluable know-how that can significantly contribute to supporting other operators. This work focuses on enhancing operator assistance tasks in the manufacturing industry by leveraging spoken natural language interaction. More specifically, a Human-in-the-Loop (HIL) incremental learning approach is proposed to dynamically integrate this expertise into a domain knowledge graph (KG), along with the use of in-context learning for Large Language Models (LLMs) to benefit other capabilities of the system. Preliminary results of the experimentation carried out in an industrial scenario, where the graph size was increased by 25%, demonstrate that incrementally enhancing the KG benefits the dialogue system’s performance.
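To illustrate the kind of Human-in-the-Loop KG enrichment the abstract describes, the following is a minimal sketch, not the authors’ implementation: operator know-how is confirmed by a human before being added as triples to an RDF graph with rdflib. The namespace, predicate names, and validation prompt are assumptions for illustration only.

```python
# Minimal sketch of Human-in-the-Loop KG enrichment (illustrative only).
# The namespace, predicates and validation step are assumptions, not the paper's method.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/manufacturing#")

def propose_triple(graph: Graph, subject: str, predicate: str, value: str) -> bool:
    """Ask a human expert to confirm a candidate triple before adding it to the KG."""
    answer = input(f"Add '{subject} {predicate} {value}' to the knowledge graph? [y/n] ")
    if answer.strip().lower() == "y":
        graph.add((EX[subject], EX[predicate], Literal(value)))
        return True
    return False

kg = Graph()
# Static knowledge derived from technical documentation.
kg.add((EX["Press01"], EX["hasMaintenanceInterval"], Literal("500h")))
# Know-how captured from an experienced operator during dialogue,
# only added after human validation.
propose_triple(kg, "Press01", "commonFailureSymptom", "oil leak near the main valve")
print(kg.serialize(format="turtle"))
```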
2022
Exploiting In-Domain Bilingual Corpora for Zero-Shot Transfer Learning in NLU of Intra-Sentential Code-Switching Chatbot Interactions
Maia Aguirre | Manex Serras | Laura García-Sardiña | Jacobo López-Fernández | Ariane Méndez | Arantza Del Pozo
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Code-switching (CS) is a very common phenomenon in regions with various co-existing languages. Since CS is such a frequent habit in informal communication, both spoken and written, it also arises naturally in Human-Machine Interactions. Therefore, in order for natural language understanding (NLU) not to be degraded, CS must be taken into account when developing chatbots. The co-existence of multiple languages in a single NLU model has become feasible with multilingual language representation models such as mBERT. In this paper, the efficacy of zero-shot cross-lingual transfer learning with mBERT for NLU is evaluated on a Basque-Spanish CS chatbot corpus, comparing the performance of NLU models trained on in-domain chatbot utterances in Basque and/or Spanish without CS. The results obtained indicate that training joint multi-intent classification and entity recognition models on both languages simultaneously achieves the best performance, as it better captures the CS patterns.
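As a rough illustration of the joint NLU setup the abstract evaluates, the sketch below puts an intent head and a token-level entity head on top of a shared mBERT encoder using Hugging Face Transformers. The label sizes, class design, and the code-switched example utterance are assumptions, not the authors’ code; multi-intent classification would typically replace the softmax-style intent head with a multi-label (sigmoid) objective.

```python
# Illustrative sketch of a joint intent + entity model over mBERT (not the authors' code).
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class JointNLUModel(nn.Module):
    """Shared mBERT encoder with an intent head ([CLS] token) and a token-level entity head."""
    def __init__(self, num_intents: int, num_entity_tags: int,
                 encoder_name: str = "bert-base-multilingual-cased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)
        self.entity_head = nn.Linear(hidden, num_entity_tags)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        intent_logits = self.intent_head(out.last_hidden_state[:, 0])  # [CLS] representation
        entity_logits = self.entity_head(out.last_hidden_state)        # per-token tags
        return intent_logits, entity_logits

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = JointNLUModel(num_intents=12, num_entity_tags=9)  # placeholder label sizes
# Train on monolingual Basque and/or Spanish utterances, then apply
# zero-shot to code-switched (Euskañol) inputs such as this invented example.
batch = tokenizer(["Nahi dut renovar el abono de transporte"],
                  return_tensors="pt", padding=True)
intent_logits, entity_logits = model(batch["input_ids"], batch["attention_mask"])
```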
BaSCo: An Annotated Basque-Spanish Code-Switching Corpus for Natural Language Understanding
Maia Aguirre | Laura García-Sardiña | Manex Serras | Ariane Méndez | Jacobo López
Proceedings of the Thirteenth Language Resources and Evaluation Conference
The main objective of this work is the elaboration and public release of BaSCo, the first corpus with annotated linguistic resources encompassing Basque-Spanish code-switching. The mixture of Basque and Spanish languages within the same utterance is popularly referred to as Euskañol, a widespread phenomenon among bilingual speakers in the Basque Country. Thus, this corpus has been created to meet the demand for annotated linguistic resources in Euskañol in research areas such as multilingual dialogue systems. The presented resource is the result of translating into Euskañol a compilation of texts in Basque and Spanish that were used for training the Natural Language Understanding (NLU) models of several task-oriented bilingual chatbots. Those chatbots were meant to answer specific questions associated with the administration, fiscal, and transport domains. In addition, they had the transversal ability to answer greetings, requests for help, and chit-chat questions commonly asked to chatbots. BaSCo is a compendium of 1377 tagged utterances, with every sample annotated at three levels: (i) NLU semantic labels, considering intents and entities, (ii) code-switching proportion, and (iii) domain of origin.
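Purely as a hypothetical illustration of the three annotation levels listed above, a single corpus sample could be represented as follows; the field names, values, and the code-switched utterance are invented and do not reflect the actual BaSCo release format.

```python
# Hypothetical sample structure; NOT the real BaSCo schema.
sample = {
    "utterance": "Nola berritu dezaket el abono de transporte?",  # invented Euskañol example
    "nlu": {  # level (i): NLU semantic labels (intents and entities)
        "intents": ["renew_pass"],
        "entities": [{"type": "transport_item", "value": "abono de transporte"}],
    },
    "code_switching_proportion": 0.45,  # level (ii): share of tokens in the embedded language
    "domain": "transport",              # level (iii): domain of origin
}
```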