Songbo Hu


pdf bib
Can Pretrained Language Models (Yet) Reason Deductively?
Zhangdie Yuan | Songbo Hu | Ivan Vulić | Anna Korhonen | Zaiqiao Meng
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Acquiring factual knowledge with Pretrained Language Models (PLMs) has attracted increasing attention, showing promising performance in many knowledge-intensive tasks. Their good performance has led the community to believe that the models do possess a modicum of reasoning competence rather than merely memorising the knowledge. In this paper, we conduct a comprehensive evaluation of the learnable deductive (also known as explicit) reasoning capability of PLMs. Through a series of controlled experiments, we posit two main findings. 1) PLMs inadequately generalise learned logic rules and perform inconsistently against simple adversarial surface form edits. 2) While the deductive reasoning fine-tuning of PLMs does improve their performance on reasoning over unseen knowledge facts, it results in catastrophically forgetting the previously learnt knowledge. Our main results suggest that PLMs cannot yet perform reliable deductive reasoning, demonstrating the importance of controlled examinations and probing of PLMs’ deductive reasoning abilities; we reach beyond (misleading) task performance, revealing that PLMs are still far from robust reasoning capabilities, even for simple deductive tasks.

pdf bib
Multi 3 WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems
Songbo Hu | Han Zhou | Mete Hergul | Milan Gritta | Guchun Zhang | Ignacio Iacobacci | Ivan Vulić | Anna Korhonen
Transactions of the Association for Computational Linguistics, Volume 11

Creating high-quality annotated data for task-oriented dialog (ToD) is known to be notoriously difficult, and the challenges are amplified when the goal is to create equitable, culturally adapted, and large-scale ToD datasets for multiple languages. Therefore, the current datasets are still very scarce and suffer from limitations such as translation-based non-native dialogs with translation artefacts, small scale, or lack of cultural adaptation, among others. In this work, we first take stock of the current landscape of multilingual ToD datasets, offering a systematic overview of their properties and limitations. Aiming to reduce all the detected limitations, we then introduce Multi3WOZ, a novel multilingual, multi-domain, multi-parallel ToD dataset. It is large-scale and offers culturally adapted dialogs in 4 languages to enable training and evaluation of multilingual and cross-lingual ToD systems. We describe a complex bottom–up data collection process that yielded the final dataset, and offer the first sets of baseline scores across different ToD-related tasks for future reference, also highlighting its challenging nature.

pdf bib
A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems
Songbo Hu | Han Zhou | Moy Yuan | Milan Gritta | Guchun Zhang | Ignacio Iacobacci | Anna Korhonen | Ivan Vulić
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Achieving robust language technologies that can perform well across the world’s many languages is a central goal of multilingual NLP. In this work, we take stock of and empirically analyse task performance disparities that exist between multilingual task-oriented dialogue (ToD) systems. We first define new quantitative measures of absolute and relative equivalence in system performance, capturing disparities across languages and within individual languages. Through a series of controlled experiments, we demonstrate that performance disparities depend on a number of factors: the nature of the ToD task at hand, the underlying pretrained language model, the target language, and the amount of ToD annotated data. We empirically prove the existence of the adaptation and intrinsic biases in current ToD systems: e.g., ToD systems trained for Arabic or Turkish using annotated ToD data fully parallel to English ToD data still exhibit diminished ToD task performance. Beyond providing a series of insights into the performance disparities of ToD systems in different languages, our analyses offer practical tips on how to approach ToD data collection and system development for new languages.


pdf bib
Domain-independent User Simulation with Transformers for Task-oriented Dialogue Systems
Hsien-chin Lin | Nurul Lubis | Songbo Hu | Carel van Niekerk | Christian Geishauser | Michael Heck | Shutong Feng | Milica Gasic
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Dialogue policy optimisation via reinforcement learning requires a large number of training interactions, which makes learning with real users time consuming and expensive. Many set-ups therefore rely on a user simulator instead of humans. These user simulators have their own problems. While hand-coded, rule-based user simulators have been shown to be sufficient in small, simple domains, for complex domains the number of rules quickly becomes intractable. State-of-the-art data-driven user simulators, on the other hand, are still domain-dependent. This means that adaptation to each new domain requires redesigning and retraining. In this work, we propose a domain-independent transformer-based user simulator (TUS). The structure of TUS is not tied to a specific domain, enabling domain generalization and the learning of cross-domain user behaviour from data. We compare TUS with the state-of-the-art using automatic as well as human evaluations. TUS can compete with rule-based user simulators on pre-defined domains and is able to generalize to unseen domains in a zero-shot fashion.