Han Zhou


2023

pdf bib
Multi 3 WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems
Songbo Hu | Han Zhou | Mete Hergul | Milan Gritta | Guchun Zhang | Ignacio Iacobacci | Ivan Vulić | Anna Korhonen
Transactions of the Association for Computational Linguistics, Volume 11

Creating high-quality annotated data for task-oriented dialog (ToD) is known to be notoriously difficult, and the challenges are amplified when the goal is to create equitable, culturally adapted, and large-scale ToD datasets for multiple languages. Therefore, the current datasets are still very scarce and suffer from limitations such as translation-based non-native dialogs with translation artefacts, small scale, or lack of cultural adaptation, among others. In this work, we first take stock of the current landscape of multilingual ToD datasets, offering a systematic overview of their properties and limitations. Aiming to reduce all the detected limitations, we then introduce Multi3WOZ, a novel multilingual, multi-domain, multi-parallel ToD dataset. It is large-scale and offers culturally adapted dialogs in 4 languages to enable training and evaluation of multilingual and cross-lingual ToD systems. We describe a complex bottom–up data collection process that yielded the final dataset, and offer the first sets of baseline scores across different ToD-related tasks for future reference, also highlighting its challenging nature.

pdf bib
A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems
Songbo Hu | Han Zhou | Moy Yuan | Milan Gritta | Guchun Zhang | Ignacio Iacobacci | Anna Korhonen | Ivan Vulić
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Achieving robust language technologies that can perform well across the world’s many languages is a central goal of multilingual NLP. In this work, we take stock of and empirically analyse task performance disparities that exist between multilingual task-oriented dialogue (ToD) systems. We first define new quantitative measures of absolute and relative equivalence in system performance, capturing disparities across languages and within individual languages. Through a series of controlled experiments, we demonstrate that performance disparities depend on a number of factors: the nature of the ToD task at hand, the underlying pretrained language model, the target language, and the amount of ToD annotated data. We empirically prove the existence of the adaptation and intrinsic biases in current ToD systems: e.g., ToD systems trained for Arabic or Turkish using annotated ToD data fully parallel to English ToD data still exhibit diminished ToD task performance. Beyond providing a series of insights into the performance disparities of ToD systems in different languages, our analyses offer practical tips on how to approach ToD data collection and system development for new languages.

pdf bib
XQA-DST: Multi-Domain and Multi-Lingual Dialogue State Tracking
Han Zhou | Ignacio Iacobacci | Pasquale Minervini
Findings of the Association for Computational Linguistics: EACL 2023

Dialogue State Tracking (DST), a crucial component of task-oriented dialogue (ToD) systems, keeps track of all important information pertaining to dialogue history: filling slots with the most probable values throughout the conversation. Existing methods generally rely on a predefined set of values and struggle to generalise to previously unseen slots in new domains. To overcome these challenges, we propose a domain-agnostic extractive question answering (QA) approach with shared weights across domains. To disentangle the complex domain information in ToDs, we train our DST with a novel domain filtering strategy by excluding out-of-domain question samples. With an independent classifier that predicts the presence of multiple domains given the context, our model tackles DST by extracting spans in active domains. Empirical results demonstrate that our model can efficiently leverage domain-agnostic QA datasets by two-stage fine-tuning while being both domain-scalable and open vocabulary in DST. It shows strong transferability by achieving zero-shot domain-adaptation results on MultiWOZ 2.1 with an average JGA of 36.7%. It further achieves cross-lingual transfer with state-of-the-art zero-shot results, 66.2% JGA from English to German and 75.7% JGA from English to Italian on WOZ 2.0.

pdf bib
Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning
Han Zhou | Xingchen Wan | Ivan Vulić | Anna Korhonen
Findings of the Association for Computational Linguistics: EMNLP 2023

Prompt-based learning has been an effective paradigm for large pretrained language models (LLM), enabling few-shot or even zero-shot learning. Black-box prompt search has received growing interest recently for its distinctive properties of gradient-free optimization, proven particularly useful and powerful for model-as-a-service usage. However, the discrete nature and the complexity of combinatorial optimization hinder the efficiency of modern black-box approaches. Despite extensive research on search algorithms, the crucial aspect of search space design and optimization has been largely overlooked. In this paper, we first conduct a sensitivity analysis by prompting LLM, revealing that only a small number of tokens exert a disproportionate amount of influence on LLM predictions. Leveraging this insight, we propose the Clustering and Pruning for Efficient Black-box Prompt Search (ClaPS), a simple black-box search method that first clusters and prunes the search space to focus exclusively on influential prompt tokens. By employing even simple search methods within the pruned search space, ClaPS achieves state-of-the-art performance across various tasks and LLMs, surpassing the performance of complex approaches while significantly reducing search costs. Our findings underscore the critical role of search space design and optimization in enhancing both the usefulness and the efficiency of black-box prompt-based learning.

2022

pdf bib
PingAnTech at SMM4H task1: Multiple pre-trained model approaches for Adverse Drug Reactions
Xi Liu | Han Zhou | Chang Su
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This paper describes the solution for the Social Media Mining for Health (SMM4H) 2022 Shared Task. We participated in Task1a., Task1b. and Task1c. To solve the problem of the presence of Twitter data, we used a pre-trained language model. We used training strategies that involved: adversarial training, head layer weighted fusion, etc., to improve the performance of the model. The experimental results show the effectiveness of our designed system. For task 1a, the system achieved an F1 score of 0.68; for task 1b Overlapping F1 score of 0.65 and a Strict F1 score of 0.49. Task 1c yields Overlapping F1 and Strict F1 scores of 0.36 and 0.30, respectively.

2021

pdf bib
Incorporating Global Information in Local Attention for Knowledge Representation Learning
Yu Zhao | Han Zhou | Ruobing Xie | Fuzhen Zhuang | Qing Li | Ji Liu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021