Xuan Zhang


2024

pdf bib
Ask-before-Plan: Proactive Language Agents for Real-World Planning
Xuan Zhang | Yang Deng | Zifeng Ren | See-Kiong Ng | Tat-Seng Chua
Findings of the Association for Computational Linguistics: EMNLP 2024

pdf bib
Best Practices of Successive Halving on Neural Machine Translation and Large Language Models
Xuan Zhang | Kevin Duh
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Hyperparameter optimization (HPO) enhances neural machine translation (NMT) models but demands substantial computational resources. Successive halving, a multi-fidelity HPO method, mitigates this by early stopping unpromising models and allocating more resources to promising ones. This method is particularly relevant for NMT and large language models, which are computationally intensive. However, successive halving relies on a noisy estimation of model performance and assumes that early performance is highly correlated with final performance. We introduce a table lookup benchmark dataset to study the reliability of successive halving and propose best practices for its application in NMT and large language models.

pdf bib
On the Multi-turn Instruction Following for Conversational Web Agents
Yang Deng | Xuan Zhang | Wenxuan Zhang | Yifei Yuan | See-Kiong Ng | Tat-Seng Chua
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Web agents powered by Large Language Models (LLMs) have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effectively engage with sequential user instructions in real-world scenarios has not been fully explored. In this work, we introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment, supported by a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). To tackle the limited context length of LLMs and the context-dependency issue of the conversational tasks, we further propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques. Extensive experiments are conducted to benchmark the MT-Mind2Web dataset, and validate the effectiveness of the proposed method.

pdf bib
Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM
Xuan Zhang | Wei Gao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Retrieval-augmented language models have exhibited promising performance across various areas of natural language processing (NLP), including fact-critical tasks. However, due to the black-box nature of advanced large language models (LLMs) and the non-retrieval-oriented supervision signal of specific tasks, the training of retrieval model faces significant challenges under the setting of black-box LLM. We propose an approach leveraging Fine-grained Feedback with Reinforcement Retrieval (FFRR) to enhance fact-checking on news claims by using black-box LLM. FFRR adopts a two-level strategy to gather fine-grained feedback from the LLM, which serves as a reward for optimizing the retrieval policy, by rating the retrieved documents based on the non-retrieval ground truth of the task. We evaluate our model on two public datasets for real-world news claim verification, and the results demonstrate that FFRR achieves significant improvements over strong LLM-enabled and non-LLM baselines.

2023

pdf bib
A Hyperparameter Optimization Toolkit for Neural Machine Translation Research
Xuan Zhang | Kevin Duh | Paul McNamee
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Hyperparameter optimization is an important but often overlooked process in the research of deep learning technologies. To obtain a good model, one must carefully tune hyperparameters that determine the architecture and training algorithm. Insufficient tuning may result in poor results, while inequitable tuning may lead to exaggerated differences between models. We present a hyperparameter optimization toolkit for neural machine translation (NMT) to help researchers focus their time on the creative rather than the mundane. The toolkit is implemented as a wrapper on top of the open-source Sockeye NMT software. Using the Asynchronous Successive Halving Algorithm (ASHA), we demonstrate that it is possible to discover near-optimal models under a computational budget with little effort. Code: https://github.com/kevinduh/sockeye-recipes3Video demo: https://cs.jhu.edu/kevinduh/j/demo.mp4

pdf bib
Handshape-Aware Sign Language Recognition: Extended Datasets and Exploration of Handshape-Inclusive Methods
Xuan Zhang | Kevin Duh
Findings of the Association for Computational Linguistics: EMNLP 2023

The majority of existing work on sign language recognition encodes signed videos without explicitly acknowledging the phonological attributes of signs. Given that handshape is a vital parameter in sign languages, we explore the potential of handshape-aware sign language recognition. We augment the PHOENIX14T dataset with gloss-level handshape labels, resulting in the new PHOENIX14T-HS dataset. Two unique methods are proposed for handshape-inclusive sign language recognition: a single-encoder network and a dual-encoder network, complemented by a training strategy that simultaneously optimizes both the CTC loss and frame-level cross-entropy loss. The proposed methodology consistently outperforms the baseline performance. The dataset and code can be accessed at: www.anonymous.com.

pdf bib
AutoML for NLP
Kevin Duh | Xuan Zhang
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Automated Machine Learning (AutoML) is an emerging field that has potential to impact how we build models in NLP. As an umbrella term that includes topics like hyperparameter optimization and neural architecture search, AutoML has recently become mainstream at major conferences such as NeurIPS, ICML, and ICLR. What does this mean to NLP? Currently, models are often built in an ad hoc process: we might borrow default hyperparameters from previous work and try a few variant architectures, but it is never guaranteed that final trained model is optimal. Automation can introduce rigor in this model-building process. This tutorial will summarize the main AutoML techniques and illustrate how to apply them to improve the NLP model-building process.

pdf bib
Towards LLM-based Fact Verification on News Claims with a Hierarchical Step-by-Step Prompting Method
Xuan Zhang | Wei Gao
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Machine Translation with Large Language Models: Prompting, Few-shot Learning, and Fine-tuning with QLoRA
Xuan Zhang | Navid Rajabi | Kevin Duh | Philipp Koehn
Proceedings of the Eighth Conference on Machine Translation

While large language models have made remarkable advancements in natural language generation, their potential in machine translation, especially when fine-tuned, remains under-explored. In our study, we conduct comprehensive experiments, evaluating 15 publicly available language models on machine translation tasks. We compare the performance across three methodologies: zero-shot prompting, few-shot learning, and fine-tuning. Central to our approach is the use of QLoRA, an efficient fine-tuning method. On French-English, QLoRA fine-tuning outperforms both few-shot learning and models trained from scratch. This superiority is highlighted in both sentence-level and document-level translations, with a significant BLEU score improvement of 28.93 over the prompting method. Impressively, with QLoRA, the enhanced performance is achieved by fine-tuning a mere 0.77% of the model’s parameters.

2022

pdf bib
Post-Hoc Interpretation of Transformer Hyperparameters with Explainable Boosting Machines
Kiron Deb | Xuan Zhang | Kevin Duh
Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Hyperparameter tuning is important for achieving high accuracy in deep learning models, yet little interpretability work has focused on hyperparameters. We propose to use the Explainable Boosting Machine (EBM), a glassbox method, as a post-hoc analysis tool for understanding how hyperparameters influence model accuracy. We present a case study on Transformer models in machine translation to illustrate the kinds of insights that may be gleaned, and perform extensive analysis to test the robustness of EBM under different data conditions.

2021

pdf bib
Approaching Sign Language Gloss Translation as a Low-Resource Machine Translation Task
Xuan Zhang | Kevin Duh
Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL)

A cascaded Sign Language Translation system first maps sign videos to gloss annotations and then translates glosses into a spoken languages. This work focuses on the second-stage gloss translation component, which is challenging due to the scarcity of publicly available parallel data. We approach gloss translation as a low-resource machine translation task and investigate two popular methods for improving translation quality: hyperparameter search and backtranslation. We discuss the potentials and pitfalls of these methods based on experiments on the RWTH-PHOENIX-Weather 2014T dataset.

2020

pdf bib
Machine Translation System Selection from Bandit Feedback
Jason Naradowsky | Xuan Zhang | Kevin Duh
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

pdf bib
Reproducible and Efficient Benchmarks for Hyperparameter Optimization of Neural Machine Translation Systems
Xuan Zhang | Kevin Duh
Transactions of the Association for Computational Linguistics, Volume 8

Hyperparameter selection is a crucial part of building neural machine translation (NMT) systems across both academia and industry. Fine-grained adjustments to a model’s architecture or training recipe can mean the difference between a positive and negative research result or between a state-of-the-art and underperforming system. While recent literature has proposed methods for automatic hyperparameter optimization (HPO), there has been limited work on applying these methods to neural machine translation (NMT), due in part to the high costs associated with experiments that train large numbers of model variants. To facilitate research in this space, we introduce a lookup-based approach that uses a library of pre-trained models for fast, low cost HPO experimentation. Our contributions include (1) the release of a large collection of trained NMT models covering a wide range of hyperparameters, (2) the proposal of targeted metrics for evaluating HPO methods on NMT, and (3) a reproducible benchmark of several HPO methods against our model library, including novel graph-based and multiobjective methods.

pdf bib
SEMA: Text Simplification Evaluation through Semantic Alignment
Xuan Zhang | Huizhou Zhao | KeXin Zhang | Yiyang Zhang
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications

Text simplification is an important branch of natural language processing. At present, methods used to evaluate the semantic retention of text simplification are mostly based on string matching. We propose the SEMA (text Simplification Evaluation Measure through Semantic Alignment), which is based on semantic alignment. Semantic alignments include complete alignment, partial alignment and hyponymy alignment. Our experiments show that the evaluation results of SEMA have a high consistency with human evaluation for the simplified corpus of Chinese and English news texts.

2019

pdf bib
Curriculum Learning for Domain Adaptation in Neural Machine Translation
Xuan Zhang | Pamela Shapiro | Gaurav Kumar | Paul McNamee | Marine Carpuat | Kevin Duh
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce a curriculum learning approach to adapt generic neural machine translation models to a specific domain. Samples are grouped by their similarities to the domain of interest and each group is fed to the training algorithm with a particular schedule. This approach is simple to implement on top of any neural framework or architecture, and consistently outperforms both unadapted and adapted baselines in experiments with two distinct domains and two language pairs.

pdf bib
HABLex: Human Annotated Bilingual Lexicons for Experiments in Machine Translation
Brian Thompson | Rebecca Knowles | Xuan Zhang | Huda Khayrallah | Kevin Duh | Philipp Koehn
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Bilingual lexicons are valuable resources used by professional human translators. While these resources can be easily incorporated in statistical machine translation, it is unclear how to best do so in the neural framework. In this work, we present the HABLex dataset, designed to test methods for bilingual lexicon integration into neural machine translation. Our data consists of human generated alignments of words and phrases in machine translation test sets in three language pairs (Russian-English, Chinese-English, and Korean-English), resulting in clean bilingual lexicons which are well matched to the reference. We also present two simple baselines - constrained decoding and continued training - and an improvement to continued training to address overfitting.

2018

pdf bib
The JHU/KyotoU Speech Translation System for IWSLT 2018
Hirofumi Inaguma | Xuan Zhang | Zhiqi Wang | Adithya Renduchintala | Shinji Watanabe | Kevin Duh
Proceedings of the 15th International Conference on Spoken Language Translation

This paper describes the Johns Hopkins University (JHU) and Kyoto University submissions to the Speech Translation evaluation campaign at IWSLT2018. Our end-to-end speech translation systems are based on ESPnet and implements an attention-based encoder-decoder model. As comparison, we also experiment with a pipeline system that uses independent neural network systems for both the speech transcription and text translation components. We find that a transfer learning approach that bootstraps the end-to-end speech translation system with speech transcription system’s parameters is important for training on small datasets.

2017

pdf bib
Event extraction from Twitter using Non-Parametric Bayesian Mixture Model with Word Embeddings
Deyu Zhou | Xuan Zhang | Yulan He
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

To extract structured representations of newsworthy events from Twitter, unsupervised models typically assume that tweets involving the same named entities and expressed using similar words are likely to belong to the same event. Hence, they group tweets into clusters based on the co-occurrence patterns of named entities and topical keywords. However, there are two main limitations. First, they require the number of events to be known beforehand, which is not realistic in practical applications. Second, they don’t recognise that the same named entity might be referred to by multiple mentions and tweets using different mentions would be wrongly assigned to different events. To overcome these limitations, we propose a non-parametric Bayesian mixture model with word embeddings for event extraction, in which the number of events can be inferred automatically and the issue of lexical variations for the same named entity can be dealt with properly. Our model has been evaluated on three datasets with sizes ranging between 2,499 and over 60 million tweets. Experimental results show that our model outperforms the baseline approach on all datasets by 5-8% in F-measure.

2016

pdf bib
Emotion Distribution Learning from Texts
Deyu Zhou | Xuan Zhang | Yin Zhou | Quan Zhao | Xin Geng
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
The Virginia Tech System at CoNLL-2016 Shared Task on Shallow Discourse Parsing
Prashant Chandrasekar | Xuan Zhang | Saurabh Chakravarty | Arijit Ray | John Krulick | Alla Rozovskaya
Proceedings of the CoNLL-16 shared task