Tongtong Wu

2025

Continual Learning of Large Language Models
Tongtong Wu | Trang Vu | Linhao Luo | Gholamreza Haffari
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

As large language models (LLMs) continue to expand in size and utility, keeping them current with evolving knowledge and shifting user preferences becomes an increasingly urgent yet challenging task. This tutorial offers a comprehensive exploration of continual learning (CL) in the context of LLMs, presenting a structured framework that spans continual pre-training, instruction tuning, and alignment. Grounded in recent survey work and empirical studies, we discuss emerging trends, key methods, and practical insights from both academic research and industry deployments. In addition, we highlight the new frontier of lifelong LLM agents, i.e., systems capable of autonomous, self-reflective, and tool-augmented adaptation. Participants will gain a deep understanding of the computational, algorithmic, and ethical challenges inherent to CL in LLMs, and learn about strategies to mitigate forgetting, manage data and evaluation pipelines, and design systems that can adapt responsibly and reliably over time. This tutorial will benefit researchers and practitioners interested in advancing the long-term effectiveness, adaptability, and safety of foundation models.

Large language models (LLMs) have become increasingly central to AI applications worldwide, necessitating robust multilingual safety alignment to ensure secure deployment across diverse linguistic contexts. Existing preference learning methods for safety alignment, such as RLHF and DPO, are primarily monolingual and struggle with noisy multilingual data. To address these limitations, we introduce Multilingual reward gaP Optimization (MPO), a novel approach that leverages the well-aligned safety capabilities of the dominant language (e.g., English) to improve safety alignment across multiple languages. MPO directly minimizes the reward gap difference between the dominant language and target languages, effectively transferring safety capabilities while preserving the original strengths of the dominant language. Extensive experiments on three LLMs, LLaMA-3.1, Gemma-2 and Qwen2.5, validate MPO’s efficacy in multilingual safety alignment without degrading general multilingual utility.

Attributed Question Answering (AQA) has attracted wide attention, but there are still several limitations in evaluating the attributions, including a lack of fine-grained attribution categories, reliance on manual annotations, and failure to compare attributions with only subtle differences. To bridge these gaps, we introduce Complex Attributed Question Answering (CAQA), a large-scale benchmark containing comprehensive attribution categories, automatically generated using Knowledge Graphs (KGs), and complex attribution scenarios. We have conducted extensive experiments to verify the effectiveness of CAQA, including the benchmarking of 25 automatic evaluators, their comparison with human evaluators, the testing of LLM evaluators fine-tuned by CAQA and so on. These experiments also lead to a series of important findings that can benefit future research on AQA.

2023

NormMark: A Weakly Supervised Markov Model for Socio-cultural Norm Discovery
Farhad Moghimifar | Shilin Qu | Tongtong Wu | Yuan-Fang Li | Gholamreza Haffari
Findings of the Association for Computational Linguistics: ACL 2023

Norms, which are culturally accepted guidelines for behaviours, can be integrated into conversational models to generate utterances that are appropriate for the socio-cultural context. Existing methods for norm recognition tend to focus only on surface-level features of dialogues and do not take into account the interactions within a conversation. To address this issue, we propose NormMark, a probabilistic generative Markov model to carry the latent features throughout a dialogue. These features are captured by discrete and continuous latent variables conditioned on the conversation history, and improve the model’s ability in norm recognition. The model is trainable on weakly annotated data using the variational technique. On a dataset with limited norm annotations, we show that our approach achieves a higher F1 score than current state-of-the-art methods, including GPT3.

2022

Event Causality Identification via Derivative Prompt Joint Learning
Shirong Shen | Heng Zhou | Tongtong Wu | Guilin Qi
Proceedings of the 29th International Conference on Computational Linguistics

This paper studies event causality identification, which aims at predicting the causality relation for a pair of events in a sentence. Regarding event causality identification as a supervised classification task, most existing methods suffer from the problem of insufficient annotated data. In this paper, we propose a new derivative prompt joint learning model for event causality identification, which leverages potential causal knowledge in the pre-trained language model to tackle the data scarcity problem. Specifically, rather than external data or knowledge augmentation, we derive two relevant prompt tasks from event causality identification to enhance the model’s ability to identify explicit and implicit causality. We evaluate our model on two benchmark datasets and the results show that our model has great advantages over previous methods.

Variational Autoencoder with Disentanglement Priors for Low-Resource Task-Specific Natural Language Generation
Zhuang Li | Lizhen Qu | Qiongkai Xu | Tongtong Wu | Tianyang Zhan | Gholamreza Haffari
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In this paper, we propose a variational autoencoder with disentanglement priors, VAE-Dprior, for task-specific natural language generation with no or only a handful of task-specific labeled examples. In order to tackle compositional generalization across tasks, our model performs disentangled representation learning by introducing a conditional prior for the latent content space and another conditional prior for the latent label space. Both types of priors satisfy a novel property called 𝜖-disentangled. We show both empirically and theoretically that the novel priors can disentangle representations even without specific regularizations as in prior work. The content prior enables directly sampling diverse content representations from the content space learned from the seen tasks, and fusing them with the representations of novel tasks for generating semantically diverse texts in low-resource settings. Our extensive experiments demonstrate the superior performance of our model over competitive baselines in terms of i) data augmentation in continuous zero/few-shot learning, and ii) text style transfer in the few-shot setting.

TCG-Event: Effective Task Conditioning for Generation-based Event Extraction
Fatemeh Shiri | Tongtong Wu | Yuanfang Li | Gholamreza Haffari
Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association

Relation extraction typically aims to extract semantic relationships between entities from unstructured text. One of the most essential data sources for relation extraction is spoken language, such as interviews and dialogues. However, the error propagation introduced by automatic speech recognition (ASR) has been ignored in relation extraction, and end-to-end speech-based relation extraction has rarely been explored. In this paper, we propose a new listening information extraction task, i.e., speech relation extraction. We construct the training dataset for speech relation extraction via text-to-speech systems, and we construct the testing dataset via crowd-sourcing with native English speakers. We explore speech relation extraction via two approaches: a pipeline approach that conducts text-based extraction with a pretrained ASR module, and an end-to-end approach based on a newly proposed encoder-decoder model, which we call SpeechRE. We conduct comprehensive experiments to identify the challenges in speech relation extraction, which may shed light on future explorations. We share the code and data at https://github.com/wutong8023/SpeechRE.

2021

2020

Few-Shot Complex Knowledge Base Question Answering via Meta Reinforcement Learning
Yuncheng Hua | Yuan-Fang Li | Gholamreza Haffari | Guilin Qi | Tongtong Wu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Complex question-answering (CQA) involves answering complex natural-language questions on a knowledge base (KB). However, the conventional neural program induction (NPI) approach exhibits uneven performance when the questions have different types, harboring inherently different characteristics, e.g., difficulty level. This paper proposes a meta-reinforcement learning approach to program induction in CQA to tackle the potential distributional bias in questions. Our method quickly and effectively adapts the meta-learned programmer to new questions based on the most similar questions retrieved from the training data. The meta-learned policy is then used to learn a good programming policy, utilizing the trial trajectories and their rewards for similar questions in the support set. Our method achieves state-of-the-art performance on the CQA dataset (Saha et al., 2018) while using only five trial trajectories for the top-5 retrieved questions in each support set, and meta-training on tasks constructed from only 1% of the training set. We have released our code at https://github.com/DevinJake/MRL-CQA.