Dit-Yan Yeung


2024

pdf bib
Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability
Tsz Ting Chung | Leyang Cui | Lemao Liu | Xinting Huang | Shuming Shi | Dit-Yan Yeung
Findings of the Association for Computational Linguistics: EMNLP 2024

Large Language Models (LLMs) have demonstrated impressive capabilities in a wide range of natural language processing tasks when leveraging in-context learning. To mitigate the additional computational and financial costs associated with in-context learning, several prompt compression methods have been proposed to compress the in-context learning prompts. Despite their success, these methods face challenges with transferability due to model-specific compression, or rely on external training data, such as GPT-4. In this paper, we investigate the ability of LLMs to develop a unified compression method that discretizes uninformative tokens, utilizing a self-supervised pre-training technique. By introducing a small number of parameters during the continual pre-training, the proposed Selection-p produces a probability for each input token, indicating whether to preserve or discard it. Experiments show Selection-p achieves state-of-the-art performance across numerous classification tasks, achieving compression rates of up to 10 times while experiencing only a marginal 0.8% decrease in performance. Moreover, it exhibits superior transferability to different models compared to prior work. Additionally, we further analyze how Selection-p helps maintain performance on in-context learning with long contexts.

2023

pdf bib
SongRewriter: A Chinese Song Rewriting System with Controllable Content and Rhyme Scheme
Yusen Sun | Liangyou Li | Qun Liu | Dit-Yan Yeung
Findings of the Association for Computational Linguistics: ACL 2023

Although lyrics generation has achieved significant progress in recent years, it has limited practical applications because the generated lyrics cannot be performed without composing compatible melodies. In this work, we bridge this practical gap by proposing a song rewriting system which rewrites the lyrics of an existing song such that the generated lyrics are compatible with the rhythm of the existing melody and thus singable. In particular, we propose SongRewriter, a controllable Chinese lyric generation and editing system which assists users without prior knowledge of melody composition. The system is trained by a randomized multi-level masking strategy which produces a unified model for generating entirely new lyrics or editing a few fragments. To improve the controllabiliy of the generation process, we further incorporate a keyword prompt to control the lexical choices of the content and propose novel decoding constraints and a vowel modeling task to enable flexible end and internal rhyme schemes. While prior rhyming metrics are mainly for rap lyrics, we propose three novel rhyming evaluation metrics for song lyrics. Both automatic and human evaluations show that the proposed model performs better than the state-of-the-art models in both contents and rhyming quality.

pdf bib
Towards Reference-free Text Simplification Evaluation with a BERT Siamese Network Architecture
Xinran Zhao | Esin Durmus | Dit-Yan Yeung
Findings of the Association for Computational Linguistics: ACL 2023

Text simplification (TS) aims to modify sentences to make their both content and structure easier to understand. Traditional n-gram matching-based TS evaluation metrics heavily rely on the exact token match and human-annotated simplified sentences. In this paper, we present a novel neural-network-based reference-free TS metric BETS that leverages pre-trained contextualized language representation models and large-scale paraphrasing datasets to evaluate simplicity and meaning preservation. We show that our metric, without collecting any costly human simplification reference, correlates better than existing metrics with human judgments for the quality of both overall simplification (+7.7%) and its key aspects, i.e., comparative simplicity (+11.2%) and meaning preservation (+9.2%).

pdf bib
Towards General Error Diagnosis via Behavioral Testing in Machine Translation
Junjie Wu | Lemao Liu | Dit-Yan Yeung
Findings of the Association for Computational Linguistics: EMNLP 2023

Behavioral testing offers a crucial means of diagnosing linguistic errors and assessing capabilities of NLP models. However, applying behavioral testing to machine translation (MT) systems is challenging as it generally requires human efforts to craft references for evaluating the translation quality of such systems on newly generated test cases. Existing works in behavioral testing of MT systems circumvent this by evaluating translation quality without references, but this restricts diagnosis to specific types of errors, such as incorrect translation of single numeric or currency words. In order to diagnose general errors, this paper proposes a new Bilingual Translation Pair Generation based Behavior Testing (BTPGBT) framework for conducting behavioral testing of MT systems. The core idea of BTPGBT is to employ a novel bilingual translation pair generation (BTPG) approach that automates the construction of high-quality test cases and their pseudoreferences. Experimental results on various MT systems demonstrate that BTPGBT could provide comprehensive and accurate behavioral testing results for general error diagnosis, which further leads to several insightful findings. Our code and data are available at https: //github.com/wujunjie1998/BTPGBT.

2022

pdf bib
Controlled Text Generation Using Dictionary Prior in Variational Autoencoders
Xianghong Fang | Jian Li | Lifeng Shang | Xin Jiang | Qun Liu | Dit-Yan Yeung
Findings of the Association for Computational Linguistics: ACL 2022

While variational autoencoders (VAEs) have been widely applied in text generation tasks, they are troubled by two challenges: insufficient representation capacity and poor controllability. The former results from the posterior collapse and restrictive assumption, which impede better representation learning. The latter arises as continuous latent variables in traditional formulations hinder VAEs from interpretability and controllability. In this paper, we propose Dictionary Prior (DPrior), a new data-driven prior that enjoys the merits of expressivity and controllability. To facilitate controlled text generation with DPrior, we propose to employ contrastive learning to separate the latent space into several parts. Extensive experiments on both language modeling and controlled text generation demonstrate the effectiveness of the proposed approach.

2021

pdf bib
Probing Toxic Content in Large Pre-Trained Language Models
Nedjma Ousidhoum | Xinran Zhao | Tianqing Fang | Yangqiu Song | Dit-Yan Yeung
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Large pre-trained language models (PTLMs) have been shown to carry biases towards different social groups which leads to the reproduction of stereotypical and toxic content by major NLP systems. We propose a method based on logistic regression classifiers to probe English, French, and Arabic PTLMs and quantify the potentially harmful content that they convey with respect to a set of templates. The templates are prompted by a name of a social group followed by a cause-effect relation. We use PTLMs to predict masked tokens at the end of a sentence in order to examine how likely they enable toxicity towards specific communities. We shed the light on how such negative content can be triggered within unrelated and benign contexts based on evidence from a large-scale study, then we explain how to take advantage of our methodology to assess and mitigate the toxicity transmitted by PTLMs.

2020

pdf bib
Comparative Evaluation of Label-Agnostic Selection Bias in Multilingual Hate Speech Datasets
Nedjma Ousidhoum | Yangqiu Song | Dit-Yan Yeung
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Work on bias in hate speech typically aims to improve classification performance while relatively overlooking the quality of the data. We examine selection bias in hate speech in a language and label independent fashion. We first use topic models to discover latent semantics in eleven hate speech corpora, then, we present two bias evaluation metrics based on the semantic similarity between topics and search words frequently used to build corpora. We discuss the possibility of revising the data collection process by comparing datasets and analyzing contrastive case studies.

2019

pdf bib
Multilingual and Multi-Aspect Hate Speech Analysis
Nedjma Ousidhoum | Zizheng Lin | Hongming Zhang | Yangqiu Song | Dit-Yan Yeung
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Current research on hate speech analysis is typically oriented towards monolingual and single classification tasks. In this paper, we present a new multilingual multi-aspect hate speech analysis dataset and use it to test the current state-of-the-art multilingual multitask learning approaches. We evaluate our dataset in various classification settings, then we discuss how to leverage our annotations in order to improve hate speech detection and classification in general.