Yi Li


2024

How Grammatical Features Impact Machine Translation: A New Test Suite for Chinese-English MT Evaluation
Huacheng Song | Yi Li | Yiwen Wu | Yu Liu | Jingxia Lin | Hongzhi Xu
Proceedings of the Ninth Conference on Machine Translation

Machine translation (MT) evaluation has trended toward finer granularity, enabling a more precise diagnosis of hidden flaws and weaknesses of MT systems from various perspectives. This paper examines how MT systems are potentially affected by certain grammatical features, offering insights into the challenges these features pose and suggesting possible directions for improvement. We develop a new test suite by extracting 7,848 sentences from a multi-domain Chinese-English parallel corpus. All the Chinese text was further annotated with 43 grammatical features using a semi-automatic method. This test suite was subsequently used to evaluate eight state-of-the-art MT systems according to six different automatic evaluation metrics. The results reveal intriguing patterns of MT performance associated with different domains and various grammatical features, highlighting the test suite’s effectiveness. The test suite has been made publicly available and will serve as an important benchmark for evaluating and diagnosing Chinese-English MT systems.

Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning
Guisheng Liu | Yi Li | Zhengcong Fei | Haiyan Fu | Xiangyang Luo | Yanqing Guo
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

While impressive performance has been achieved in image captioning, the limited diversity of the generated captions and the large parameter scale remain major barriers to the real-world application of these systems. In this work, we propose a lightweight image captioning network in combination with continuous diffusion, called Prefix-diffusion. To achieve diversity, we design an efficient method that injects prefix image embeddings into the denoising process of the diffusion model. In order to reduce trainable parameters, we employ a pre-trained model to extract image features and further design an extra mapping network. Prefix-diffusion is able to generate diverse captions with relatively fewer parameters, while maintaining the fluency and relevance of the captions, benefiting from the generative capabilities of the diffusion model. Our work paves the way for scaling up diffusion models for image captioning, and achieves promising performance compared with recent approaches.

2022

CLGC: A Corpus for Chinese Literary Grace Evaluation
Yi Li | Dong Yu | Pengyuan Liu
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we construct a Chinese literary grace corpus, CLGC, with 10,000 texts and more than 1.85 million tokens. Multi-level annotations are provided for each text in our corpus, including literary grace level, sentence category, and figure-of-speech type. Based on the corpus, we analyze in depth the correlation between fine-grained features (semantic information, part-of-speech, figure-of-speech, etc.) and literary grace level. We also propose a new Literary Grace Evaluation (LGE) task, which aims at making a comprehensive assessment of the literary grace level according to the text. Finally, we build several classification models with machine learning algorithms (such as SVM and TextCNN) to demonstrate the effectiveness of our features and corpus for LGE. Our preliminary classification experiments achieve a weighted average F1-score of 79.71%.

Multi-Attribute Controlled Text Generation with Contrastive-Generator and External-Discriminator
Guisheng Liu | Yi Li | Yanqing Guo | Xiangyang Luo | Bo Wang
Proceedings of the 29th International Conference on Computational Linguistics

Although existing research has achieved impressive results in controlled text generation, it focuses mainly on single-attribute control. However, in applications like automatic comments, the topic and sentiment need to be controlled simultaneously. In this work, we propose a new framework for multi-attribute controlled text generation. To achieve this, we design a contrastive-generator that can effectively generate texts with more attributes. In order to increase the convergence of the text on the desired attributes, we adopt an external-discriminator to distinguish whether the generated text holds the desired attributes. Moreover, we propose top-n weighted decoding to further improve the relevance of texts to attributes. Automated evaluations and human evaluations show that our framework achieves remarkable controllability in multi-attribute generation while keeping the text fluent and diverse. It also yields promising performance on zero-shot generation.

2018

Large Margin Neural Language Model
Jiaji Huang | Yi Li | Wei Ping | Liang Huang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose a large margin criterion for training neural language models. Conventionally, neural language models are trained by minimizing perplexity (PPL) on grammatical sentences. However, we demonstrate that PPL may not be the best metric to optimize in some tasks, and further propose a large margin formulation. The proposed method aims to enlarge the margin between the “good” and “bad” sentences in a task-specific sense. It is trained end-to-end and can be widely applied to tasks that involve re-scoring of generated text. Compared with minimum-PPL training, our method gains up to 1.1 WER reduction for speech recognition and 1.0 BLEU increase for machine translation.

2016

A Preliminary Study of Disputation Behavior in Online Debating Forum
Zhongyu Wei | Yandi Xia | Chen Li | Yang Liu | Zachary Stallbohm | Yi Li | Yang Jin
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

Is This Post Persuasive? Ranking Argumentative Comments in Online Forum
Zhongyu Wei | Yang Liu | Yi Li
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

Deploying MT into a Localisation Workflow: Pains and Gains
Yanli Sun | Juan Liu | Yi Li
Proceedings of Machine Translation Summit XIII: Papers

2007

Exploring Abbreviation Expansion for Genomic Information Retrieval
Nicola Stokes | Yi Li | Lawrence Cavedon | Justin Zobel
Proceedings of the Australasian Language Technology Workshop 2007