Lingling Mu (穆玲玲) - ACL Anthology

Lingling Mu

Also published as: 玲玲穆

2025

Quality-aware Neural Machine Translation with Self-evaluation
Jiajia Cui | Lingling Mu | Qiuhui Liu | Hongfei Xu
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

"The performance of neural machine translation relies on a large amount of data, but crawled sentence pairs are of different quality. The low-quality sentence pairs may provide helpful translation knowledge but also teach the model to generate low-quality translations. Making the model aware of the quality of training instances may help the model distinguish between good and bad translations while leveraging the translation knowledge. In this paper, we evaluate the quality of training instances with the average per-token loss (negative log-likelihood) from translation mod-els, convert the quality scores into embeddings through vector interpolation and feed the quality embedding into the translation model during its training. We ask the model to decode with the best quality score to generate good translations during inference. Experiments on the IWSLT 14 German to English, WMT 14 English to German and WMT 22 English to Japanese translation tasks show that our method can effectively lead to consistent and significant improvements across multiple metrics."

pdf bib abs

Text-to-SQL, which maps natural language to SQL queries, has benefited greatly from recent advances in Large Language Models (LLMs). While LLMs offer various paradigms for this task, including prompting and supervised fine-tuning (SFT), SFT approaches still face challenges such as complex multi-stage pipelines and poor robustness to noisy schema information. To address these limitations, we present JOLT-SQL, a streamlined single-stage SFT framework that jointly optimizes schema linking and SQL generation via a unified loss. JOLT-SQL employs discriminative schema linking, enhanced by local bidirectional attention, alongside a confusion-aware noisy schema sampling strategy with selective attention to improve robustness under noisy schema conditions. Experiments on the Spider and BIRD benchmarks demonstrate that JOLT-SQL achieves state-of-the-art execution accuracy among comparable-size open-source models, while significantly improving both training and inference efficiency.

pdf bib abs

A Self-Distillation Recipe for Neural Machine Translation
Hongfei Xu | Zhuofei Liang | Qiuhui Liu | Lingling Mu
Findings of the Association for Computational Linguistics: ACL 2025

Self-distillation distills the deeper sub-networks to the shallower sub-networks without using an extra teacher model, and has been proven effective in improving the performance of a series of computer vision tasks. In this paper, we study the representation-based self-distillation methods for Neural Machine Translation (NMT) considering the efficiency issue with a large vocabulary. We present a rank-order augmented Pearson correlation loss and an iterative distillation method to prevent the discrepancy of predictions between the student and a stronger teacher from disturbing the training. To prevent the teacher from misleading the student’s learning, we utilize a warm-up strategy and present a gradient adaption method to scale down or zero the Knowledge Distillation (KD) gradients which are opposite to the translation. Experiments show that our method can lead to significant improvements over the strong Transformer baseline on low/middle/high-resource tasks, obtaining comparable performance to previous MT KD studies without pre-training a teacher. Deeper Transformer experiments show that our method can lead to comparable or better performance with fewer layers.

2024

pdf bib abs

大语言模型在中文文本纠错任务的评测(Evaluation of large language models for Chinese text error correction tasks)
Lingling Mu (穆玲玲) | Xiaoying Wang (王晓盈) | Jiajia Cui (崔佳佳)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“大语言模型(Large Language Models,LLMs)在信息抽取、机器翻译等自然语言处理任务上的能力已被广泛评估,但是在文本纠错方面还主要局限于评价GPT的英文语法纠错能力。中文文本纠错任务包括中文语法检测 (Chinese Grammatical Error Detection,CGED)和中文语法纠错(Chinese Error Correction,CGEC)两个子任务。本文使用提示的方法评估了国内外的主流大模型在中文语法检测和中文语法纠错任务上的能力。论文设计了不同的提示策略,对结果进行了整体和细粒度的分析。在NLPCC2018和CGED2018测试集上的实验结果表明,ERNIE-4和ChatGLM-4的中文文本纠错能力优于GPT-3.5-Turbo和LLaMa-2-7B-Chat,少样本思维链提示策略性能最优,对词序错误和拼写错误上纠正的准确率较高,说明大模型在低资源下具有较好的中文文本纠错能力。然而测试结果显示大模型的召回率比基线模型高至少14个百分点,说明大模型在中文文本纠错任务上存在过度校正的问题。”

pdf bib abs

中文语法纠错的多轮解码方法研究(Multi-Turn Decoding for Chinese Grammatical Error Correction)
Xiaoying Wang (王晓盈) | Lingling Mu (穆玲玲) | Hongfei Xu (许鸿飞)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“在语法纠错(Grammatical Error Correction,GEC)任务上,序列到序列(Sequence-to sequence,seq2seq)模型与序列到编辑(Sequence-to-edit,seq2edit)模型相比可以取得相当或更好的性能。序列到编辑模型通常通过多次迭代解码,而序列到序列模型则以从左到右的方式一次性解码,不考虑后续的词语。通过在序列到序列模型中应用多轮解码(Multi-Turn Decoding,MTD)来迭代改进前一轮的修正结果,可能会进一步提升性能。然而,多轮解码会增加推理的计算成本,且前一轮修正中的删除或替换操作可能会导致原始输入中有用的源语句信息丢失。本文提出了一种早停机制来提高效率。同时,为解决源语句信息丢失问题,本文将原始输入与上一轮的修正结果合并为一个序列。在NLPCC2018测试集、FCGEC验证集和NaCGEC测试集的实验结果表明,本文方法可在BART基线上能带来一致且显著的性能提升,F0.5值分别提高了+2.06,+2.31和+3.45,分别取得了47.34,54.58和62.09的F0.5值。”

pdf bib abs

基于动态提示学习和依存关系的生成式结构化情感分析模型(Dynamic Prompt Learning and Dependency Relation based Generative Structured Sentiment Analysis Model)
Yintao Jia (贾银涛) | Jiajia Cui (崔佳佳) | Lingling Mu (穆玲玲) | Hongying Zan (昝红英)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“结构化情感分析旨在从文本中抽取所有由情感持有者、目标事物、观点表示和情感极性构成的情感元组,是较为全面的细粒度情感分析任务。针对目前结构化情感分析方法错误传递,提示模版适应性不足和情感要素构成复杂的问题,本文提出了基于动态提示学习和依存关系的生成式结构化情感分析模型,根据不同的情感元组构成情况分别设计提示模版,并用模板增强生成式预训练模型的输入,用依存关系增强生成效果。实验结果显示,本文提出的模型在SemEval20221数据集上的SF1值优于所对比的基线模型。”

pdf bib abs

Multi-pass Decoding for Grammatical Error Correction
Xiaoying Wang | Lingling Mu | Jingyi Zhang | Hongfei Xu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Sequence-to-sequence (seq2seq) models achieve comparable or better grammatical error correction performance compared to sequence-to-edit (seq2edit) models. Seq2edit models normally iteratively refine the correction result, while seq2seq models decode only once without aware of subsequent tokens. Iteratively refining the correction results of seq2seq models via Multi-Pass Decoding (MPD) may lead to better performance. However, MPD increases the inference costs. Deleting or replacing corrections in previous rounds may lose useful information in the source input. We present an early-stop mechanism to alleviate the efficiency issue. To address the source information loss issue, we propose to merge the source input with the previous round correction result into one sequence. Experiments on the CoNLL-14 test set and BEA-19 test set show that our approach can lead to consistent and significant improvements over strong BART and T5 baselines (+1.80, +1.35, and +2.02 F0.5 for BART 12-2, large and T5 large respectively on CoNLL-14 and +2.99, +1.82, and +2.79 correspondingly on BEA-19), obtaining F0.5 scores of 68.41 and 75.36 on CoNLL-14 and BEA-19 respectively.

2022

pdf bib abs

ParaZh-22M: A Large-Scale Chinese Parabank via Machine Translation
Wenjie Hao | Hongfei Xu | Deyi Xiong | Hongying Zan | Lingling Mu
Proceedings of the 29th International Conference on Computational Linguistics

Paraphrasing, i.e., restating the same meaning in different ways, is an important data augmentation approach for natural language processing (NLP). Zhang et al. (2019b) propose to extract sentence-level paraphrases from multiple Chinese translations of the same source texts, and construct the PKU Paraphrase Bank of 0.5M sentence pairs. However, despite being the largest Chinese parabank to date, the size of PKU parabank is limited by the availability of one-to-many sentence translation data, and cannot well support the training of large Chinese paraphrasers. In this paper, we relieve the restriction with one-to-many sentence translation data, and construct ParaZh-22M, a larger Chinese parabank that is composed of 22M sentence pairs, based on one-to-one bilingual sentence translation data and machine translation (MT). In our data augmentation experiments, we show that paraphrasing based on ParaZh-22M can bring about consistent and significant improvements over several strong baselines on a wide range of Chinese NLP tasks, including a number of Chinese natural language understanding benchmarks (CLUE) and low-resource machine translation.

Co-authors

Venues

Fix author