中文语法纠错的多轮解码方法研究(Multi-Turn Decoding for Chinese Grammatical Error Correction)

Wang Xiaoying (王晓盈), Mu Lingling (穆玲玲), Xu Hongfei (许鸿飞)


Abstract
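On the grammatical error correction (GEC) task, sequence-to-sequence (seq2seq) models achieve performance comparable to or better than sequence-to-edit (seq2edit) models. Seq2edit models typically decode over multiple iterations, whereas seq2seq models decode in a single left-to-right pass without considering subsequent tokens. Applying multi-turn decoding (MTD) to a seq2seq model, so that each turn iteratively refines the previous turn's correction, may further improve performance. However, MTD increases the computational cost of inference, and deletions or substitutions made in an earlier turn may discard useful information from the original source sentence. This paper proposes an early-stopping mechanism to improve efficiency. To address the loss of source information, it also merges the original input and the previous turn's correction into a single sequence. Experiments on the NLPCC2018 test set, the FCGEC validation set, and the NaCGEC test set show that the method yields consistent and significant improvements over a BART baseline: F0.5 increases by 2.06, 2.31, and 3.45 points, reaching 47.34, 54.58, and 62.09, respectively.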
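The multi-turn decoding loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `correct` callable stands in for a seq2seq model such as a fine-tuned BART, the `SEP` separator and the "source + separator + previous correction" layout are hypothetical (the abstract only says the two are merged into one sequence), and the early-stop rule used here (stop when a turn leaves the text unchanged) is an assumed criterion.

```python
from typing import Callable, List

# Hypothetical separator; the abstract does not specify the merge format.
SEP = " [SEP] "


def multi_turn_decode(
    source: str,
    correct: Callable[[str], str],
    max_turns: int = 3,
) -> List[str]:
    """Run up to max_turns correction rounds and return every round's output."""
    outputs: List[str] = []
    previous = source
    for turn in range(max_turns):
        # Turn 0 sees only the source. Later turns see the source merged with
        # the previous correction, so deleted source information stays visible.
        model_input = source if turn == 0 else source + SEP + previous
        current = correct(model_input)
        outputs.append(current)
        # Assumed early-stop rule: a turn that changes nothing ends decoding.
        if current == previous:
            break
        previous = current
    return outputs


# Toy stand-in for a model: fixes at most one known error per call.
FIXES = [("teh", "the"), ("recieve", "receive")]


def toy_correct(model_input: str) -> str:
    # Edit the latest correction (the text after the last separator).
    text = model_input.split(SEP)[-1]
    for wrong, right in FIXES:
        if wrong in text:
            return text.replace(wrong, right, 1)
    return text


rounds = multi_turn_decode("teh recieve", toy_correct, max_turns=5)
print(rounds)
```

With the toy corrector, the loop stops after the third turn even though five turns are allowed, because that turn reproduces the previous output. A real system would replace `toy_correct` with a call to the trained model's generation routine.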
Anthology ID:
2024.ccl-1.53
Volume:
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Month:
July
Year:
2024
Address:
Taiyuan, China
Editors:
Maosong Sun, Jiye Liang, Xianpei Han, Zhiyuan Liu, Yulan He
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
673–687
Language:
Chinese
URL:
https://aclanthology.org/2024.ccl-1.53/
Cite (ACL):
Wang Xiaoying, Mu Lingling, and Xu Hongfei. 2024. 中文语法纠错的多轮解码方法研究(Multi-Turn Decoding for Chinese Grammatical Error Correction). In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 673–687, Taiyuan, China. Chinese Information Processing Society of China.
Cite (Informal):
中文语法纠错的多轮解码方法研究(Multi-Turn Decoding for Chinese Grammatical Error Correction) (Wang et al., CCL 2024)
PDF:
https://aclanthology.org/2024.ccl-1.53.pdf