融入音素特征的英-泰-老多语言神经机器翻译方法(English-Thai-Lao multilingual neural machine translation fused with phonemic features)

Zheng Shen (沈政), Cunli Mao (毛存礼), Zhengtao Yu (余正涛), Shengxiang Gao (高盛祥), Linqin Wang (王琳钦), Yuxin Huang (黄于欣)


Abstract
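“Multilingual neural machine translation is an effective way to improve translation quality for low-resource languages. Because scripts differ substantially across languages, existing methods struggle to obtain a unified word representation. Thai and Lao are low-resource languages that share phonemic similarity. Since exploiting language similarity can narrow semantic distance, we propose a multilingual word representation learning method that incorporates phonemic features: (1) we design a phonemic feature representation module and a Thai-Lao text representation module, and use a cross-attention mechanism to obtain Thai-Lao text representations fused with phonemic features, narrowing the semantic distance between Thai and Lao; (2) in the fine-tuning stage, we use parameter differentiation to obtain language-pair-specific training parameters, alleviating the over-generalization caused by joint training. Experimental results on the ALT dataset show that the proposed method improves over the baseline model by 0.97 and 0.99 BLEU in the Thai-English and Lao-English translation directions, respectively.”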
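The cross-attention fusion named in point (1) of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cross_attention_fuse`, the single attention head, the residual addition, the dimensions, and the random inputs are all assumptions chosen for illustration. The sketch only shows the general idea of text token representations (queries) attending over phoneme feature representations (keys and values).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(text, phoneme, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention fusion.

    text:    (T, d) text token representations (used as queries)
    phoneme: (P, d) phoneme feature representations (used as keys/values)
    Returns the fused (T, d) representation and the (T, P) attention weights.
    """
    q = text @ w_q
    k = phoneme @ w_k
    v = phoneme @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product scores
    weights = softmax(scores, axis=-1)        # each text token attends over phonemes
    attended = weights @ v
    # Assumption: a residual connection combines text and phoneme information.
    return text + attended, weights

# Toy inputs; real inputs would come from learned embedding/encoder modules.
rng = np.random.default_rng(0)
d = 8
text = rng.normal(size=(5, d))      # 5 text tokens
phoneme = rng.normal(size=(7, d))   # 7 phoneme units
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

fused, weights = cross_attention_fuse(text, phoneme, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In this sketch the fused output keeps the length of the text sequence, so it could stand in for the plain text representation in a downstream encoder; whether the paper combines the two streams this way is not stated in the abstract.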
Anthology ID:
2022.ccl-1.28
Volume:
Proceedings of the 21st Chinese National Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Nanchang, China
Editors:
Maosong Sun (孙茂松), Yang Liu (刘洋), Wanxiang Che (车万翔), Yang Feng (冯洋), Xipeng Qiu (邱锡鹏), Gaoqi Rao (饶高琦), Yubo Chen (陈玉博)
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
305–316
Language:
Chinese
URL:
https://aclanthology.org/2022.ccl-1.28
Cite (ACL):
Zheng Shen, Cunli Mao, Zhengtao Yu, Shengxiang Gao, Linqin Wang, and Yuxin Huang. 2022. 融入音素特征的英-泰-老多语言神经机器翻译方法(English-Thai-Lao multilingual neural machine translation fused with phonemic features). In Proceedings of the 21st Chinese National Conference on Computational Linguistics, pages 305–316, Nanchang, China. Chinese Information Processing Society of China.
Cite (Informal):
融入音素特征的英-泰-老多语言神经机器翻译方法(English-Thai-Lao multilingual neural machine translation fused with phonemic features) (Shen et al., CCL 2022)
PDF:
https://aclanthology.org/2022.ccl-1.28.pdf