基于统一模型的藏文新闻摘要(Abstractive Summarization of Tibetan News Based on Hybrid Model)

Xiaodong Yan (闫晓东), Xiaoqing Xie (解晓庆), Yu Zou (邹煜), Wei Li (李维)


Abstract
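Seq2seq neural network models have achieved good results in text summarization for Chinese and English, but summarization research for low-resource languages, Tibetan in particular, is still at an exploratory stage. In addition, no large-scale annotated corpus for summary extraction currently exists. This paper proposes a unified model for generating summaries of Tibetan news. The TextRank algorithm is used to address the shortage of annotated Tibetan training data. A two-layer bidirectional GRU neural network then extracts the sentences that represent the original news article, reducing redundant information. Finally, an attention-based Seq2Seq model generates the abstractive summary, and a pointer network is added to handle out-of-vocabulary words. Experimental results show that the ROUGE-1 score improves by 2% over the traditional model.
Keywords: text summarization; Tibetan; TextRank; pointer network; Bi-GRU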
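The abstract does not spell out how TextRank compensates for the missing annotations. One plausible reading is that TextRank scores are used to mark the most central sentences of each article as pseudo-labels for training the sentence extractor. The sketch below illustrates that reading only. The function names (`similarity`, `textrank`, `pseudo_labels`), the `top_k` cut-off, the damping factor, and the assumption that sentences arrive already tokenized are all illustrative choices, not details taken from the paper.

```python
from itertools import combinations
from math import log


def similarity(a, b):
    """Word-overlap similarity between two tokenized sentences."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # log(1) == 0 could make the denominator zero
    overlap = len(set(a) & set(b))
    return overlap / (log(len(a)) + log(len(b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score each sentence by power iteration on the similarity graph."""
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(sentences[i], sentences[j])
    out_sum = [sum(row) for row in weights]  # total edge weight per sentence
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives a share of its neighbours' scores,
        # proportional to the weight of the connecting edge.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def pseudo_labels(sentences, top_k=1):
    """Mark the top_k highest-scoring sentences with 1, all others with 0."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:top_k])
    return [1 if i in top else 0 for i in range(len(sentences))]
```

Calling `pseudo_labels(tokenized_article, top_k=3)` on a list of token lists would return one 0/1 label per sentence. Tibetan word segmentation is assumed to have been done beforehand and is outside the scope of this sketch.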
Anthology ID:
2020.ccl-1.44
Volume:
Proceedings of the 19th Chinese National Conference on Computational Linguistics
Month:
October
Year:
2020
Address:
Haikou, China
Editors:
Maosong Sun (孙茂松), Sujian Li (李素建), Yue Zhang (张岳), Yang Liu (刘洋)
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
479–490
Language:
Chinese
URL:
https://aclanthology.org/2020.ccl-1.44
Cite (ACL):
Xiaodong Yan, Xiaoqing Xie, Yu Zou, and Wei Li. 2020. 基于统一模型的藏文新闻摘要(Abstractive Summarization of Tibetan News Based on Hybrid Model). In Proceedings of the 19th Chinese National Conference on Computational Linguistics, pages 479–490, Haikou, China. Chinese Information Processing Society of China.
Cite (Informal):
基于统一模型的藏文新闻摘要(Abstractive Summarization of Tibetan News Based on Hybrid Model) (Yan et al., CCL 2020)
PDF:
https://aclanthology.org/2020.ccl-1.44.pdf