SOM-NCSCM : An Efficient Neural Chinese Sentence Compression Model Enhanced with Self-Organizing Map

Kangli Zi, Shi Wang, Yu Liu, Jicun Li, Yanan Cao, Cungen Cao


Abstract
Sentence Compression (SC), which aims to shorten sentences while retaining important words that express the essential meanings, has been studied for many years in many languages, especially in English. However, improvements on Chinese SC task are still quite few due to several difficulties: scarce of parallel corpora, different segmentation granularity of Chinese sentences, and imperfect performance of syntactic analyses. Furthermore, entire neural Chinese SC models have been under-investigated so far. In this work, we construct an SC dataset of Chinese colloquial sentences from a real-life question answering system in the telecommunication domain, and then, we propose a neural Chinese SC model enhanced with a Self-Organizing Map (SOM-NCSCM), to gain a valuable insight from the data and improve the performance of the whole neural Chinese SC model in a valid manner. Experimental results show that our SOM-NCSCM can significantly benefit from the deep investigation of similarity among data, and achieve a promising F1 score of 89.655 and BLEU4 score of 70.116, which also provides a baseline for further research on Chinese SC task.
Anthology ID:
2021.emnlp-main.33
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
403–415
Language:
URL:
https://aclanthology.org/2021.emnlp-main.33
DOI:
10.18653/v1/2021.emnlp-main.33
Bibkey:
Cite (ACL):
Kangli Zi, Shi Wang, Yu Liu, Jicun Li, Yanan Cao, and Cungen Cao. 2021. SOM-NCSCM : An Efficient Neural Chinese Sentence Compression Model Enhanced with Self-Organizing Map. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 403–415, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
SOM-NCSCM : An Efficient Neural Chinese Sentence Compression Model Enhanced with Self-Organizing Map (Zi et al., EMNLP 2021)
Copy Citation:
PDF:
https://aclanthology.org/2021.emnlp-main.33.pdf
Video:
 https://aclanthology.org/2021.emnlp-main.33.mp4