VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization

Minh-Tien Nguyen, Dac Viet Lai, Phong-Khac Do, Duc-Vu Tran, Minh-Le Nguyen


Abstract
This paper presents VSoLSCSum, a Vietnamese linked sentence-comment dataset that was manually created to address the lack of standard corpora for social context summarization in Vietnamese. The dataset was collected through the keywords of 141 Web documents on 12 special events that were mentioned on Vietnamese Web pages. Social users were asked to take part in creating standard summaries and in labeling each sentence or comment. The inter-annotator agreement among raters after validation, measured by Cohen’s Kappa, is 0.685. To illustrate the potential use of our dataset, a learning-to-rank method was trained using a set of local and social features. Experimental results indicate that the summary model trained on our dataset outperforms state-of-the-art baselines on both ROUGE-1 and ROUGE-2 in social context summarization.
Anthology ID:
W16-5405
Volume:
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Koiti Hasida, Kam-Fai Wong, Nicoletta Calzolari, Key-Sun Choi
Venue:
ALR
Publisher:
The COLING 2016 Organizing Committee
Pages:
38–48
URL:
https://aclanthology.org/W16-5405
Cite (ACL):
Minh-Tien Nguyen, Dac Viet Lai, Phong-Khac Do, Duc-Vu Tran, and Minh-Le Nguyen. 2016. VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization. In Proceedings of the 12th Workshop on Asian Language Resources (ALR12), pages 38–48, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization (Nguyen et al., ALR 2016)
PDF:
https://aclanthology.org/W16-5405.pdf