Learning to Rank Semantic Coherence for Topic Segmentation

Liang Wang, Sujian Li, Yajuan Lv, Houfeng Wang


Abstract
Topic segmentation plays an important role in discourse parsing and information retrieval. Due to the absence of training data, previous work mainly adopts unsupervised methods to rank semantic coherence between paragraphs for topic segmentation. In this paper, we present an intuitive and simple idea to automatically create a “quasi” training dataset, which includes a large number of text pairs from the same or different documents with different semantic coherence. With this training corpus, we design a symmetric convolutional neural network (CNN) to model text pairs and rank their semantic coherence within a learning-to-rank framework. Experiments show that our algorithm achieves competitive performance against strong baselines on several real-world datasets.
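The idea of a "quasi" training set with a ranking objective can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not spell out how pairs are sampled or which loss is used, so the sketch assumes that adjacent paragraphs from the same document are more coherent than a paragraph paired with one from a different document, and it uses a standard pairwise hinge loss. The names `make_quasi_pairs`, `pairwise_hinge_loss`, and `overlap_score` are hypothetical, and the word-overlap scorer is only a placeholder for the symmetric CNN.

```python
import random


def make_quasi_pairs(documents, num_triples, seed=0):
    """Build (anchor, same_doc_text, other_doc_text) triples.

    `documents` is a list of documents, each a list of paragraph strings.
    Assumption (not from the abstract): the same-document pair is treated
    as more coherent than the cross-document pair.
    """
    rng = random.Random(seed)
    # Only documents with at least two paragraphs can supply a same-document pair.
    usable = [i for i, doc in enumerate(documents) if len(doc) >= 2]
    if not usable:
        raise ValueError("need at least one document with two or more paragraphs")
    triples = []
    for _ in range(num_triples):
        i = rng.choice(usable)
        others = [k for k, doc in enumerate(documents) if k != i and doc]
        if not others:
            raise ValueError("need a second non-empty document")
        j = rng.choice(others)
        pos = rng.randrange(len(documents[i]) - 1)
        anchor = documents[i][pos]
        same_doc_text = documents[i][pos + 1]      # adjacent paragraph, same document
        other_doc_text = rng.choice(documents[j])  # paragraph from another document
        triples.append((anchor, same_doc_text, other_doc_text))
    return triples


def overlap_score(text_a, text_b):
    """Placeholder coherence scorer (Jaccard word overlap), standing in for a CNN."""
    words_a, words_b = set(text_a.split()), set(text_b.split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def pairwise_hinge_loss(score_pos, score_neg, margin=1.0):
    """Zero once the more coherent pair outscores the less coherent one by `margin`."""
    return max(0.0, margin - (score_pos - score_neg))


if __name__ == "__main__":
    docs = [
        ["cats chase mice", "mice fear cats", "cats sleep often"],
        ["stocks fell today", "markets fell sharply"],
    ]
    for anchor, pos, neg in make_quasi_pairs(docs, 3):
        loss = pairwise_hinge_loss(overlap_score(anchor, pos), overlap_score(anchor, neg))
        print(anchor, "|", pos, "|", neg, "| loss:", loss)
```

In a trained system the placeholder scorer would be replaced by a learned model applied identically to both texts of a pair, and the loss would be minimized over many such triples.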
Anthology ID:
D17-1139
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1340–1344
URL:
https://aclanthology.org/D17-1139/
DOI:
10.18653/v1/D17-1139
Cite (ACL):
Liang Wang, Sujian Li, Yajuan Lv, and Houfeng Wang. 2017. Learning to Rank Semantic Coherence for Topic Segmentation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1340–1344, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Learning to Rank Semantic Coherence for Topic Segmentation (Wang et al., EMNLP 2017)
PDF:
https://aclanthology.org/D17-1139.pdf