Weakly Supervised Domain Detection

Yumo Xu, Mirella Lapata


Abstract
In this paper we introduce domain detection as a new natural language processing task. We argue that the ability to detect textual segments that are domain-heavy (i.e., sentences or phrases that are representative of and provide evidence for a given domain) could enhance the robustness and portability of various text classification applications. We propose an encoder-detector framework for domain detection and bootstrap classifiers with multiple instance learning. The model is hierarchically organized and suited to multilabel classification. We demonstrate that despite learning with minimal supervision, our model can be applied to text spans of different granularities, languages, and genres. We also showcase the potential of domain detection for text summarization.
Anthology ID:
Q19-1037
Volume:
Transactions of the Association for Computational Linguistics, Volume 7
Year:
2019
Address:
Cambridge, MA
Editors:
Lillian Lee, Mark Johnson, Brian Roark, Ani Nenkova
Venue:
TACL
Publisher:
MIT Press
Pages:
581–596
URL:
https://aclanthology.org/Q19-1037
DOI:
10.1162/tacl_a_00287
Cite (ACL):
Yumo Xu and Mirella Lapata. 2019. Weakly Supervised Domain Detection. Transactions of the Association for Computational Linguistics, 7:581–596.
Cite (Informal):
Weakly Supervised Domain Detection (Xu & Lapata, TACL 2019)
PDF:
https://aclanthology.org/Q19-1037.pdf
Code
yumoxu/detnet
Data
Wiki-en, Wiki-zh