Cross-genre Document Retrieval: Matching between Conversational and Formal Writings

Tomasz Jurczyk; Jinho D. Choi

doi:10.18653/v1/W17-5407


Abstract

This paper tackles a cross-genre document retrieval task in which the queries are in formal writing and the target documents are in conversational writing. In this task, a query is a sentence extracted from either the summary or the plot of an episode of a TV show, and the target document consists of the transcripts of the corresponding episode. To establish a strong baseline, we employ a current state-of-the-art search engine to perform document retrieval on the dataset collected for this work. We then introduce a structure reranking approach that improves the initial ranking by utilizing syntactic and semantic structures generated by NLP tools. Our evaluation shows an improvement of more than 4% when structure reranking is applied, which is a promising result.
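The abstract does not specify which search engine is used or how the structural features are computed, so the following is only a minimal sketch of the general two-stage pattern it describes (initial retrieval, then reranking of the top candidates). It assumes BM25 as a stand-in for the search engine and uses word-bigram overlap as a crude, hypothetical proxy for the syntactic and semantic structures the authors derive with NLP tools. The names `BM25`, `structure_score`, `retrieve`, and the parameters `k` and `alpha` are illustrative assumptions, not the paper's method or any library's API.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


class BM25:
    """Stage 1 (assumed stand-in for the search engine): Okapi BM25 scoring."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query_tokens, i):
        tf, dl = self.tfs[i], len(self.docs[i])
        s = 0.0
        for t in query_tokens:
            if t in tf:
                f = tf[t]
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                s += self.idf[t] * f * (self.k1 + 1) / norm
        return s


def bigrams(tokens):
    return set(zip(tokens, tokens[1:]))


def structure_score(q_tokens, d_tokens):
    # Hypothetical structural feature: fraction of query bigrams found in the
    # document (a rough proxy for shared syntactic relations, not the paper's).
    qb = bigrams(q_tokens)
    if not qb:
        return 0.0
    return len(qb & bigrams(d_tokens)) / len(qb)


def retrieve(query, engine, k=10, alpha=0.5):
    """Return (initial ranking, reranked ranking) as lists of document indices."""
    q = tokenize(query)
    initial = sorted(((engine.score(q, i), i) for i in range(len(engine.docs))),
                     reverse=True)[:k]
    top = initial[0][0] or 1.0  # normalize BM25 scores to [0, 1]; avoid division by 0
    # Stage 2: interpolate the normalized initial score with the structural score.
    reranked = sorted(((alpha * s / top + (1 - alpha) * structure_score(q, engine.docs[i]), i)
                       for s, i in initial), reverse=True)
    return [i for _, i in initial], [i for _, i in reranked]
```

For example, with `docs` holding one transcript string per episode, `retrieve("Monica cooks dinner", BM25(docs))` ranks the episodes by BM25 and then reorders the top `k` by the interpolated score. Reranking only the top `k` keeps the (typically more expensive) structural analysis limited to a small candidate set.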

Anthology ID: W17-5407
Volume: Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems
Month: September
Year: 2017
Address: Copenhagen, Denmark
Editors: Emily Bender, Hal Daumé III, Allyson Ettinger, Sudha Rao
Venue: WS
Publisher: Association for Computational Linguistics
Pages: 48–53
URL: https://aclanthology.org/W17-5407/
Cite (ACL): Tomasz Jurczyk and Jinho D. Choi. 2017. Cross-genre Document Retrieval: Matching between Conversational and Formal Writings. In Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems, pages 48–53, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal): Cross-genre Document Retrieval: Matching between Conversational and Formal Writings (Jurczyk & Choi, 2017)
PDF: https://aclanthology.org/W17-5407.pdf
