Corpus-based Content Construction

Balaji Vasan Srinivasan, Pranav Maneriker, Kundan Krishna, Natwar Modani


Abstract
Enterprise content writers are engaged in writing textual content for various purposes. Often, the text being written may already be present in the enterprise corpus in the form of past articles and can be repurposed for the current needs. In the absence of suitable tools, authors manually curate or create such content (sometimes from scratch), which reduces their productivity. To address this, we propose an automatic approach to generate an initial version of the author’s intended text based on an input content snippet. Starting with a set of extracted textual fragments related to the snippet based on the query words in it, the proposed approach builds the desired text from these fragments by simultaneously optimizing the information coverage, relevance, diversity and coherence in the generated content. Evaluations on standard datasets show improved performance against existing baselines on several metrics.
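To make the idea of jointly balancing coverage, relevance, diversity and coherence concrete, the following is a minimal illustrative sketch and not the paper's actual formulation. It assumes a simple greedy selection over fragments that have already been retrieved for the snippet, with word-overlap (Jaccard) scores standing in for all four criteria. The names `tokenize`, `jaccard`, `build_content`, the `budget` parameter and the `weights` values are hypothetical and chosen only for this example.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    stripped = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in stripped if w}


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_content(snippet, fragments, budget=2, weights=(1.0, 1.0, 1.0, 0.5)):
    """Greedily pick up to `budget` fragments (assumed scoring, not the paper's).

    Each candidate is scored on:
      relevance  - overlap with the input snippet
      coverage   - share of its words not yet covered by the selection
      redundancy - highest overlap with any selected fragment (penalized,
                   which encourages diversity)
      coherence  - overlap with the most recently selected fragment
    """
    w_rel, w_cov, w_div, w_coh = weights
    query = tokenize(snippet)
    tokens = [tokenize(f) for f in fragments]
    selected, covered = [], set()

    while len(selected) < min(budget, len(fragments)):
        best, best_score = None, float("-inf")
        for i, toks in enumerate(tokens):
            if i in selected:
                continue
            relevance = jaccard(toks, query)
            coverage = len(toks - covered) / len(toks) if toks else 0.0
            redundancy = max((jaccard(toks, tokens[j]) for j in selected),
                             default=0.0)
            coherence = jaccard(toks, tokens[selected[-1]]) if selected else 0.0
            score = (w_rel * relevance + w_cov * coverage
                     - w_div * redundancy + w_coh * coherence)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        covered |= tokens[best]

    return [fragments[i] for i in selected]
```

Given a snippet such as "quarterly sales report" and a candidate list containing two identical fragments plus one related but different fragment, this sketch selects one copy of the duplicate and then the different fragment, because the redundancy penalty and the coverage term both count against the repeat. A real system would likely replace the word-overlap scores with stronger similarity measures and a joint optimization rather than a greedy loop.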
Anthology ID:
C18-1297
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
3505–3515
URL:
https://aclanthology.org/C18-1297
Cite (ACL):
Balaji Vasan Srinivasan, Pranav Maneriker, Kundan Krishna, and Natwar Modani. 2018. Corpus-based Content Construction. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3505–3515, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Corpus-based Content Construction (Srinivasan et al., COLING 2018)
PDF:
https://aclanthology.org/C18-1297.pdf