BibTeX
@article{laha-etal-2019-scalable,
title = "Scalable Micro-planned Generation of Discourse from Structured Data",
author = "Laha, Anirban and
Jain, Parag and
Mishra, Abhijit and
Sankaranarayanan, Karthik",
journal = "Computational Linguistics",
volume = "45",
number = "4",
month = dec,
year = "2019",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/J19-4005/",
doi = "10.1162/coli_a_00363",
pages = "737--763",
abstract = "We present a framework for generating natural language description from structured data such as tables; the problem comes under the category of data-to-text natural language generation (NLG). Modern data-to-text NLG systems typically use end-to-end statistical and neural architectures that learn from a limited amount of task-specific labeled data, and therefore exhibit limited scalability, domain-adaptability, and interpretability. Unlike these systems, ours is a modular, pipeline-based approach, and does not require task-specific parallel data. Rather, it relies on monolingual corpora and basic off-the-shelf NLP tools. This makes our system more scalable and easily adaptable to newer domains. Our system utilizes a three-staged pipeline that: (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent, and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain data set curated for paragraph description from tables reveals the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other popular data sets covering diverse data types such as knowledge graphs and key-value maps."
}
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="laha-etal-2019-scalable">
<titleInfo>
<title>Scalable Micro-planned Generation of Discourse from Structured Data</title>
</titleInfo>
<name type="personal">
<namePart type="given">Anirban</namePart>
<namePart type="family">Laha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Parag</namePart>
<namePart type="family">Jain</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Abhijit</namePart>
<namePart type="family">Mishra</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Karthik</namePart>
<namePart type="family">Sankaranarayanan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2019-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<genre authority="bibutilsgt">journal article</genre>
<relatedItem type="host">
<titleInfo>
<title>Computational Linguistics</title>
</titleInfo>
<originInfo>
<issuance>continuing</issuance>
<publisher>MIT Press</publisher>
<place>
<placeTerm type="text">Cambridge, MA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">periodical</genre>
<genre authority="bibutilsgt">academic journal</genre>
</relatedItem>
<abstract>We present a framework for generating natural language description from structured data such as tables; the problem comes under the category of data-to-text natural language generation (NLG). Modern data-to-text NLG systems typically use end-to-end statistical and neural architectures that learn from a limited amount of task-specific labeled data, and therefore exhibit limited scalability, domain-adaptability, and interpretability. Unlike these systems, ours is a modular, pipeline-based approach, and does not require task-specific parallel data. Rather, it relies on monolingual corpora and basic off-the-shelf NLP tools. This makes our system more scalable and easily adaptable to newer domains. Our system utilizes a three-staged pipeline that: (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent, and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain data set curated for paragraph description from tables reveal the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other popular data sets covering diverse data types such as knowledge graphs and key-value maps.</abstract>
<identifier type="citekey">laha-etal-2019-scalable</identifier>
<identifier type="doi">10.1162/coli_a_00363</identifier>
<location>
<url>https://aclanthology.org/J19-4005/</url>
</location>
<part>
<date>2019-12</date>
<detail type="volume"><number>45</number></detail>
<detail type="issue"><number>4</number></detail>
<extent unit="page">
<start>737</start>
<end>763</end>
</extent>
</part>
</mods>
</modsCollection>
Endnote
%0 Journal Article
%T Scalable Micro-planned Generation of Discourse from Structured Data
%A Laha, Anirban
%A Jain, Parag
%A Mishra, Abhijit
%A Sankaranarayanan, Karthik
%J Computational Linguistics
%D 2019
%8 December
%V 45
%N 4
%I MIT Press
%C Cambridge, MA
%F laha-etal-2019-scalable
%X We present a framework for generating natural language description from structured data such as tables; the problem comes under the category of data-to-text natural language generation (NLG). Modern data-to-text NLG systems typically use end-to-end statistical and neural architectures that learn from a limited amount of task-specific labeled data, and therefore exhibit limited scalability, domain-adaptability, and interpretability. Unlike these systems, ours is a modular, pipeline-based approach, and does not require task-specific parallel data. Rather, it relies on monolingual corpora and basic off-the-shelf NLP tools. This makes our system more scalable and easily adaptable to newer domains. Our system utilizes a three-staged pipeline that: (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent, and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain data set curated for paragraph description from tables reveal the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other popular data sets covering diverse data types such as knowledge graphs and key-value maps.
%R 10.1162/coli_a_00363
%U https://aclanthology.org/J19-4005/
%U https://doi.org/10.1162/coli_a_00363
%P 737-763
Markdown (Informal)
[Scalable Micro-planned Generation of Discourse from Structured Data](https://aclanthology.org/J19-4005/) (Laha et al., CL 2019)
ACL
Anirban Laha, Parag Jain, Abhijit Mishra, and Karthik Sankaranarayanan. 2019. Scalable Micro-planned Generation of Discourse from Structured Data. Computational Linguistics, 45(4):737–763.
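
The abstract above describes a three-stage micro-planning pipeline (canonicalization, simple-sentence generation, sentence compounding with co-reference replacement). As a purely illustrative sketch of those stage interfaces, not the authors' implementation, and with all names, templates, and heuristics below hypothetical, it could look roughly like this in Python:

```python
# Illustrative sketch only -- not the paper's code. It mirrors the three
# stages named in the abstract: (i) canonicalize structured entries,
# (ii) generate a simple sentence per atomic entry, (iii) fuse the sentences
# into a paragraph via compounding and co-reference replacement.
# All names, templates, and heuristics here are hypothetical.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    """One atomic (subject, attribute, value) fact in canonical form."""
    subject: str
    attribute: str
    value: str


def canonicalize(record: dict[str, str], subject: str) -> list[Entry]:
    """Stage (i): flatten a key-value record into atomic canonical entries."""
    return [
        Entry(subject, attr.lower().replace("_", " "), value)
        for attr, value in record.items()
    ]


def realize(entry: Entry) -> str:
    """Stage (ii): naive template realization of one simple sentence."""
    return f"{entry.subject}'s {entry.attribute} is {entry.value}."


def compose(sentences: list[str], subject: str) -> str:
    """Stage (iii): crude sentence compounding plus pronoun substitution,
    standing in for the paper's discourse-level modules."""
    clauses = []
    for i in range(0, len(sentences), 2):
        pair = [s.rstrip(".") for s in sentences[i:i + 2]]
        clause = " and ".join(pair) + "."
        if clauses:  # after the first clause, refer back with a pronoun
            clause = clause.replace(f"{subject}'s", "Its")
        clauses.append(clause)
    return " ".join(clauses)


if __name__ == "__main__":
    record = {"capital": "Canberra", "population": "25.7 million",
              "currency": "Australian dollar"}
    entries = canonicalize(record, "Australia")
    print(compose([realize(e) for e in entries], "Australia"))
    # -> Australia's capital is Canberra and Australia's population is
    #    25.7 million. Its currency is Australian dollar.
```

This toy version uses fixed string templates and a single pronoun rule; the paper's pipeline instead learns its realization and discourse modules from monolingual corpora and off-the-shelf NLP tools, which is what makes it domain-adaptable.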