Nicolas Kaiser
2020
A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation
Jan Deriu | Katsiaryna Mlynchyk | Philippe Schläpfer | Alvaro Rodrigo | Dirk von Grünigen | Nicolas Kaiser | Kurt Stockinger | Eneko Agirre | Mark Cieliebak
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation that is based on the logical query plan in a database, called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of the tokens to the operations. Thus, we randomly generate OTs from a context-free grammar, and annotators just have to write the appropriate question and assign the tokens. We compare our corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases, to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our dataset is challenging and that the token alignment can be leveraged to significantly increase the performance.
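The abstract's core idea, sampling query trees at random from a context-free grammar before any question is written, can be illustrated with a minimal sketch. The grammar, the node names (`QUERY`, `PROJECT`, `FILTER`, `JOIN`, `TABLE`), and the depth cutoff below are all hypothetical and chosen only for illustration; they are not the paper's actual Operation Tree grammar.

```python
import random

# Hypothetical toy grammar (an assumption, not the paper's OT grammar).
# Each nonterminal maps to a list of productions. By convention here, the
# FIRST production of every nonterminal leads to termination fastest.
GRAMMAR = {
    "QUERY": [["PROJECT"], ["COUNT"]],
    "PROJECT": [["SOURCE"]],
    "COUNT": [["SOURCE"]],
    "SOURCE": [["TABLE"], ["FILTER"], ["JOIN"]],
    "FILTER": [["SOURCE"]],
    "JOIN": [["SOURCE", "SOURCE"]],
}


def generate(symbol, rng, depth=4):
    """Return a random derivation tree as a (label, children) tuple."""
    if symbol not in GRAMMAR:
        # Symbols without productions (here only TABLE) are leaves.
        return (symbol, [])
    options = GRAMMAR[symbol]
    # Once the depth budget is spent, always take the first production
    # so that the recursion is guaranteed to terminate.
    production = options[0] if depth <= 0 else rng.choice(options)
    return (symbol, [generate(child, rng, depth - 1) for child in production])


def leaves(tree):
    """Collect the leaf labels of a tree from left to right."""
    label, children = tree
    if not children:
        return [label]
    return [leaf for child in children for leaf in leaves(child)]


if __name__ == "__main__":
    tree = generate("QUERY", random.Random(0))
    print(tree)
```

In the inverse-annotation setting described above, an annotator would then be shown such a tree and asked to write a matching question; the sketch covers only the sampling step and leaves out database schemas and token assignment.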
Co-authors
- Eneko Agirre 1
- Mark Cieliebak 1
- Jan Milan Deriu 1
- Katsiaryna Mlynchyk 1
- Álvaro Rodrigo 1
Venues
- ACL 1