Development of a Multilingual CCG Treebank via Universal Dependencies Conversion

Tu-Anh Tran, Yusuke Miyao


Abstract
This paper introduces an algorithm to convert Universal Dependencies (UD) treebanks to Combinatory Categorial Grammar (CCG) treebanks. As CCG encodes almost all grammatical information into the lexicon, obtaining a high-quality CCG derivation from a dependency tree is a challenging task. Our algorithm relies on hand-crafted rules to assign categories to constituents, and a non-statistical parser to derive full CCG parses given the assigned categories. To evaluate our converted treebanks, we perform lexical, sentential, and syntactic rule coverage analysis, as well as CCG parsing experiments. Finally, we discuss how our method handles complex constructions, and propose possible future extensions.
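To make the second step of the abstract concrete, the following is a minimal sketch of how a non-statistical, chart-based procedure can derive a parse once every word already has a category. It is not the authors' implementation: the tuple encoding of categories, the function names `combine` and `derive`, and the example sentence are assumptions made for illustration, and only forward and backward application are covered (no composition or type-raising).

```python
from typing import Optional, Union

# Assumed encoding: a category is an atomic string ("S", "NP") or a tuple
# (result, slash, argument), e.g. ("S", "\\", "NP") stands for S\NP.
Cat = Union[str, tuple]


def combine(left: Cat, right: Cat) -> Optional[Cat]:
    """Return the result of forward or backward application, else None."""
    # Forward application: X/Y  Y  =>  X
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    # Backward application: Y  X\Y  =>  X
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None


def derive(cats: list) -> set:
    """CKY-style chart over a non-empty list of pre-assigned categories.

    Returns every category that can span the whole input.
    """
    n = len(cats)
    chart = {(i, i + 1): {c} for i, c in enumerate(cats)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):  # every split point of the span
                for l in chart[(i, k)]:
                    for r in chart[(k, j)]:
                        out = combine(l, r)
                        if out is not None:
                            cell.add(out)
            chart[(i, j)] = cell
    return chart[(0, n)]


# Illustrative input, "She reads books": NP, (S\NP)/NP, NP
TV = (("S", "\\", "NP"), "/", "NP")
result = derive(["NP", TV, "NP"])
```

Because the categories are fixed in advance, the chart only has to check which adjacent spans combine; no scoring model is needed, which is the sense of "non-statistical" assumed here.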
Anthology ID:
2022.lrec-1.560
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5220–5233
URL:
https://aclanthology.org/2022.lrec-1.560
Cite (ACL):
Tu-Anh Tran and Yusuke Miyao. 2022. Development of a Multilingual CCG Treebank via Universal Dependencies Conversion. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5220–5233, Marseille, France. European Language Resources Association.
Cite (Informal):
Development of a Multilingual CCG Treebank via Universal Dependencies Conversion (Tran & Miyao, LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.560.pdf
Data:
Penn Treebank, Universal Dependencies