Aligning the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs

Ana-Maria Barbu, Verginica Barbu Mititelu, Cătălin Mititelu


Abstract
We present our work on aligning two language resources for Romanian, the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs. For each treebank occurrence of a verb that has an entry in the lexicon, a set of valence frames is assigned automatically, then manually validated and, when necessary, corrected. Validating a valence frame also means semantically disambiguating the verb in the respective context. The validation is done by two linguists working on complementary datasets; a subset of verbs was validated by both annotators, and Cohen’s κ for this subset is 0.87. The alignment also serves as a way of improving the quality of the two resources, since in the process we identify morpho-syntactic annotation mistakes as well as incomplete or missing valence frames. Information from each resource complements the other, which increases the value of both. The treebank and the lexicon are freely available, and the links established between them are made available on GitHub.
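For readers unfamiliar with the agreement measure cited in the abstract, the following is a minimal sketch of how Cohen’s κ is computed for two annotators who label the same items. It is not the authors' code: the function name `cohen_kappa`, the frame identifiers, and the six sample annotations are all hypothetical and only illustrate the standard formula κ = (p_o − p_e) / (1 − p_e).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both labels are identical.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, derived from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical valence-frame IDs chosen by two annotators
# for the same six verb occurrences (illustrative data only).
annotator_1 = ["f1", "f1", "f2", "f2", "f3", "f1"]
annotator_2 = ["f1", "f1", "f2", "f3", "f3", "f1"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

In this toy example the annotators agree on five of six items, but κ is lower than the raw agreement rate because part of that agreement is expected by chance.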
Anthology ID:
2022.lrec-1.714
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6626–6634
URL:
https://aclanthology.org/2022.lrec-1.714
Cite (ACL):
Ana-Maria Barbu, Verginica Barbu Mititelu, and Cătălin Mititelu. 2022. Aligning the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6626–6634, Marseille, France. European Language Resources Association.
Cite (Informal):
Aligning the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs (Barbu et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.714.pdf
Data
Universal Dependencies