Tatiana Bladier


2022

Improving Low-resource RRG Parsing with Cross-lingual Self-training
Kilian Evang | Laura Kallmeyer | Jakub Waszczuk | Kilu von Prince | Tatiana Bladier | Simon Petitjean
Proceedings of the 29th International Conference on Computational Linguistics

This paper considers the task of parsing low-resource languages in a scenario where parallel English data and a limited seed of annotated sentences in the target language are available, as for example in bootstrapping parallel treebanks. We focus on constituency parsing using Role and Reference Grammar (RRG), a theory that has so far been understudied in computational linguistics but that is widely used in typological research, particularly in the context of low-resource languages. Starting from an existing RRG parser, we propose two strategies for low-resource parsing. First, we extend the parsing model into a cross-lingual parser, exploiting the parallel data in the high-resource language and unsupervised word alignments by providing internal states of the source-language parser to the target-language parser. Second, we adopt self-training, iteratively expanding the training data, starting from the seed, by including the most confident new parses in each round. Both in simulated scenarios and with a real low-resource language (Daakaka), we find substantial and complementary improvements from self-training and cross-lingual parsing. Moreover, we experiment with using gloss embeddings in addition to token embeddings in the target language, which also improves results. Finally, starting from what we have for Daakaka, we consider parsing a related language (Dalkalaen) where glosses and English translations are available but no annotated trees at all, i.e., a no-resource scenario with respect to syntactic annotations. We start with a cross-lingual parser trained on Daakaka with glosses and use self-training to adapt it to Dalkalaen. The results are surprisingly good.

RRGparbank: A Parallel Role and Reference Grammar Treebank
Tatiana Bladier | Kilian Evang | Valeria Generalova | Zahra Ghane | Laura Kallmeyer | Robin Möllemann | Natalia Moors | Rainer Osswald | Simon Petitjean
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper describes the first release of RRGparbank, a multilingual parallel treebank for Role and Reference Grammar (RRG) containing annotations of George Orwell’s novel 1984 and its translations. The release comprises the entire novel for English and a constructionally diverse and highly parallel sample (“seed”) for German, French and Russian. The paper gives an overview of the annotation decisions that have been taken and describes the adopted treebanking methodology. As a possible application, a multilingual parser is trained on the treebank data. RRGparbank is one of the first resources to apply RRG to large amounts of real-world data. Furthermore, it enables comparative and typological corpus studies in RRG. Finally, it creates new possibilities for data-driven NLP applications based on RRG.

2021

Bootstrapping Role and Reference Grammar Treebanks via Universal Dependencies
Kilian Evang | Tatiana Bladier | Laura Kallmeyer | Simon Petitjean
Proceedings of the Fifth Workshop on Universal Dependencies (UDW, SyntaxFest 2021)

Improving DRS Parsing with Separately Predicted Semantic Roles
Tatiana Bladier | Gosse Minnema | Rik van Noord | Kilian Evang
Proceedings of the ESSLLI 2021 Workshop on Computing Semantics with Types, Frames and Related Structures

2020

Statistical Parsing of Tree Wrapping Grammars
Tatiana Bladier | Jakub Waszczuk | Laura Kallmeyer
Proceedings of the 28th International Conference on Computational Linguistics

We describe an approach to statistical parsing with Tree-Wrapping Grammars (TWG). TWG is a tree-rewriting formalism which includes the tree-combination operations of substitution, sister-adjunction and tree-wrapping substitution. TWGs can be extracted from constituency treebanks and aim at representing long-distance dependencies (LDDs) in a linguistically adequate way. We present a parsing algorithm for TWGs based on neural supertagging and A* parsing. We extract a TWG for English from the treebanks for Role and Reference Grammar and discuss first parsing results with this grammar.

Automatic Extraction of Tree-Wrapping Grammars for Multiple Languages
Tatiana Bladier | Laura Kallmeyer | Rainer Osswald | Jakub Waszczuk
Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories

2018

German and French Neural Supertagging Experiments for LTAG Parsing
Tatiana Bladier | Andreas van Cranenburgh | Younes Samih | Laura Kallmeyer
Proceedings of ACL 2018, Student Research Workshop

We present ongoing work on data-driven parsing of German and French with Lexicalized Tree Adjoining Grammars. We use a supertagging approach combined with deep learning. We show the challenges of extracting LTAG supertags from the French Treebank, introduce the use of left- and right-sister-adjunction, present a neural architecture for the supertagger, and report experiments on n-best supertagging for French and German.

AET: Web-based Adjective Exploration Tool for German
Tatiana Bladier | Esther Seyffarth | Oliver Hellwig | Wiebke Petersen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)