Leander Girrbach


2023

TüReuth Legal at SemEval-2023 Task 6: Modelling Local and Global Structure of Judgements for Rhetorical Role Prediction
Henrik Manegold | Leander Girrbach
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes our system for SemEval-2023 Task 6: LegalEval: Understanding Legal Texts. We only participate in Sub-Task (A), Predicting Rhetorical Roles. Our final submission achieves a test set F1 score of 73.35, ranking 17th of 27 participants. The proposed method combines global and local models of label distributions and transitions between labels. Our analyses show that modelling the temporal distribution of labels in particular contributes positively to performance.
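
To make the combination of global and local structure concrete, the sketch below estimates a temporal label prior (how likely each rhetorical role is at each relative position in a judgement) and uses it to rerank per-sentence scores from a local model. This is an illustrative sketch under assumed names and a simple log-linear combination, not the authors' implementation.

```python
import numpy as np

def temporal_label_prior(train_docs, n_labels, n_buckets=10, smoothing=1.0):
    """Estimate P(label | relative position bucket) from training documents.

    train_docs: list of label-id sequences, one per judgement.
    Returns an (n_buckets, n_labels) matrix of smoothed probabilities.
    """
    counts = np.full((n_buckets, n_labels), smoothing)
    for labels in train_docs:
        for i, y in enumerate(labels):
            bucket = min(int(i / len(labels) * n_buckets), n_buckets - 1)
            counts[bucket, y] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def rerank_with_prior(sentence_logits, prior, weight=1.0):
    """Combine local per-sentence logits with the global temporal prior.

    sentence_logits: (n_sentences, n_labels) scores from a local model.
    """
    n = len(sentence_logits)
    buckets = np.minimum((np.arange(n) / n * prior.shape[0]).astype(int),
                         prior.shape[0] - 1)
    combined = sentence_logits + weight * np.log(prior[buckets])
    return combined.argmax(axis=1)
```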

Tü-CL at SIGMORPHON 2023: Straight-Through Gradient Estimation for Hard Attention
Leander Girrbach
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper describes our systems participating in the 2023 SIGMORPHON Shared Task on Morphological Inflection and in the 2023 SIGMORPHON Shared Task on Interlinear Glossing. We propose methods to enrich predictions from neural models with discrete, i.e. interpretable, information. For morphological inflection, our models learn deterministic mappings from subsets of source lemma characters and morphological tags to individual target characters, which introduces interpretability. For interlinear glossing, our models learn a shallow morpheme segmentation in an unsupervised way, jointly with predicting glossing lines. The estimated segmentation may be useful when no ground-truth segmentation is available. As both methods introduce discreteness into neural models, our technical contribution is to show that straight-through gradient estimators are effective for training hard attention models.
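
The central device named in the title, straight-through gradient estimation for hard attention, can be illustrated in a few lines of PyTorch: the forward pass attends to exactly one source position, while gradients are taken from the soft attention weights. The sketch below is a generic illustration of this estimator, not the submitted models.

```python
import torch
import torch.nn.functional as F

def hard_attention_straight_through(scores, values):
    """Hard attention with a straight-through gradient estimator.

    scores: (batch, seq_len) unnormalised attention scores.
    values: (batch, seq_len, dim) source representations.
    The forward pass attends to exactly one position; the backward pass
    uses the gradient of the soft attention weights.
    """
    soft = F.softmax(scores, dim=-1)                        # differentiable
    index = soft.argmax(dim=-1, keepdim=True)               # discrete choice
    hard = torch.zeros_like(soft).scatter_(-1, index, 1.0)  # one-hot
    # Straight-through: forward value equals `hard`, gradient flows via `soft`.
    attn = hard + (soft - soft.detach())
    return torch.bmm(attn.unsqueeze(1), values).squeeze(1)  # (batch, dim)
```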

SIGMORPHON 2022 Shared Task on Grapheme-to-Phoneme Conversion Submission Description: Sequence Labelling for G2P
Leander Girrbach
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper describes our participation in the Third SIGMORPHON Shared Task on Grapheme-to-Phoneme Conversion (Low-Resource and Cross-Lingual) (McCarthy et al., 2022). Our models rely on different sequence labelling methods. The main model predicts multiple phonemes from each grapheme and is trained using CTC loss (Graves et al., 2006). We find that sequence labelling methods yield worse performance than the baseline when enough data is available, but can still be used when very little data is available. Furthermore, we demonstrate that the alignments learned by the sequence labelling models can easily be inspected.
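
To illustrate the kind of CTC-trained sequence labelling setup described above, the sketch below tags (upsampled) grapheme positions with phonemes and lets CTC handle the alignment to the target phoneme sequence. The architecture, upsampling factor, and vocabulary sizes are illustrative assumptions, not the submitted system.

```python
import torch
import torch.nn as nn

class CTCGraphemeTagger(nn.Module):
    """Tag graphemes with phonemes; CTC aligns the tags to the target phonemes."""

    def __init__(self, n_graphemes, n_phonemes, dim=128, upsample=2):
        super().__init__()
        self.upsample = upsample  # allow more than one phoneme per grapheme
        self.embed = nn.Embedding(n_graphemes, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * dim, n_phonemes + 1)  # +1 for the CTC blank

    def forward(self, graphemes):
        h, _ = self.encoder(self.embed(graphemes))
        h = h.repeat_interleave(self.upsample, dim=1)  # upsample the time axis
        return self.out(h).log_softmax(dim=-1)         # (batch, T, C)

# Toy training step: CTC loss over the upsampled grapheme positions.
model = CTCGraphemeTagger(n_graphemes=40, n_phonemes=60)
ctc = nn.CTCLoss(blank=60, zero_infinity=True)
graphemes = torch.randint(0, 40, (8, 12))                 # toy batch of words
phonemes = torch.randint(0, 60, (8, 15))                  # toy phoneme targets
log_probs = model(graphemes).transpose(0, 1)              # CTC expects (T, N, C)
input_lengths = torch.full((8,), 24, dtype=torch.long)    # 12 graphemes * upsample
target_lengths = torch.full((8,), 15, dtype=torch.long)
loss = ctc(log_probs, phonemes, input_lengths, target_lengths)
loss.backward()
```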

2022

Text Complexity DE Challenge 2022 Submission Description: Pairwise Regression for Complexity Prediction
Leander Girrbach
Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text

This paper describes our submission to the Text Complexity DE Challenge 2022 (Mohtaj et al., 2022). We evaluate a pairwise regression model that predicts the relative difference in complexity between two sentences, instead of predicting a complexity score from a single sentence. As a consequence, the model returns a sample of scores (one per training sentence) instead of a point estimate. Due to an error in the submission, test set results are unavailable. However, we show by cross-validation that pairwise regression does not improve performance over standard regression models that use sentence embeddings from pretrained language models as input. Furthermore, we do not find that the standard deviations of the score distributions reflect differences in the “uncertainty” of the model predictions in a useful way.
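
The pairwise idea can be illustrated as follows: a regressor is trained on embedding differences to predict score differences, and a new sentence then receives one score sample per training sentence (training score plus predicted difference). The use of scikit-learn Ridge and all names below are illustrative assumptions, not the submitted system.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import Ridge

def fit_pairwise(train_emb, train_scores):
    """Fit a regressor on (embedding difference -> complexity difference) pairs."""
    pairs = list(combinations(range(len(train_emb)), 2))
    X = np.array([train_emb[i] - train_emb[j] for i, j in pairs])
    y = np.array([train_scores[i] - train_scores[j] for i, j in pairs])
    return Ridge(alpha=1.0).fit(X, y)

def predict_samples(model, train_emb, train_scores, test_emb):
    """One score sample per training sentence: training score + predicted difference."""
    diffs = model.predict(test_emb[None, :] - train_emb)   # (n_train,)
    return np.asarray(train_scores) + diffs                # distribution of samples

# Toy usage: the mean of the samples serves as the point prediction, the
# standard deviation as the (here uninformative) "uncertainty" estimate.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(50, 16))
train_scores = rng.uniform(1, 7, size=50)
model = fit_pairwise(train_emb, train_scores)
samples = predict_samples(model, train_emb, train_scores, rng.normal(size=16))
print(samples.mean(), samples.std())
```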

SIGMORPHON 2022 Shared Task on Morpheme Segmentation Submission Description: Sequence Labelling for Word-Level Morpheme Segmentation
Leander Girrbach
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

We propose a sequence labelling approach to word-level morpheme segmentation. Segmentation labels are edit operations derived from a modified minimum edit distance alignment. We show that sequence labelling performs well for “shallow segmentation” and “canonical segmentation”, achieving an F1 score of 96.06 (macro-averaged over all languages in the shared task) and ranking 3rd among all participating teams. We therefore conclude that sequence labelling is a promising approach to morpheme segmentation.
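
For the shallow-segmentation case, where the segmented form is the word itself with morpheme separators inserted, per-character boundary labels can be read off directly, as in the sketch below. The actual system derives its labels as edit operations from a modified minimum edit distance alignment, which also covers canonical segmentation; the separator format here is an assumption for illustration.

```python
def shallow_segmentation_labels(word, segmented, sep=" @@"):
    """Derive per-character boundary labels for shallow segmentation.

    word:      e.g. "unhappiness"
    segmented: e.g. "un @@happi @@ness"  (the word with separators inserted)
    Returns one label per character of `word`:
    'B' = a morpheme boundary follows this character, 'I' = it does not.
    """
    morphemes = segmented.split(sep)
    assert "".join(morphemes) == word, "shallow segmentation only"
    labels = []
    for morpheme in morphemes:
        labels += ["I"] * (len(morpheme) - 1) + ["B"]
    labels[-1] = "I"  # no boundary after the final character
    return labels

def apply_labels(word, labels, sep=" @@"):
    """Inverse operation: rebuild the segmented form from predicted labels."""
    out = []
    for char, label in zip(word, labels):
        out.append(char)
        if label == "B":
            out.append(sep)
    return "".join(out)

print(shallow_segmentation_labels("unhappiness", "un @@happi @@ness"))
# ['I', 'B', 'I', 'I', 'I', 'I', 'B', 'I', 'I', 'I', 'I']
```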

SIGMORPHON 2022 Task 0 Submission Description: Modelling Morphological Inflection with Data-Driven and Rule-Based Approaches
Tatiana Merzhevich | Nkonye Gbadegoye | Leander Girrbach | Jingwen Li | Ryan Soh-Eun Shim
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper describes our participation in the 2022 SIGMORPHON-UniMorph Shared Task on Typologically Diverse and Acquisition-Inspired Morphological Inflection Generation. We present two approaches: one is a modification of the neural baseline encoder-decoder model, the other consists of hand-coded morphological analyzers built with finite-state tools (FST) and outside linguistic knowledge. While our proposed modification of the baseline encoder-decoder model underperforms the baseline for almost all languages, the FST methods outperform other systems in the respective languages by a large margin. This confirms that purely data-driven approaches have not yet reached the maturity to replace trained linguists for documentation and analysis, especially for low-resource and endangered languages.
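
As a toy illustration of the rule-based direction, the sketch below applies ordered suffix rules keyed on part of speech and morphological tag. Real analyzers of the kind used in the submission are written as FST grammars in dedicated tools (e.g. foma or HFST) with proper handling of phonological alternations; the rules and tags below are illustrative only.

```python
# Toy rule-based inflection: map (lemma, POS, tag) to a form via ordered
# suffix-replacement rules. A real analyzer would be an FST grammar with
# full coverage of stem changes and phonological alternations.
RULES = {
    ("V", "PST"): [("e", "ed"), ("", "ed")],          # bake -> baked, walk -> walked
    ("V", "V.PTCP;PRS"): [("e", "ing"), ("", "ing")],
    ("N", "PL"): [("y", "ies"), ("", "s")],
}

def inflect(lemma, pos, tag):
    for old_suffix, new_suffix in RULES.get((pos, tag), [("", "")]):
        if lemma.endswith(old_suffix):
            return lemma[: len(lemma) - len(old_suffix)] + new_suffix
    return lemma

print(inflect("bake", "V", "PST"))   # baked
print(inflect("walk", "V", "PST"))   # walked
print(inflect("city", "N", "PL"))    # cities
```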