Leander Girrbach

2025

Automatic concept extraction for learning domain modeling: A weakly supervised approach using contextualized word embeddings
Kordula De Kuthy | Leander Girrbach | Detmar Meurers
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

Heterogeneity in student populations poses achallenge in formal education, with adaptivetextbooks offering a potential solution by tai-loring content based on individual learner mod-els. However, creating domain models for text-books typically demands significant manual ef-fort. Recent work by Chau et al. (2021) demon-strated automated concept extraction from dig-ital textbooks, but relied on costly domain-specific manual annotations. This paper in-troduces a novel, scalable method that mini-mizes manual effort by combining contextu-alized word embeddings with weakly super-vised machine learning. Our approach clustersword embeddings from textbooks and identi-fies domain-specific concepts using a machinelearner trained on concept seeds automaticallyextracted from Wikipedia. We evaluate thismethod using 28 economics textbooks, com-paring its performance against a tf-idf baseline,a supervised machine learning baseline, theRAKE keyword extraction method, and humandomain experts. Results demonstrate that ourweakly supervised method effectively balancesaccuracy with reduced annotation effort, offer-ing a practical solution for automated conceptextraction in adaptive learning environments.

2023

pdf bib abs

Tü-CL at SIGMORPHON 2023: Straight-Through Gradient Estimation for Hard Attention
Leander Girrbach
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper describes our systems participating in the 2023 SIGMORPHON Shared Task on Morphological Inflection and in the 2023 SIGMORPHON Shared Task on Interlinear Glossing. We propose methods to enrich predictions from neural models with discrete, i.e. interpretable, information. For morphological inflection, our models learn deterministic mappings from subsets of source lemma characters and morphological tags to individual target characters, which introduces interpretability. For interlinear glossing, our models learn a shallow morpheme segmentation in an unsupervised way jointly with predicting glossing lines. Estimated segmentation may be useful when no ground-truth segmentation is available. As both methods introduce discreteness into neural models, our technical contribution is to show that straight-through gradient estimators are effective to train hard attention models.

pdf bib abs

SIGMORPHON 2022 Shared Task on Grapheme-to-Phoneme Conversion Submission Description: Sequence Labelling for G2P
Leander Girrbach
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper describes our participation in the Third SIGMORPHON Shared Task on Grapheme-to-Phoneme Conversion (Low-Resource and Cross-Lingual) (McCarthy et al.,2022). Our models rely on different sequence labelling methods. The main model predicts multiple phonemes from each grapheme and is trained using CTC loss (Graves et al., 2006). We find that sequence labelling methods yield worse performance than the baseline when enough data is available, but can still be used when very little data is available. Furthermore, we demonstrate that alignments learned by the sequence labelling models can be easily inspected.

pdf bib abs

TüReuth Legal at SemEval-2023 Task 6: Modelling Local and Global Structure of Judgements for Rhetorical Role Prediction
Henrik Manegold | Leander Girrbach
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes our system for SemEval-2023 Task 6: LegalEval: Understanding Legal Texts. We only participate in Sub-Task (A), Predicting Rhetorical Roles. Our final submission achieves 73.35 test set F1 score, ranking 17th of 27 participants. The proposed method combines global and local models of label distributions and transitions between labels. Through our analyses, we show that especially modelling the temporal distribution of labels contributes positively to performance.

2022

pdf bib abs

SIGMORPHON 2022 Shared Task on Morpheme Segmentation Submission Description: Sequence Labelling for Word-Level Morpheme Segmentation
Leander Girrbach
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

We propose a sequence labelling approach to word-level morpheme segmentation. Segmentation labels are edit operations derived from a modified minimum edit distance alignment. We show that sequence labelling performs well for “shallow segmentation” and “canonical segmentation”, achieving 96.06 f1 score (macroaveraged over all languages in the shared task) and ranking 3rd among all participating teams. Therefore, we conclude that sequence labelling is a promising approach to morpheme segmentation.

pdf bib abs

SIGMORPHON 2022 Task 0 Submission Description: Modelling Morphological Inflection with Data-Driven and Rule-Based Approaches
Tatiana Merzhevich | Nkonye Gbadegoye | Leander Girrbach | Jingwen Li | Ryan Soh-Eun Shim
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper describes our participation in the 2022 SIGMORPHON-UniMorph Shared Task on Typologically Diverse and AcquisitionInspired Morphological Inflection Generation. We present two approaches: one being a modification of the neural baseline encoderdecoder model, the other being hand-coded morphological analyzers using finite-state tools (FST) and outside linguistic knowledge. While our proposed modification of the baseline encoder-decoder model underperforms the baseline for almost all languages, the FST methods outperform other systems in the respective languages by a large margin. This confirms that purely data-driven approaches have not yet reached the maturity to replace trained linguists for documentation and analysis especially considering low-resource and endangered languages.

pdf bib abs

Text Complexity DE Challenge 2022 Submission Description: Pairwise Regression for Complexity Prediction
Leander Girrbach
Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text

This paper describes our submission to the Text Complexity DE Challenge 2022 (Mohtaj et al., 2022). We evaluate a pairwise regression model that predicts the relative difference in complexity of two sentences, instead of predicting a complexity score from a single sentence. In consequence, the model returns samples of scores (as many as there are training sentences) instead of a point estimate. Due to an error in the submission, test set results are unavailable. However, we show by cross-validation that pairwise regression does not improve performance over standard regression models using sentence embeddings taken from pretrained language models as input. Furthermore, we do not find the distribution standard deviations to reflect differences in “uncertainty” of the model predictions in an useful way.

Co-authors

Detmar Meurers 1

Ryan Soh-Eun Shim 1

Venues

Fix author