Olga Sozinova


2021

Interpretability for Morphological Inflection: from Character-level Predictions to Subword-level Rules
Tatyana Ruzsics | Olga Sozinova | Ximena Gutierrez-Vasques | Tanja Samardzic
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Neural models for morphological inflection have recently attained very high results. However, their interpretation remains challenging. Towards this goal, we propose a simple linguistically-motivated variant of the encoder-decoder model with attention. In our model, the character-level cross-attention mechanism is complemented with a self-attention module over substrings of the input. We design a novel approach for pattern extraction from attention weights to interpret what the model learns. We apply our methodology to analyze the model’s decisions on three typologically different languages and find that a) our pattern extraction method applied to cross-attention weights uncovers variation in the form of inflection morphemes, b) pattern extraction from self-attention shows triggers for such variation, c) both types of patterns are closely aligned with grammatical inflection classes and class assignment criteria, for all three languages. Additionally, we find that the proposed encoder attention component leads to consistent performance improvements over a strong baseline.
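As a rough, hypothetical illustration of what pattern extraction from attention weights can look like (this is not the paper's actual procedure), the sketch below maps each predicted character to the source positions it attends to most strongly and merges adjacent target characters that share the same source span into subword-level correspondences. All function names, the threshold, and the toy matrix are assumptions.

import numpy as np

def extract_alignment_patterns(attn, src_chars, tgt_chars, threshold=0.5):
    """Illustrative sketch: map each output character to the source
    positions with high cross-attention weight, then merge consecutive
    output characters that share the same source span.

    attn: array of shape (len(tgt_chars), len(src_chars)) holding
          attention weights for one inflection example.
    """
    per_char = []
    for t, row in enumerate(attn):
        # Keep source positions above the threshold; fall back to the argmax.
        positions = [s for s, w in enumerate(row) if w >= threshold]
        if not positions:
            positions = [int(np.argmax(row))]
        per_char.append((tgt_chars[t], tuple(positions)))

    # Merge consecutive output characters attending to the same source span.
    patterns = []
    for char, positions in per_char:
        if patterns and patterns[-1][1] == positions:
            patterns[-1][0] += char
        else:
            patterns.append([char, positions])

    # Return (source substring, target substring) correspondences.
    return [("".join(src_chars[p] for p in pos), tgt) for tgt, pos in patterns]

# Toy usage on a made-up 3x4 attention matrix (purely illustrative).
attn = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.6, 0.2, 0.1, 0.1],
                 [0.1, 0.1, 0.1, 0.7]])
print(extract_alignment_patterns(attn, list("kata"), list("kät")))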

From characters to words: the turning point of BPE merges
Ximena Gutierrez-Vasques | Christian Bentz | Olga Sozinova | Tanja Samardzic
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

The distributions of orthographic word types are very different across languages due to typological characteristics, different writing traditions, and potentially other factors. This wide range of cross-linguistic diversity is still a major challenge for NLP and for the study of language. We use BPE and information-theoretic measures to investigate whether distributions become similar under specific levels of subword tokenization. We perform a cross-linguistic comparison, following incremental BPE merges (going from characters to words) for 47 diverse languages. We show that text entropy values (a feature of probability distributions) tend to converge at specific subword levels: relatively few BPE merges (around 350) lead to the most similar distributions across languages. Additionally, we analyze the interaction between subword- and word-level distributions and show that our findings can be interpreted in light of the ongoing discussion regarding different types of morphological complexity.
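As a minimal sketch of the kind of measurement involved (not the paper's exact setup), the code below learns a given number of Sennrich-style BPE merges over a toy corpus and computes the Shannon entropy of the resulting subword token distribution; the function names, the toy corpus, and the merge count are assumptions for illustration only.

import math
from collections import Counter

def learn_bpe(words, num_merges):
    """Minimal Sennrich-style BPE sketch: repeatedly merge the most
    frequent adjacent symbol pair. `words` maps word -> frequency."""
    vocab = {tuple(w) + ("</w>",): f for w, f in words.items()}
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:  # every word fully merged; stop early
            break
        best = max(pairs, key=pairs.get)
        merged = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = freq
        vocab = merged
    return vocab

def unigram_entropy(vocab):
    """Shannon entropy (in bits) of the subword token distribution."""
    counts = Counter()
    for symbols, freq in vocab.items():
        for s in symbols:
            counts[s] += freq
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy usage: entropy of the subword distribution after 350 merges.
words = Counter("this is a toy corpus with repeated words this is a toy".split())
segmented = learn_bpe(words, num_merges=350)
print(unigram_entropy(segmented))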