Umesh Patil

2025

Quantifying word complexity for Leichte Sprache: A computational metric and its psycholinguistic validation
Umesh Patil | Jesus Calvillo | Sol Lago | Anne-Kathrin Schumann
Proceedings of the 1st Workshop on Artificial Intelligence and Easy and Plain Language in Institutional Contexts (AI & EL/PL)

Leichte Sprache (Easy Language or Easy German) is a strongly simplified version of German geared toward a target group with limited language proficiency. In Germany, public bodies are required to provide information in Leichte Sprache. Unfortunately, Leichte Sprache rules are traditionally defined by non-linguists, are not rooted in linguistic research, and do not provide precise decision criteria or devices for measuring the complexity of linguistic structures (Bock and Pappert, 2023). For instance, one of the rules simply recommends the usage of simple rather than complex words. In this paper, we therefore propose a model to determine word complexity. We train an XGBoost model for classifying word complexity by leveraging word-level linguistic and corpus-level distributional features, frequency information from an in-house Leichte Sprache corpus and human complexity annotations. We psycholinguistically validate our model by showing that it captures human word recognition times above and beyond traditional word-level predictors. Moreover, we discuss a number of practical applications of our classifier, such as the evaluation of AI-simplified text and detection of CEFR levels of words. To our knowledge, this is one of the first attempts to systematically quantify word complexity in the context of Leichte Sprache and to link it directly to real-time word processing.

Automatic Compound Segmentation for Leichte Sprache
Jesus Calvillo | Umesh Patil | Johann Seltmann | Anne-Kathrin Schumann
Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Workshops

2022

Computational cognitive modeling of predictive sentence processing in a second language
Umesh Patil | Sol Lago
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)

We propose an ACT-R cue-based retrieval model of the real-time gender predictions displayed by second language (L2) learners. The model extends a previous model of native (L1) speakers according to two central accounts in L2 sentence processing: (i) the Interference Hypothesis, which proposes that retrieval interference is higher in L2 than L1 speakers; (ii) the Lexical Bottleneck Hypothesis, which proposes that problems with gender agreement are due to weak gender representations. We tested the predictions of these accounts using data from two visual world experiments, which found that the gender predictions elicited by German possessive pronouns were delayed and smaller in size in L2 than L1 speakers. The experiments also found a “match effect”, such that when the antecedent and possessee of the pronoun had the same gender, predictions were earlier than when the two genders differed. This match effect was smaller in L2 than L1 speakers. The model implementing the Lexical Bottleneck Hypothesis captured the effects of smaller predictions, smaller match effect and delayed predictions in one of the two conditions. By contrast, the model implementing the Interference Hypothesis captured the smaller prediction effect but it showed an earlier prediction effect and an increased match effect in L2 than L1 speakers. These results provide evidence for the Lexical Bottleneck Hypothesis, and they demonstrate a method for extending computational models of L1 to L2 processing.

2020

Demonstrative Pronouns as Anti-Logophoric Pronouns: An Experimental Investigation
Stefan Hinterwimmer | Andreas Brocher | Umesh Patil
Dialogue & Discourse, Volume 11

In this paper we report the results of two experimental studies in which we tested the claim of Hinterwimmer and Bosch (2017) that German demonstrative pronouns are anti-logophoric pronouns: They avoid discourse referents as antecedents that function as perspectival centers. In both experiments we tested the interpretative options of demonstrative pronouns in text segments which were either perspectivally neutral or in which the narrator’s or a topical protagonist’s perspective was foregrounded. Taken together, the experimental results are most compatible with a slightly modified version of the analysis argued for in Hinterwimmer and Bosch (2017) according to which topical discourse referents in neutral narration automatically become perspectival centers.
