This study examines the influence of task type and healthy aging on various automatically extracted part-of-speech features in Icelandic. We administered three language tasks to participants aged 60–80: picture description, trip planning, and description of one’s childhood home. Our findings reveal significant task effects on 11 out of 14 linguistic variables studied, highlighting the substantial influence of sampling methods on language production. Among the variables showing statistically significant task effects, we find the rates of the genitive and the subjunctive, variables that can only be studied in morphologically richer languages like Icelandic. On the other hand, rates of pronouns, adverbs, and prepositions remained stable across task types. Aging effects were more subtle, being evident in 3 of the 14 variables, including an interaction with task type for dative case marking. These findings underscore the importance of task selection in studies targeting linguistic features and emphasize the need to examine languages other than English to fully understand the effects of aging on language production. Additionally, the results have clinical implications: understanding healthy aging’s impact on language can help us better identify and study changes caused by Alzheimer’s disease in older adults’ speech.
Properly citing sources is a crucial component of any good-quality academic paper. The goal of this study was to determine how accurately a simple binary classification model can predict whether a sentence should contain an inline citation. To that end, we fine-tuned SciBERT on both an imbalanced and a balanced dataset of sentences with and without inline citations. We achieved an overall accuracy of over 0.92, suggesting that language patterns alone can predict, with reasonable accuracy, where inline citations should appear.
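The abstract above does not say how the balanced dataset was built. As a minimal sketch, assuming balancing by random undersampling of the larger class (an assumption, not the authors' stated method), the Python below shows one way to derive a balanced set from an imbalanced one, and why a majority-class baseline is a useful reference when reading an accuracy figure. The function names and the toy sentences are hypothetical and serve only as illustration.

```python
import random
from collections import Counter


def balance_by_undersampling(examples, seed=0):
    """Return a subset with the same number of examples per label.

    `examples` is a list of (sentence, label) pairs, with label 1 meaning
    "contains an inline citation" and 0 meaning "does not" (assumed encoding).
    """
    rng = random.Random(seed)
    by_label = {}
    for sentence, label in examples:
        by_label.setdefault(label, []).append((sentence, label))
    # Every class is cut down to the size of the smallest class.
    n = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


def majority_baseline_accuracy(examples):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(label for _, label in examples)
    return max(counts.values()) / len(examples)


# Hypothetical toy data: 2 sentences with a citation, 8 without.
toy = [("Prior work reports similar effects.", 1)] * 2 \
    + [("We thank the reviewers.", 0)] * 8
balanced = balance_by_undersampling(toy)

print(majority_baseline_accuracy(toy))       # baseline on the imbalanced data
print(majority_baseline_accuracy(balanced))  # baseline on the balanced data
```

On imbalanced data the baseline can already be high, so an accuracy score is easier to interpret when compared against it or when reported on a balanced set as well.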
We describe the structure and creation of the SageWrite corpus, a manually annotated corpus created to support automatic language generation and automatic quality assessment of academic articles. The corpus currently contains annotations for 100 excerpts taken from various scientific articles. For each of these excerpts, the corpus contains (i) a draft version of the excerpt and (ii) annotations that reflect the stylistic and linguistic merits of the excerpt, such as whether or not the text is clearly structured. The SageWrite corpus is the first corpus for the fine-tuning of text-generation algorithms that specifically addresses academic writing.