Elena Callegari


2024

Automatic Extraction of Language-Specific Biomarkers of Healthy Aging in Icelandic
Elena Callegari | Iris Edda Nowenstein | Ingunn Jóhanna Kristjánsdóttir | Anton Karl Ingason
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This study examines the influence of task type and healthy aging on various automatically extracted part-of-speech features in Icelandic. We administered three language tasks to participants aged 60–80: picture description, trip planning, and description of one’s childhood home. Our findings reveal significant task effects on 11 of the 14 linguistic variables studied, highlighting the substantial influence of sampling methods on language production. Among the variables showing statistically significant task effects are the rates of the genitive case and the subjunctive mood, variables that can only be studied in morphologically richer languages such as Icelandic. By contrast, the rates of pronouns, adverbs, and prepositions remained stable across task types. Aging effects were more subtle, appearing in only 3 of the 14 variables, including an interaction with task type for dative case marking. These findings underscore the importance of task selection in studies targeting linguistic features, and they also emphasize the need to examine languages other than English to fully understand the effects of aging on language production. The results also have clinical implications: understanding healthy aging’s impact on language can help us better identify and study changes caused by Alzheimer’s disease in older adults’ speech.
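The abstract does not spell out how the part-of-speech variables are computed. As a minimal sketch, assuming each variable is the proportion of tokens in a speech sample that carry a given part of speech or case, rates could be computed per sample and then averaged per task. The token format `(word, part_of_speech, case)`, the tag names, and the helpers `feature_rates` and `mean_rate_by_task` are all hypothetical; they are not the authors' pipeline or a real Icelandic tagger's output.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical token format: (word, part_of_speech, case). `case` is None for
# words without case marking. Real Icelandic taggers emit richer tag strings.
Token = Tuple[str, str, Optional[str]]


def feature_rates(tokens: Iterable[Token]) -> Dict[str, float]:
    """Return each part of speech and each case as a proportion of all tokens."""
    tokens = list(tokens)
    if not tokens:
        return {}
    counts: Counter = Counter()
    for _word, pos, case in tokens:
        counts[f"pos:{pos}"] += 1
        if case is not None:
            counts[f"case:{case}"] += 1
    return {feature: n / len(tokens) for feature, n in counts.items()}


def mean_rate_by_task(samples: List[Tuple[str, List[Token]]], feature: str) -> Dict[str, float]:
    """Average the rate of one feature over all samples elicited by each task."""
    rates_by_task: Dict[str, List[float]] = defaultdict(list)
    for task, tokens in samples:
        rates_by_task[task].append(feature_rates(tokens).get(feature, 0.0))
    return {task: mean(rates) for task, rates in rates_by_task.items()}


# "Ég sá hús mannsins" ("I saw the man's house"), hand-tagged for illustration.
sample = [
    ("Ég", "pron", "nom"),
    ("sá", "verb", None),
    ("hús", "noun", "acc"),
    ("mannsins", "noun", "gen"),
]
print(feature_rates(sample)["case:gen"])  # 0.25
```

Comparing `mean_rate_by_task` across "picture description", "trip planning", and "childhood home" samples is the kind of contrast a task effect refers to; a real analysis would add a statistical model on top of these rates.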

2023

Enhancing Academic Title Generation Using SciBERT and Linguistic Rules
Elena Callegari | Peter Vajdecka | Desara Xhura | Anton Karl Ingason
Proceedings of the Second Workshop on Information Extraction from Scientific Publications

Predicting the presence of inline citations in academic text using binary classification
Peter Vajdecka | Elena Callegari | Desara Xhura | Atli Ásmundsson
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Properly citing sources is a crucial component of any good-quality academic paper. The goal of this study was to determine how accurately a simple binary classification model can predict whether a sentence should contain an inline citation. To that end, we fine-tuned SciBERT on both an imbalanced and a balanced dataset containing sentences with and without inline citations. We achieved an overall accuracy above 0.92, suggesting that language patterns alone can, to some extent, predict where inline citations should appear.
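The abstract does not say how the balanced dataset was derived. A minimal sketch, assuming random undersampling of the majority class, shows one way to build it and how accuracy is computed over binary labels. The SciBERT fine-tuning step is omitted, and the names `undersample` and `accuracy` are hypothetical helpers, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple

# (sentence, label): 1 = sentence has an inline citation, 0 = it does not.
Example = Tuple[str, int]


def undersample(examples: Sequence[Example], seed: int = 0) -> List[Example]:
    """Randomly drop majority-class examples until both labels are equally frequent."""
    rng = random.Random(seed)
    cited = [e for e in examples if e[1] == 1]
    uncited = [e for e in examples if e[1] == 0]
    n = min(len(cited), len(uncited))
    balanced = rng.sample(cited, n) + rng.sample(uncited, n)
    rng.shuffle(balanced)
    return balanced


def accuracy(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of positions where the predicted label equals the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy imbalanced data: 2 cited sentences, 8 uncited ones.
data = [(f"cited {i}", 1) for i in range(2)] + [(f"plain {i}", 0) for i in range(8)]
gold = [label for _, label in data]
print(accuracy(gold, [0] * len(gold)))  # 0.8
print(len(undersample(data)))           # 4
```

On the imbalanced toy data, a model that never predicts a citation already scores 0.8, which is one common reason to also report results on a balanced set.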

2022

A corpus for Automatic Article Analysis
Elena Callegari | Desara Xhura
Proceedings of the 5th International Conference on Computational Linguistics in Bulgaria (CLIB 2022)

We describe the structure and creation of the SageWrite corpus, a manually annotated corpus created to support automatic language generation and automatic quality assessment of academic articles. The corpus currently contains annotations for 100 excerpts taken from various scientific articles. For each of these excerpts, the corpus contains (i) a draft version of the excerpt and (ii) annotations that reflect the stylistic and linguistic merits of the excerpt, such as whether the text is clearly structured. The SageWrite corpus is the first corpus for the fine-tuning of text-generation algorithms that specifically addresses academic writing.
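To make the described structure concrete, here is a minimal sketch of how one corpus entry might be represented and turned into draft-to-excerpt training pairs. The record layout, the field names, the boolean encoding of annotations, and the helpers `to_training_pairs` and `with_annotation` are all hypothetical; the abstract does not specify SageWrite's actual file format or annotation scales.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AnnotatedExcerpt:
    """Hypothetical record layout; field names are illustrative, not SageWrite's."""
    excerpt_id: str
    published_text: str  # the excerpt as it appears in the source article
    draft_text: str      # a draft version of the same excerpt
    # Assumed yes/no judgments, e.g. {"clearly_structured": True}.
    annotations: Dict[str, bool] = field(default_factory=dict)


def to_training_pairs(records: List[AnnotatedExcerpt]) -> List[Tuple[str, str]]:
    """Map each draft to its published excerpt as (input, target) pairs for fine-tuning."""
    return [(r.draft_text, r.published_text) for r in records]


def with_annotation(records: List[AnnotatedExcerpt], name: str) -> List[AnnotatedExcerpt]:
    """Keep only records whose annotation `name` is marked True."""
    return [r for r in records if r.annotations.get(name, False)]
```

Pairing drafts with published excerpts is only one plausible use for text generation; the annotations could equally serve as labels for a quality-assessment classifier.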