Olga Kellert
2023
Use of NLP in the Context of Belief states of Ethnic Minorities in Latin America
Olga Kellert
|
Mahmud Zaman
Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
The major goal of our study is to test NLP methods in the domain of Covid-19-related health care education for vulnerable groups such as indigenous people from Latin America. To this end, we asked participants in a survey questionnaire to answer questions on health-related topics, and we used these answers to measure the participants' health education status. In this paper, we summarize the results of applying NLP methods to the participants' answers. In the first experiment, we use embeddings-based tools to measure the semantic similarity between participants' answers and “expert” or “reference” answers. In the second experiment, we use synonym-based methods to classify answers under topics. We compare the results from both experiments with human annotations. Our results show that the tested NLP methods reach a significantly lower accuracy score than human annotations in both experiments. We explain this difference by the assumption that human annotators are much better at the pragmatic inferencing needed to judge the semantic similarity of answers and to assign them to topics.
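As a rough illustration of the first experiment's idea (scoring a participant's answer against a reference answer), the sketch below computes cosine similarity between two vectors. It is not the authors' pipeline: the abstract does not name the embedding tool, so a bag-of-words count vector stands in for an embedding here, and the function names and the 0.5 threshold are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Toy stand-in for an embedding (assumption): lowercase token counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty answer is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_similar(answer, reference, threshold=0.5):
    """Hypothetical decision rule: similar if the score reaches the threshold."""
    score = cosine_similarity(bag_of_words(answer), bag_of_words(reference))
    return score >= threshold
```

A surface-level measure like this one only rewards shared vocabulary, which is one way to see why answers that require pragmatic inference to interpret can be scored poorly by automatic methods.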
2022
Social Context and User Profiles of Linguistic Variation on a Micro Scale
Olga Kellert
|
Nicholas Hill Matlis
Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects
This paper presents a new tweet-based approach to geolinguistic analysis which combines geolocation, user IDs and textual features in order to identify patterns of linguistic variation on a sub-city scale. Sub-city variations can be connected to social drivers and thus open new opportunities for understanding the mechanisms of language variation and change. However, measuring linguistic variation on these scales is challenging due to the lack of highly spatially resolved data as well as to the daily movement, or “mobility”, of users inside cities, which can obscure the relation between the social context and linguistic variation. Here we demonstrate how combining geolocation with user IDs and textual analysis of tweets can yield information about the linguistic profiles of the users, the social context associated with specific locations and their connection to linguistic variation. We apply our methodology to analyze dialects in Buenos Aires and find evidence of socially driven variation. Our methods will contribute to the identification of sociolinguistic patterns inside cities, which are valuable in social sciences and social services.
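One simple way to combine geolocation, user IDs and text, sketched here under stated assumptions, is to bin tweets into small grid cells and count each user at most once per cell, so that a few very active users do not dominate a location. This is not the paper's method: the tuple layout `(user_id, lat, lon, text)`, the 0.01-degree cell size, and both function names are hypothetical.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, size=0.01):
    """Map coordinates to a square grid cell (cell size is an assumption)."""
    return (math.floor(lat / size), math.floor(lon / size))


def variant_share_by_cell(tweets, variant, size=0.01):
    """For each cell, the share of distinct users who used `variant` there.

    `tweets` is an iterable of (user_id, lat, lon, text) tuples
    (a hypothetical layout). `variant` is matched as a lowercase token.
    """
    users_seen = defaultdict(set)
    users_with_variant = defaultdict(set)
    for user_id, lat, lon, text in tweets:
        cell = grid_cell(lat, lon, size)
        users_seen[cell].add(user_id)
        if variant in text.lower().split():
            users_with_variant[cell].add(user_id)
    return {
        cell: len(users_with_variant[cell]) / len(users_seen[cell])
        for cell in users_seen
    }
```

Counting users rather than tweets is a design choice made for this sketch; it does not by itself address mobility, since the same user can still appear in several cells.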
Using neural topic models to track context shifts of words: a case study of COVID-related terms before and after the lockdown in April 2020
Olga Kellert
|
Md Mahmud Uz Zaman
Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change
This paper explores lexical meaning changes in a new dataset, which includes tweets from before and after the COVID-related lockdown in April 2020. We use this dataset to evaluate traditional and more recent unsupervised approaches to lexical semantic change that make use of contextualized word representations based on the BERT neural language model to obtain representations of word usages. We argue that previous models that encode local representations of words cannot capture global context shifts, such as the context shift of face masks since the pandemic outbreak. We experiment with neural topic models to track context shifts of words. We show that this approach can reveal textual associations of words that go beyond their lexical meaning representation. We discuss future work and how to proceed in capturing the pragmatic aspect of meaning change as opposed to lexical semantic change.