Patrícia Amaral
Also published as: Patricia Amaral
2025
Is It Still a Village? Tracing Grammaticalization with Word Embeddings
Joseph E. Larson | Patrícia Amaral
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)
It takes a village to grammaticalize
Joseph E. Larson | Patricia Amaral
Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities
This paper investigates the grammaticalization of the noun caleta ‘cove, village’ to an intensifier, as part of the system of degree words in Chilean Spanish. We use word embeddings trained on a corpus of tweets to show the ongoing syntactic and semantic change of caleta, while also revealing how high degree is expressed in colloquial Chilean Spanish.
2021
BAHP: Benchmark of Assessing Word Embeddings in Historical Portuguese
Zuoyu Tian | Dylan Jarrett | Juan Escalona Torres | Patricia Amaral
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
High-quality distributional models can capture lexical and semantic relations between words. Hence, researchers design various intrinsic tasks to test whether such relations are captured. However, most intrinsic tasks are designed for modern languages, and evaluation methods for distributional models of historical corpora are lacking. In this paper, we present BAHP, a benchmark for assessing word embeddings in Historical Portuguese, which contains four types of tests: analogy, similarity, outlier detection, and coherence. We evaluated word2vec models trained on two historical Portuguese corpora on these four test sets. The results demonstrate that our test sets are capable of measuring the quality of vector space models and can provide a holistic view of a model’s ability to capture syntactic and semantic information. Furthermore, the methodology for creating our test sets can be easily extended to other historical languages.