The PALMA Corpora of African Varieties of Portuguese

Tjerk Hagemeijer, Amália Mendes, Rita Gonçalves, Catarina Cornejo, Raquel Madureira, Michel Généreux


Abstract
We present three new corpora of urban varieties of Portuguese spoken in Angola, Mozambique, and São Tomé and Príncipe, where Portuguese is increasingly spoken as a first and second language in different multilingual settings. Given the scarcity of linguistic resources available for the African varieties of Portuguese, these corpora provide new, contemporary data for the study of each variety and for comparative research on African, Brazilian and European varieties, thereby improving our understanding of processes of language variation and change in postcolonial societies. The corpora consist of transcribed spoken data, complemented by a rich set of metadata describing the setting of the audio recordings and sociolinguistic information about the speakers. They are annotated with POS and lemma information and made available on the CQPweb platform, which allows for sophisticated data searches. The corpora are already being used for comparative research on constructions in the domain of possession and location involving the argument structure of intransitive, monotransitive and ditransitive verbs that select Goals, Locatives, and Recipients.
Anthology ID:
2022.lrec-1.539
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5047–5053
URL:
https://aclanthology.org/2022.lrec-1.539
Cite (ACL):
Tjerk Hagemeijer, Amália Mendes, Rita Gonçalves, Catarina Cornejo, Raquel Madureira, and Michel Généreux. 2022. The PALMA Corpora of African Varieties of Portuguese. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5047–5053, Marseille, France. European Language Resources Association.
Cite (Informal):
The PALMA Corpora of African Varieties of Portuguese (Hagemeijer et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.539.pdf