Paraphrastic Variance between European and Brazilian Portuguese

Anabela Barreiro, Cristina Mota


Abstract
This paper presents a methodology for extracting a paraphrase database for the European and Brazilian varieties of Portuguese, and discusses a set of paraphrastic categories of multiwords and phrasal units that are highly relevant to high-quality paraphrasing. Examples include the compounds “toda a gente” versus “todo o mundo” ‘everybody’ and the gerundive constructions [estar a + V-Inf] versus [ficar + V-Ger] (e.g., “estive a observar” | “fiquei observando” ‘I was observing’). The variants were manually aligned in the e-PACT corpus using the CLUE-Aligner tool. The methodology, inspired by the Logos Model, focuses on a semantico-syntactic analysis of each paraphrastic unit, and the resulting units constitute a subset of the Gold-CLUE-Paraphrases. Constructing a larger dataset of paraphrastic contrasts among the distinct varieties of Portuguese is indispensable for variety adaptation, i.e., for handling the cultural, linguistic and stylistic differences between them. Such a dataset makes it possible to convert texts (semi-)automatically from one variety into another, a key function in paraphrasing systems. This topic represents a new line of research with applications in language learning, language generation, question answering, summarization, and machine translation, among others. The paraphrastic units constitute the first resource of this kind for Portuguese to become available to the scientific community for research purposes.
Anthology ID:
W18-3912
Volume:
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
111–121
URL:
https://aclanthology.org/W18-3912
Cite (ACL):
Anabela Barreiro and Cristina Mota. 2018. Paraphrastic Variance between European and Brazilian Portuguese. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pages 111–121, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Paraphrastic Variance between European and Brazilian Portuguese (Barreiro & Mota, VarDial 2018)
PDF:
https://aclanthology.org/W18-3912.pdf