PortiLexicon-UD: a Portuguese Lexical Resource according to Universal Dependencies Model

Lucelene Lopes, Magali Duran, Paulo Fernandes, Thiago Pardo


Abstract
This paper presents PortiLexicon-UD, a large and freely available lexicon for Portuguese that provides morphosyntactic information according to the Universal Dependencies model. This lexical resource includes part-of-speech tags, lemmas, and morphological information for words, with 1,221,218 entries (a word form is counted once for each distinct combination of PoS tag, lemma, and morphological features). We report the lexicon creation process, its computational data structure, and its evaluation over an annotated corpus, showing that it has high language coverage and good-quality data.
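To make the entry count in the abstract concrete, the following is a minimal sketch of how a lexicon of this kind could be held in memory, with each word form mapped to every (lemma, PoS tag, features) analysis it admits. The `Analysis` type, the function names, and the toy entries are assumptions made for illustration; they are not taken from PortiLexicon-UD's actual files or data structure, which the paper itself describes.

```python
from collections import defaultdict
from typing import NamedTuple


class Analysis(NamedTuple):
    """One reading of a word form (hypothetical layout, not the resource's format)."""
    lemma: str
    upos: str   # Universal Dependencies part-of-speech tag
    feats: str  # UD morphological features in "Key=Value|Key=Value" form


# Toy entries written by hand for illustration only.
TOY_ENTRIES = [
    ("casa", "casa", "NOUN", "Gender=Fem|Number=Sing"),
    ("casa", "casar", "VERB", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"),
    ("casa", "casar", "VERB", "Mood=Imp|Number=Sing|Person=2|VerbForm=Fin"),
    ("casas", "casa", "NOUN", "Gender=Fem|Number=Plur"),
]


def build_lexicon(entries):
    """Group analyses by lowercased word form."""
    lexicon = defaultdict(list)
    for form, lemma, upos, feats in entries:
        lexicon[form.lower()].append(Analysis(lemma, upos, feats))
    return dict(lexicon)


def count_entries(lexicon):
    """Count one entry per (form, lemma, PoS tag, features) combination."""
    return sum(len(analyses) for analyses in lexicon.values())


lexicon = build_lexicon(TOY_ENTRIES)
print(len(lexicon), "word forms,", count_entries(lexicon), "entries")
for analysis in lexicon["casa"]:
    print(analysis.lemma, analysis.upos, analysis.feats)
```

In this toy example the form "casa" contributes three entries, one per analysis, which is the sense in which the reported total counts duplicated word forms.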
Anthology ID:
2022.lrec-1.715
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6635–6643
URL:
https://aclanthology.org/2022.lrec-1.715
Cite (ACL):
Lucelene Lopes, Magali Duran, Paulo Fernandes, and Thiago Pardo. 2022. PortiLexicon-UD: a Portuguese Lexical Resource according to Universal Dependencies Model. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6635–6643, Marseille, France. European Language Resources Association.
Cite (Informal):
PortiLexicon-UD: a Portuguese Lexical Resource according to Universal Dependencies Model (Lopes et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.715.pdf