Compiling and Exploring a Portuguese Parliamentary Corpus: ParlaMint-PT

José Aires, Aida Cardoso, Rui Pereira, Amalia Mendes


Abstract
As part of the ParlaMint II project, a new corpus of the sessions of the Portuguese Parliament from 2015 to 2022 has been compiled, encoded and annotated following the ParlaMint guidelines. We report on the contents of the corpus and on the specific nature of the political settings in Portugal during the time period covered. Two subcorpora were designed to enable comparisons of political speeches between the pre- and post-COVID-19 pandemic periods. We discuss the pipeline applied to download the original texts, preprocess them and encode them in XML, and the final step of annotation. This new resource covers a period of changes in the political system in Portugal and will be an important source of data for political and social studies. Finally, we explore the political stance on immigration in the ParlaMint-PT corpus.
Anthology ID:
2024.parlaclarin-1.2
Volume:
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Darja Fiser, Maria Eskevich, David Bordon
Venues:
ParlaCLARIN | WS
Publisher:
ELRA and ICCL
Pages:
12–20
URL:
https://aclanthology.org/2024.parlaclarin-1.2
Cite (ACL):
José Aires, Aida Cardoso, Rui Pereira, and Amalia Mendes. 2024. Compiling and Exploring a Portuguese Parliamentary Corpus: ParlaMint-PT. In Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024, pages 12–20, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Compiling and Exploring a Portuguese Parliamentary Corpus: ParlaMint-PT (Aires et al., ParlaCLARIN-WS 2024)
PDF:
https://aclanthology.org/2024.parlaclarin-1.2.pdf