IMPAQTS: a multimodal corpus of parliamentary and other political speeches in Italy (1946-2023), annotated with implicit strategies

Federica Cominetti, Lorenzo Gregori, Edoardo Lombardi Vallauri, Alessandro Panunzi


Abstract
The paper introduces the IMPAQTS corpus of Italian political discourse, a multimodal corpus of around 2.65 million tokens including 1,500 speeches uttered by 150 prominent politicians spanning from 1946 to 2023. Covering the entire history of the Italian Republic, the collection exhibits a non-homogeneous consistency that progressively increases in quantity towards the present. The corpus is balanced according to textual and socio-linguistic criteria and includes different types of speeches. The sociolinguistic features of the speakers are carefully considered to ensure representation of Republican Italian politicians. For each speaker, the corpus contains 4 parliamentary speeches, 2 rallies, 1 party assembly, and 3 statements (in person or broadcasted). Parliamentary speeches therefore constitute the largest section of the corpus (40% of the total), enabling direct comparison with other types of political speeches. The collection procedure, including details relevant to the transcription protocols, and the processing pipeline are described. The corpus has been pragmatically annotated to include information about the implicitly conveyed questionable contents, paired with their explicit paraphrasis, providing the largest Italian collection of ecologic examples of linguistic implicit strategies. The adopted ontology of linguistic implicitness and the fine-grained annotation scheme are presented in detail.
Anthology ID:
2024.parlaclarin-1.15
Volume:
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Darja Fiser, Maria Eskevich, David Bordon
Venues:
ParlaCLARIN | WS
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
101–109
Language:
URL:
https://aclanthology.org/2024.parlaclarin-1.15
DOI:
Bibkey:
Cite (ACL):
Federica Cominetti, Lorenzo Gregori, Edoardo Lombardi Vallauri, and Alessandro Panunzi. 2024. IMPAQTS: a multimodal corpus of parliamentary and other political speeches in Italy (1946-2023), annotated with implicit strategies. In Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024, pages 101–109, Torino, Italia. ELRA and ICCL.
Cite (Informal):
IMPAQTS: a multimodal corpus of parliamentary and other political speeches in Italy (1946-2023), annotated with implicit strategies (Cominetti et al., ParlaCLARIN-WS 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.parlaclarin-1.15.pdf