ParlaMint-RO: Chamber of the Eternal Future
Petru Rebeja | Mădălina Chitez | Roxana Rogobete | Andreea Dincă | Loredana Bercuci
Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference
The present paper aims to describe the collection of ParlaMint-RO corpus and to analyse several trends in parliamentary debates (plenary sessions of the Lower House) held in between 2000 and 2020). After a short description of the data collection (of existing transcripts), the workflow of data processing (text extraction, conversion, encoding, linguistic annotation), and an overview of the corpus, the paper will move on to a multi-layered linguistic analysis to validate interdisciplinary perspectives. We use computational methods and corpus linguistics approaches to scrutinize the future tense forms used by Romanian speakers, in order to create a data-supported profile of the parliamentary group strategies and planning.
In this paper we present the architecture, processing pipeline and results of the ensemble model developed for Romanian Dialect Identification task. The ensemble model consists of two TF-IDF encoders and a deep learning model aimed together at classifying input samples based on the writing patterns which are specific to each of the two dialects. Although the model performs well on the training set, its performance degrades heavily on the evaluation set. The drop in performance is due to the design decision which makes the model put too much weight on presence/lack of textual marks when determining the sample label.