Annotation of expressive dimensions on a multimodal French corpus of political interviews
Jules Cauzinille | Marc Evrard | Nikita Kiselov | Albert Rilliard
Proceedings of the LREC 2022 workshop on Natural Language Processing for Political Sciences
We present a French corpus of political interviews labeled at the utterance level according to expressive dimensions such as Arousal. This corpus consists of 7.5 hours of high-quality audio-visual recordings with transcription. At the time of this publication, 1 hour of speech was segmented into short utterances, each manually annotated in Arousal. Our segmentation approach differs from similar corpora and allows us to perform an automatic Arousal prediction baseline by building a speech-based classification model. Although this paper focuses on the acoustic expression of Arousal, it paves the way for future work on conflictual and hostile expression recognition as well as multimodal architectures.
The automatic stance detection task consists in determining the attitude expressed in a text toward a target (text, claim, or entity). This is a typical intermediate task for the fake news detection or analysis, which is a considerably widespread and a particularly difficult issue to overcome. This work aims at the creation of a human-annotated corpus for the automatic stance detection of tweets written in French. It exploits a corpus of tweets collected during July and August 2018. To the best of our knowledge, this is the first freely available stance annotated tweet corpus in the French language. The four classes broadly adopted by the community were chosen for the annotation: support, deny, query, and comment with the addition of the ignore class. This paper presents the corpus along with the tools used to build it, its construction, an analysis of the inter-rater reliability, as well as the challenges and questions that were raised during the building process.