Costanza Navarretta - ACL Anthology

2024

Evaluating Word Expansion for Multilingual Sentiment Analysis of Parliamentary Speech
Yana Nikolova | Costanza Navarretta
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper replicates and evaluates the word expansion (WE) method for sentiment lexicon generation from Rheault et al. (2016), applying it to two novel corpora of parliamentary speech from Denmark and Bulgaria. GloVe embeddings and vector similarity are leveraged to expand synonym seed lists with domain-specific terms from the speech corpora. The resulting Danish and Bulgarian lexica are compared to other multilingual lexica by analyzing a gold standard of speech excerpts annotated for sentiment. WE correlates best with hand-coded annotations for Danish, while a machine-translated Lexicoder dictionary does best for Bulgarian. WE performance is also found to be very sensitive to processing and scoring techniques, though this is also an issue with the other lexica. Overall, automatic lexicon translation best balances computational complexity and accuracy across both languages, but robust language-agnosticism remains elusive. Theoretical and practical problems of WE are discussed.
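As a rough illustration of the expansion step described in this abstract, the sketch below grows a seed list with its nearest neighbours by cosine similarity. It is a simplified nearest-neighbour variant, not the scoring procedure of Rheault et al. (2016) or of the paper; the function names, the `top_k` parameter and the two-dimensional toy vectors are all hypothetical stand-ins for GloVe embeddings trained on a speech corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_seeds(seeds, embeddings, top_k=1):
    """Return the seeds plus the top_k most similar non-seed words per seed."""
    seed_set = set(seeds)
    expanded = set(seed_set)
    for seed in seed_set:
        if seed not in embeddings:
            continue  # seed absent from the corpus vocabulary
        candidates = [
            (cosine(embeddings[seed], vector), word)
            for word, vector in embeddings.items()
            if word not in seed_set
        ]
        candidates.sort(reverse=True)  # most similar first
        expanded.update(word for _, word in candidates[:top_k])
    return expanded


# Toy vectors (assumed values, for illustration only).
toy_embeddings = {
    "good": [1.0, 0.1],
    "excellent": [0.9, 0.2],
    "bad": [-1.0, 0.1],
    "budget": [0.0, 1.0],
}
positive = expand_seeds(["good"], toy_embeddings, top_k=1)
```

With these toy vectors the positive list grows from "good" to include "excellent", while "bad" and "budget" are left out.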

Government and Opposition in Danish Parliamentary Debates
Costanza Navarretta | Dorte Haltrup Hansen
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024

In this paper, we address government and opposition speeches made by the Danish Parliament’s members from 2014 to 2022. We use the linguistic annotations and metadata in ParlaMint-DK, one of the ParlaMint corpora, to investigate some characteristics of the transcribed speeches made by government and opposition and test how well classifiers can identify the speeches delivered by these groups. Our analyses confirm that there are differences in the speeches made by government and opposition, e.g. in the frequency of some modality expressions. In our study, we also include parties that neither directly support nor oppose the government, the “other” group. The best performing classifier for identifying speeches made by parties in government, in opposition or in “other” is a transformer with a pre-trained Danish BERT model, which gave an F1-score of 0.64. The same classifier obtained an F1-score of 0.77 on the binary identification of speeches made by government or opposition parties.

Multimodal Behaviour in an Online Environment: The GEHM Zoom Corpus Collection
Patrizia Paggio | Manex Agirrezabal | Costanza Navarretta | Leo Vitasovic
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper introduces a novel multimodal corpus consisting of 12 video recordings of Zoom meetings held in English by an international group of researchers from September 2021 to March 2023. The meetings have an average duration of about 40 minutes each, for a total of 8 hours. The number of participants varies from 5 to 9 per meeting. The participants’ speech was transcribed automatically using WhisperX, while visual coordinates of several keypoints of the participants’ head, their shoulders and wrists, were extracted using OpenPose. The audio-visual recordings will be distributed together with the orthographic transcription as well as the visual coordinates. In the paper we describe the way the corpus was collected, transcribed and enriched with the visual coordinates, we give descriptive statistics concerning both the speech transcription and the visual keypoint values and we present and discuss visualisations of these values. Finally, we carry out a short preliminary analysis of the role of feedback in the meetings, and show how visualising the coordinates extracted via OpenPose can be used to see how gestural behaviour supports the use of feedback words during the interaction.

2023

According to BERTopic, what do Danish Parties Debate on when they Address Energy and Environment?
Costanza Navarretta | Dorte H. Hansen
Proceedings of the 3rd Workshop on Computational Linguistics for the Political and Social Sciences

2022

Immigration in the Manifestos and Parliament Speeches of Danish Left and Right Wing Parties between 2009 and 2020
Costanza Navarretta | Dorte Haltrup Hansen | Bart Jongejan
Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference

The paper presents a study of how seven Danish left and right wing parties addressed immigration in their 2011, 2015 and 2019 manifestos and in their speeches in the Danish Parliament from 2009 to 2020. The annotated manifestos are produced by the Comparative Manifesto Project, while the parliamentary speeches annotated with policy areas (subjects) have been recently released under CLARIN-DK. In the paper, we investigate how often the seven parties addressed immigration in the manifestos and parliamentary debates, and we analyse both datasets after having applied NLP tools to them. A sentiment analysis tool was run on the manifestos and its results were compared with the manifestos’ annotations, while topic modeling was applied to the parliamentary speeches in order to outline central themes in the immigration debates. Many of the resulting topic groups are related to cultural, religious and integration aspects which were heavily debated by politicians and media when discussing immigration policy during the past decade. Our analyses also show differences and similarities between parties and indicate how the 2015 immigrant crisis is reflected in the two types of data. Finally, we discuss advantages and limitations of our quantitative and tool-based analyses.

The Subject Annotations of the Danish Parliament Corpus (2009-2017) - Evaluated with Automatic Multi-label Classification
Costanza Navarretta | Dorte Haltrup Hansen
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper addresses the semi-automatic annotation of subjects, also called policy areas, in the Danish Parliament Corpus (2009-2017) v.2. Recently, the corpus has been made available through the CLARIN-DK repository, the Danish node of the European CLARIN infrastructure. The paper also contains an analysis of the subjects in the corpus, and a description of multi-label classification experiments that aim to verify the consistency of the subject annotation and the utility of the corpus for training classifiers on this type of data. The analysis of the corpus comprises an investigation of how often the parliament members addressed each subject and the relation between subjects and gender of the speaker. The classification experiments show that classifiers can determine the two co-occurring subjects of the speeches from the agenda titles with a performance similar to that of human annotators. Moreover, a multilayer perceptron achieved an F1-score of 0.68 on the same task when trained on bag of words vectors obtained from the speeches’ lemmas. This is an improvement of more than 0.6 with respect to the baseline, a majority classifier that accounts for the frequency of the classes. The result is promising given the high number of subject combinations (186) and the skewness of the data.

2021

Towards a Methodology Supporting Semiautomatic Annotation of HeadMovements in Video-recorded Conversations
Patrizia Paggio | Costanza Navarretta | Bart Jongejan | Manex Agirrezabal
Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop

We present a method to support the annotation of head movements in video-recorded conversations. Head movement segments from annotated multimodal data are used to train a model to detect head movements in unseen data. The resulting predicted movement sequences are uploaded to the ANVIL tool for post-annotation editing. The automatically identified head movements and the original annotations are compared to assess the overlap between the two. This analysis showed that movement onsets were more easily detected than offsets, and pointed at a number of patterns in the mismatches between original annotations and model predictions that could be dealt with in general terms in post-annotation guidelines.

2020

Creating a Corpus of Gestures and Predicting the Audience Response based on Gestures in Speeches of Donald Trump
Verena Ruf | Costanza Navarretta
Proceedings of the Twelfth Language Resources and Evaluation Conference

Gestures are an important component of non-verbal communication. This has an increasing potential in human-computer interaction. For example, Navarretta (2017b) uses sequences of speech and pauses together with co-speech gestures produced by Barack Obama in order to predict audience response, such as applause. The aim of this study is to explore the role of speech pauses and gestures alone as predictors of audience reaction without other types of speech information. For this work, we created a corpus of speeches held by Donald Trump before and during his time as president between 2016 and 2019. The data were transcribed with pause information and co-speech gestures were annotated as well as audience responses. Gestures and long silent pauses of the duration of at least 0.5 seconds are the input of computational models to predict audience reaction. The results of this study indicate that especially head movements and facial expressions play an important role and they confirm that gestures can to some extent be used to predict audience reaction independently of speech.

Automatic Detection and Classification of Head Movements in Face-to-Face Conversations
Patrizia Paggio | Manex Agirrezabal | Bart Jongejan | Costanza Navarretta
Proceedings of LREC2020 Workshop "People in language, vision and the mind" (ONION2020)

This paper presents an approach to automatic head movement detection and classification in data from a corpus of video-recorded face-to-face conversations in Danish involving 12 different speakers. A number of classifiers were trained with different combinations of visual, acoustic and word features and tested in a leave-one-out cross validation scenario. The visual movement features were extracted from the raw video data using OpenPose, and the acoustic ones using Praat. The best results were obtained by a Multilayer Perceptron classifier, which reached an average 0.68 F1 score across the 12 speakers for head movement detection, and 0.40 for head movement classification given four different classes. In both cases, the classifier outperformed a simple most frequent class baseline as well as a more advanced baseline only relying on velocity features.

Identifying Parties in Manifestos and Parliament Speeches
Costanza Navarretta | Dorte Haltrup Hansen
Proceedings of the Second ParlaCLARIN Workshop

This paper addresses differences in the word use of two left-wing and two right-wing Danish parties, and how these differences, which reflect some of the basic stances of the parties, can be used to automatically identify the party of politicians from their speeches. In the first study, the most frequent and characteristic lemmas in the manifestos of the political parties are analysed. The analysis shows that the most frequently occurring lemmas in the manifestos reflect either the ideology or the position of the parties towards specific subjects, confirming for Danish preceding studies of English and German manifestos. Subsequently, we scaled our analysis applying machine learning on different language models built on the transcribed speeches by members of the same parties in the Parliament (Hansards) in order to determine to what extent it is possible to predict the party of the politicians from the speeches. The speeches used are a subset of the Danish Parliament corpus 2009–2017. The best models resulted in a weighted F1-score of 0.57. These results are significantly better than the results obtained by the majority classifier (F1-score = 0.11) and by chance results (0.25) and show that building language models over the speeches used by politicians can be used to identify the politicians’ party even if they debate the same subjects and thus often use the same terminology. In the future, we will include the subject of the speeches in the prediction experiments.

Dialogue Act Annotation in a Multimodal Corpus of First Encounter Dialogues
Costanza Navarretta | Patrizia Paggio
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper deals with the annotation of dialogue acts in a multimodal corpus of first encounter dialogues, i.e. face-to-face dialogues in which two people who meet for the first time talk with no particular purpose other than just talking. More specifically, we describe the method used to annotate dialogue acts in the corpus, including the evaluation of the annotations. Then, we present descriptive statistics of the annotation, particularly focusing on which dialogue acts often follow each other across speakers and which dialogue acts overlap with gestural behaviour. Finally, we discuss how feedback is expressed in the corpus by means of feedback dialogue acts with or without co-occurring gestural behaviour, i.e. multimodal vs. unimodal feedback.

2018

The Automatic Annotation of the Semiotic Type of Hand Gestures in Obama’s Humorous Speeches
Costanza Navarretta
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Automatic identification of head movements in video-recorded conversations: can words help?
Patrizia Paggio | Costanza Navarretta | Bart Jongejan
Proceedings of the Sixth Workshop on Vision and Language

We present an approach where an SVM classifier learns to classify head movements based on measurements of velocity, acceleration, and the third derivative of position with respect to time, jerk. Consequently, annotations of head movements are added to new video data. The results of the automatic annotation are evaluated against manual annotations in the same data and show an accuracy of 68% with respect to these. The results also show that using jerk improves accuracy. We then conduct an investigation of the overlap between temporal sequences classified as either movement or non-movement and the speech stream of the person performing the gesture. The statistics derived from this analysis show that using word features may help increase the accuracy of the model.
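The three movement features named in this abstract can be approximated from a sampled position track by repeated finite differencing. The sketch below is a minimal illustration under stated assumptions: a one-dimensional, uniformly sampled track, forward differences, and hypothetical function names. It does not reproduce the paper's feature extraction or its SVM classifier.

```python
def finite_difference(values, dt):
    """Forward difference of a uniformly sampled sequence (one item shorter)."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def kinematic_features(positions, dt):
    """Approximate velocity, acceleration and jerk from sampled positions."""
    velocity = finite_difference(positions, dt)
    acceleration = finite_difference(velocity, dt)
    jerk = finite_difference(acceleration, dt)  # third derivative of position
    return velocity, acceleration, jerk


# Toy track x = t^2 sampled at t = 0..4 with dt = 1 (assumed values).
positions = [0.0, 1.0, 4.0, 9.0, 16.0]
velocity, acceleration, jerk = kinematic_features(positions, 1.0)
```

For this quadratic toy track the estimated acceleration is constant and the jerk is zero, as expected for uniform acceleration.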

2016

Mirroring Facial Expressions and Emotions in Dyadic Conversations
Costanza Navarretta
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents an investigation of mirroring facial expressions and the emotions which they convey in dyadic naturally occurring first encounters. Mirroring facial expressions are a common phenomenon in face-to-face interactions, and they are due to the mirror neuron system which has been found in both animals and humans. Researchers have proposed that the mirror neuron system is an important component behind many cognitive processes such as action learning and understanding the emotions of others. Preceding studies of the first encounters have shown that overlapping speech and overlapping facial expressions are very frequent. In this study, we want to determine whether the overlapping facial expressions are mirrored or are otherwise correlated in the encounters, and to what extent mirroring facial expressions convey the same emotion. The results of our study show that the majority of smiles and laughs, and one fifth of the occurrences of raised eyebrows are mirrored in the data. Moreover, some facial traits in co-occurring expressions co-occur more often than would be expected by chance. Finally, amusement, and to a lesser extent friendliness, are often emotions shared by both participants, while other emotions indicating individual affective states such as uncertainty and hesitancy are never shown by both participants, but co-occur with complementary emotions such as friendliness and support. Whether these tendencies are specific to this type of conversation or are more common should be investigated further.

2014

Transfer learning of feedback head expressions in Danish and Polish comparable multimodal corpora
Costanza Navarretta | Magdalena Lis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The paper is an investigation of the reusability of the annotations of head movements in a corpus in one language to predict the feedback functions of head movements in a comparable corpus in another language. The two corpora consist of naturally occurring triadic conversations in Danish and Polish, which were annotated according to the same scheme. The intersection of common annotation features was used in the experiments. A Naïve Bayes classifier was trained on the annotations of one corpus and tested on the annotations of the other corpus. Training and test datasets were then reversed and the experiments repeated. The results show that the classifier identifies more feedback behaviours than the majority baseline in both cases and the improvements are significant. The performance of the classifier decreases significantly compared with the results obtained when training and test data belong to the same corpus. Annotating multimodal data is resource consuming, thus the results are promising. However, they also confirm preceding studies that have identified both similarities and differences in the use of feedback head movements in different languages. Since our datasets are small and only concern one communicative behaviour in two languages, the experiments should be tested on more data types.

CLARA: A New Generation of Researchers in Common Language Resources and Their Applications
Koenraad De Smedt | Erhard Hinrichs | Detmar Meurers | Inguna Skadiņa | Bolette Pedersen | Costanza Navarretta | Núria Bel | Krister Lindén | Markéta Lopatková | Jan Hajič | Gisle Andersen | Przemyslaw Lenkiewicz
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

CLARA (Common Language Resources and Their Applications) is a Marie Curie Initial Training Network which ran from 2009 until 2014 with the aim of providing researcher training in crucial areas related to language resources and infrastructure. The scope of the project was broad and included infrastructure design, lexical semantic modeling, domain modeling, multimedia and multimodal communication, applications, and parsing technologies and grammar models. An international consortium of 9 partners and 12 associate partners employed researchers in 19 new positions and organized a training program consisting of 10 thematic courses and summer/winter schools. The project has resulted in new theoretical insights as well as new resources and tools. Most importantly, the project has trained a new generation of researchers who can perform advanced research and development in language resources and technologies.

2013

Classifying Multimodal Turn Management in Danish Dyadic First Encounters
Costanza Navarretta | Patrizia Paggio
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

Feedback in Nordic First-Encounters: a Comparative Study
Costanza Navarretta | Elisabeth Ahlsén | Jens Allwood | Kristiina Jokinen | Patrizia Paggio
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper compares how feedback is expressed via speech and head movements in comparable corpora of first encounters in three Nordic languages: Danish, Finnish and Swedish. The three corpora have been collected following common guidelines, and they have been annotated according to the same scheme in the NOMCO project. The results of the comparison show that in this data the most frequent feedback-related head movement is Nod in all three languages. Two types of Nods were distinguished in all corpora: Down-nods and Up-nods; the participants from the three countries use Down- and Up-nods with different frequency. In particular, Danes use Down-nods more frequently than Finns and Swedes, while Swedes use Up-nods more frequently than Finns and Danes. Finally, Finns use single Nods more often than repeated Nods, differing from the Swedish and Danish participants. The differences in the frequency of both Down-nods and Up-nods in the Danish, Finnish and Swedish interactions are interesting given that Nordic countries are not only geographically near, but are also considered to be very similar culturally. Finally, a comparison of feedback-related words in the Danish and Swedish corpora shows that Swedes and Danes use common feedback words corresponding to yes and no with similar frequency.

Multimodal Behaviour and Feedback in Different Types of Interaction
Costanza Navarretta | Patrizia Paggio
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this article, we compare feedback-related multimodal behaviours in two different types of interactions: first encounters between two participants who do not know each other in advance, and naturally-occurring conversations between two and three participants recorded at their homes. All participants are Danish native speakers. The interactions are transcribed using the same methodology, and the multimodal behaviours are annotated according to the same annotation scheme. In the study we focus on the most frequently occurring feedback expressions in the interactions and on feedback-related head movements and facial expressions. The analysis of the corpora, while confirming general facts about feedback-related head movements and facial expressions previously reported in the literature, also shows that the physical setting, the number of participants, the topics discussed, and the degree of familiarity influence the use of gesture types and the frequency of feedback-related expressions and gestures.

2011

Creating Comparable Multimodal Corpora for Nordic Languages
Costanza Navarretta | Elisabeth Ahlsén | Jens Allwood | Kristiina Jokinen | Patrizia Paggio
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

2010

The NOMCO Multimodal Nordic Resource - Goals and Characteristics
Patrizia Paggio | Jens Allwood | Elisabeth Ahlsén | Kristiina Jokinen | Costanza Navarretta
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the multimodal corpora that are being collected and annotated in the Nordic NOMCO project. The corpora will be used to study communicative phenomena such as feedback, turn management and sequencing. They already include video material for Swedish, Danish, Finnish and Estonian, and several social activities are represented. The data will make it possible to verify empirically how gestures (head movements, facial displays, hand gestures and body postures) and speech interact in all three of these aspects of communication. The data are being annotated following the MUMIN annotation scheme, which provides attributes concerning the shape and the communicative functions of head movements, face expressions, body posture and hand gestures. After having described the corpora, the paper discusses how they will be used to study the way feedback is expressed in speech and gestures, and reports results from two pilot studies where we investigated the function of head gestures, both single and repeated, in combination with feedback expressions. The annotated corpora will be valuable sources for research on intercultural communication as well as for interaction in the individual languages.

The DAD Parallel Corpora and their Uses
Costanza Navarretta
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper deals with the uses of the annotations of third person singular neuter pronouns in the DAD parallel and comparable corpora of Danish and Italian texts and spoken data. The annotations contain information about the functions of these pronouns and their uses as abstract anaphora. Abstract anaphora have constructions such as verbal phrases, clauses and discourse segments as antecedents and refer to abstract objects comprising events, situations and propositions. The analysis of the annotated data shows the language specific characteristics of abstract anaphora in the two languages compared with the uses of abstract anaphora in English. Finally, the paper presents machine learning experiments run on the annotated data in order to identify the functions of third person singular neuter personal pronouns and neuter demonstrative pronouns. The results of these experiments vary from corpus to corpus. However, they are all comparable with the results obtained in similar tasks in other languages. This is very promising because the experiments have been run on both written and spoken data using a classification of the pronominal functions which is much more fine-grained than the classifications used in other studies.

Classification of Feedback Expressions in Multimodal Data
Costanza Navarretta | Patrizia Paggio
Proceedings of the ACL 2010 Conference Short Papers

2008

Annotating Abstract Pronominal Anaphora in the DAD Project
Costanza Navarretta | Sussi Olsen
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we present an extension of the MATE/GNOME annotation scheme for anaphora (Poesio, 2004) which accounts for abstract anaphora in Danish and Italian. By abstract anaphora we here mean pronouns whose linguistic antecedents are verbal phrases, clauses and discourse segments. The extended scheme, which we call the DAD annotation scheme, allows the annotation of information about abstract anaphora which is important to investigate their use, see i.a. (Webber, 1988; Gundel et al., 2003; Navarretta, 2004; Navarretta, 2007) and which can influence their automatic treatment. Intercoder agreement scores obtained by applying the DAD annotation scheme on texts and dialogues in the two languages are given and show that the information proposed in the scheme can be recognised in a reliable way.

2006

The MULINCO corpus and corpus platform
Bente Maegaard | Lene Offersgaard | Lina Henriksen | Hanne Jansen | Xavier Lepetit | Costanza Navarretta | Claus Povlsen
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The MULINCO project (MUltiLINgual Corpus of the University of Copenhagen) started early 2005. The purpose of this cross-disciplinary project is to create a corpus platform for education and research in monolingual and translation studies. The project covers two main types of corpus texts: literary and non-literary. The platform is being developed using available tools as far as possible, and integrating them in a very open architecture. In this paper we describe the current status and future developments of both the text and tool side of the corpus platform, and we show some examples of student exercises taking advantage of tagged and aligned texts.

2004

An Algorithm for Resolving Individual and Abstract Anaphora in Danish Texts and Dialogues
Costanza Navarretta
Proceedings of the Conference on Reference Resolution and Its Applications

Resolving Individual and Abstract Anaphora in Texts and Dialogues
Costanza Navarretta
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

“Human Language Technology Elements in a Knowledge Organisation System - The VID Project”
Costanza Navarretta | Bolette Sandford Pedersen | Dorte Haltrup Hansen
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

This paper describes how Human Language Technologies and linguistic resources are used to support the construction of components of a knowledge organisation system. In particular we focus on methodologies and resources for building a corpus-based domain ontology and extracting relevant metadata information for text chunks from domain-specific corpora.

2001

Identifying Situation Reference in Danish
Costanza Navarretta
Proceedings of the 13th Nordic Conference of Computational Linguistics (NODALIDA 2001)

2000

Abstract Anaphora Resolution in Danish
Costanza Navarretta
1st SIGdial Workshop on Discourse and Dialogue

Semantic Clustering of Adjectives and Verbs Based on Syntactic Patterns
Costanza Navarretta
Proceedings of the 12th Nordic Conference of Computational Linguistics (NODALIDA 1999)

1998

An HPSG Marking Analysis of Danish Determiners and Clausal Adverbials
Costanza Navarretta | Anne Neville
Proceedings of the 11th Nordic Conference of Computational Linguistics (NODALIDA 1998)
