2018
Anaphora Resolution with the ARRAU Corpus
Massimo Poesio | Yulia Grishina | Varada Kolhatkar | Nafise Moosavi | Ina Roesiger | Adam Roussel | Fabian Simonjetz | Alexandra Uma | Olga Uryupina | Juntao Yu | Heike Zinsmeister
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference
The ARRAU corpus is an anaphorically annotated corpus of English providing rich linguistic information relevant to anaphora resolution. Its most distinctive feature is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference). Other distinctive features include treating all NPs as markables, including non-referring NPs, and annotating a variety of morphosyntactic and semantic mention and entity attributes, including the genericity status of the entities referred to by markables. The corpus, however, has not yet been extensively used for anaphora resolution research. In this paper, we discuss three datasets extracted from the ARRAU corpus to support the three subtasks of the CRAC 2018 Shared Task: identity anaphora resolution over ARRAU-style markables, bridging reference resolution, and discourse deixis; the evaluation scripts assessing system performance on those datasets; and preliminary results on these three tasks that may serve as baselines for subsequent research on these phenomena.
Supervised Clustering of Questions into Intents for Dialog System Applications
Iryna Haponchyk | Antonio Uva | Seunghak Yu | Olga Uryupina | Alessandro Moschitti
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Modern automated dialog systems require complex dialog managers able to deal with user intents triggered by high-level semantic questions. In this paper, we propose a model for automatically clustering questions into user intents to support such design tasks. Since questions are short texts, uncovering their semantics to group them together can be very challenging. We approach the problem by using powerful semantic classifiers from question duplicate/matching research, along with a novel application of supervised clustering methods based on structured output. We test our approach on two intent clustering corpora, showing an impressive improvement over previous methods for two languages/domains.
2017
Collaborative Partitioning for Coreference Resolution
Olga Uryupina | Alessandro Moschitti
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)
This paper presents a collaborative partitioning algorithm—a novel ensemble-based approach to coreference resolution. Starting from the all-singleton partition, we search for a solution close to the ensemble’s outputs in terms of a task-specific similarity measure. Our approach assumes a loose integration of individual components of the ensemble and can therefore combine arbitrary coreference resolvers, regardless of their models. Our experiments on the CoNLL dataset show that collaborative partitioning yields results superior to those attained by the individual components, for ensembles of both strong and weak systems. Moreover, by applying the collaborative partitioning algorithm on top of three state-of-the-art resolvers, we obtain the best coreference performance reported so far in the literature (MELA v08 score of 64.47).
2016
ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions
Olga Uryupina | Ron Artstein | Antonella Bristot | Federica Cavicchio | Kepa Rodriguez | Massimo Poesio
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper presents the second release of the ARRAU dataset: a multi-domain corpus with thorough, linguistically motivated annotation of anaphora and related phenomena. Building upon the first release almost a decade ago, we have invested considerable effort in improving the data both quantitatively and qualitatively. We have doubled the corpus size, expanded the selection of covered phenomena to include referentiality and genericity, and designed and implemented a methodology for enforcing the consistency of the manual annotation. We believe that the new release of ARRAU provides valuable material for ongoing research on complex cases of coreference, as well as for a variety of related tasks. The corpus is publicly available through the LDC.
LiMoSINe Pipeline: Multilingual UIMA-based NLP Platform
Olga Uryupina | Barbara Plank | Gianni Barlacchi | Francisco J. Valverde Albacete | Manos Tsagkias | Antonio Uva | Alessandro Moschitti
Proceedings of ACL-2016 System Demonstrations
2015
A State-of-the-Art Mention-Pair Model for Coreference Resolution
Olga Uryupina | Alessandro Moschitti
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics
2014
SenTube: A Corpus for Sentiment Analysis on YouTube Social Media
Olga Uryupina | Barbara Plank | Aliaksei Severyn | Agata Rotondi | Alessandro Moschitti
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper we present SenTube, a dataset of user-generated comments on YouTube videos annotated for information content and sentiment polarity. It contains annotations that allow the development of classifiers for several important NLP tasks: (i) sentiment analysis, (ii) text categorization (relatedness of a comment to the video and/or product), (iii) spam detection, and (iv) prediction of comment informativeness. The SenTube corpus supports research on indexing and searching YouTube videos by exploiting information derived from comments. The corpus will cover several languages: at the moment, we focus on English and Italian, with Spanish and Dutch parts scheduled for later stages of the project. For all the languages, we collect videos for the same set of products, thus offering possibilities for multi- and cross-lingual experiments. The paper provides annotation guidelines, corpus statistics and annotator agreement details.
Opinion Mining on YouTube
Aliaksei Severyn | Alessandro Moschitti | Olga Uryupina | Barbara Plank | Katja Filippova
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2013
Towards Robust Linguistic Analysis using OntoNotes
Sameer Pradhan | Alessandro Moschitti | Nianwen Xue | Hwee Tou Ng | Anders Björkelund | Olga Uryupina | Yuchen Zhang | Zhi Zhong
Proceedings of the Seventeenth Conference on Computational Natural Language Learning
Multilingual Mention Detection for Coreference Resolution
Olga Uryupina | Alessandro Moschitti
Proceedings of the Sixth International Joint Conference on Natural Language Processing
Adapting a State-of-the-art Anaphora Resolution System for Resource-poor Language
Utpal Sikdar | Asif Ekbal | Sriparna Saha | Olga Uryupina | Massimo Poesio
Proceedings of the Sixth International Joint Conference on Natural Language Processing
2012
CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes
Sameer Pradhan | Alessandro Moschitti | Nianwen Xue | Olga Uryupina | Yuchen Zhang
Joint Conference on EMNLP and CoNLL - Shared Task
BART goes multilingual: The UniTN / Essex submission to the CoNLL-2012 Shared Task
Olga Uryupina | Alessandro Moschitti | Massimo Poesio
Joint Conference on EMNLP and CoNLL - Shared Task
Domain-specific vs. Uniform Modeling for Coreference Resolution
Olga Uryupina | Massimo Poesio
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Several corpora annotated for coreference have been made available in the past decade. These resources differ with respect to their size and underlying structure: the number of domains and their similarity. Our study compares domain-specific models, learned from small heterogeneous subsets of the investigated corpora, against uniform models that utilize all the available data. We show that for knowledge-poor baseline systems, domain-specific and uniform modeling yield the same results. Systems relying on large amounts of linguistic knowledge, however, exhibit differences in their performance: with all the designed features in use, domain-specific models suffer from over-fitting, whereas with pre-selected feature sets they tend to outperform uniform models.
2011
Multi-metric optimization for coreference: The UniTN / IITP / Essex submission to the 2011 CONLL Shared Task
Olga Uryupina | Sriparna Saha | Asif Ekbal | Massimo Poesio
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task
Single and multi-objective optimization for feature selection in anaphora resolution
Sriparna Saha | Asif Ekbal | Olga Uryupina | Massimo Poesio
Proceedings of 5th International Joint Conference on Natural Language Processing
2010
Corry: A System for Coreference Resolution
Olga Uryupina
Proceedings of the 5th International Workshop on Semantic Evaluation
BART: A Multilingual Anaphora Resolution System
Samuel Broscheit | Massimo Poesio | Simone Paolo Ponzetto | Kepa Joseba Rodriguez | Lorenza Romano | Olga Uryupina | Yannick Versley | Roberto Zanoli
Proceedings of the 5th International Workshop on Semantic Evaluation
Creating a Coreference Resolution System for Italian
Massimo Poesio | Olga Uryupina | Yannick Versley
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper summarizes our work on creating a full-scale coreference resolution (CR) system for Italian, using BART, an open-source modular CR toolkit initially developed for English corpora. We discuss our experiments on language-specific issues of the task. As our evaluation experiments show, a language-agnostic system (designed primarily for English) can achieve a performance level in the high forties (MUC F-score) when re-trained and tested on a new language, at least on gold mention boundaries. Compared to this level, we can improve our F-score by around 10% by introducing a small number of language-specific changes. This shows that, with a modular coreference resolution platform such as BART, one can straightforwardly develop a family of robust and reliable systems for various languages. We hope that our experiments will encourage researchers working on coreference in other languages to create their own full-scale coreference resolution systems; as we have mentioned above, at the moment such modules exist only for very few languages other than English.
2008
Error Analysis for Learning-based Coreference Resolution
Olga Uryupina
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
State-of-the-art coreference resolution engines show similar performance figures (low sixties on the MUC-7 data). Our system, with a rich linguistically motivated feature set, yields significantly better performance values for a variety of machine learners, but still leaves substantial room for improvement. In this paper we address a relatively unexplored area of coreference resolution: we present a detailed error analysis in order to understand the issues raised by corpus-based approaches to coreference resolution.
2006
Coreference Resolution with and without Linguistic Knowledge
Olga Uryupina
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
State-of-the-art statistical approaches to the coreference resolution task rely on sophisticated modeling but only a small number (10-20) of simple features. In this paper we propose to extend the standard feature set substantially, incorporating more linguistic knowledge. To investigate the usability of linguistically motivated features, we evaluate our system for a variety of machine learners on the standard dataset (MUC-7) with the traditional learning set-up.
2004
Discourse-New Detectors for Definite Description Resolution: A Survey and a Preliminary Proposal
Massimo Poesio | Olga Uryupina | Renata Vieira | Mijail Alexandrov-Kabadjov | Rodrigo Goulart
Proceedings of the Conference on Reference Resolution and Its Applications
Evaluating Name-Matching for Coreference Resolution
Olga Uryupina
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
2003
High-precision Identification of Discourse New and Unique Noun Phrases
Olga Uryupina
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics
Semi-supervised learning of geographical gazetteer from the internet
Olga Uryupina
Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References