VL-BERT+: Detecting Protected Groups in Hateful Multimodal Memes
Piush Aggarwal | Michelle Espranita Liman | Darina Gold | Torsten Zesch
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

This paper describes our submission (winning solution for Task A) to the Shared Task on Hateful Meme Detection at WOAH 2021. We build our system on top of a state-of-the-art system for binary hateful meme classification that already uses image tags such as race, gender, and web entities. We add further metadata such as emotions and experiment with data augmentation techniques, as hateful instances are underrepresented in the data set.


Decomposing and Comparing Meaning Relations: Paraphrasing, Textual Entailment, Contradiction, and Specificity
Venelin Kovatchev | Darina Gold | M. Antonia Marti | Maria Salamo | Torsten Zesch
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we present a methodology for decomposing and comparing multiple meaning relations (paraphrasing, textual entailment, contradiction, and specificity). The methodology includes SHARel - a new typology that consists of 26 linguistic and 8 reason-based categories. We use the typology to annotate a corpus of 520 sentence pairs in English and we demonstrate that unlike previous typologies, SHARel can be applied to all relations of interest with a high inter-annotator agreement. We analyze and compare the frequency and distribution of the linguistic and reason-based phenomena involved in paraphrasing, textual entailment, contradiction, and specificity. This comparison allows for a much more in-depth analysis of the workings of the individual relations and the way they interact and compare with each other. We release all resources (typology, annotation guidelines, and annotated corpus) to the community.


Divide and Extract – Disentangling Clause Splitting and Proposition Extraction
Darina Gold | Torsten Zesch
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Proposition extraction from sentences is an important task for information extraction systems. Evaluation of such systems usually conflates two aspects: splitting complex sentences into clauses and the extraction of propositions. It is thus difficult to independently determine the quality of the proposition extraction step. We create a manually annotated proposition dataset from sentences taken from restaurant reviews that distinguishes between clauses that need to be split and those that do not. The resulting proposition evaluation dataset allows us to independently compare the performance of proposition extraction systems on simple and complex clauses. Although performance drastically drops on more complex sentences, we show that the same systems perform best on both simple and complex clauses. Furthermore, we show that specific kinds of subordinate clauses pose difficulties to most systems.

RELATIONS - Workshop on meaning relations between phrases and sentences
Venelin Kovatchev | Darina Gold | Torsten Zesch


Annotating and analyzing the interactions between meaning relations
Darina Gold | Venelin Kovatchev | Torsten Zesch
Proceedings of the 13th Linguistic Annotation Workshop

Pairs of sentences, phrases, or other text pieces can hold semantic relations such as paraphrasing, textual entailment, contradiction, specificity, and semantic similarity. These relations are usually studied in isolation and no dataset exists where they can be compared empirically. Here we present a corpus annotated with these relations and the analysis of these results. The corpus contains 520 sentence pairs, annotated with these relations. We measure the annotation reliability of each individual relation and we examine their interactions and correlations. Among the unexpected results revealed by our analysis is that the traditionally considered direct relationship between paraphrasing and bi-directional entailment does not hold in our data.