Simone Paolo Ponzetto

Also published as: Simone P. Ponzetto, Simone Ponzetto


2021

Masking and Transformer-based Models for Hyperpartisanship Detection in News
Javier Sánchez-Junquera | Paolo Rosso | Manuel Montes-y-Gómez | Simone Paolo Ponzetto
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Hyperpartisan news shows an extreme manipulation of reality based on an underlying and extreme ideological orientation. Because of its harmful effects in reinforcing readers’ biases and shaping their subsequent behavior, hyperpartisan news detection has become an important task for computational linguists. In this paper, we evaluate two different approaches to detect hyperpartisan news. First, a text masking technique that allows us to compare style vs. topic-related features from a different perspective than previous work. Second, the transformer-based models BERT, XLM-RoBERTa, and M-BERT, known for their ability to capture semantic and syntactic patterns in the same representation. Our results corroborate previous research on this task in that topic-related features yield better results than style-based ones, although they also highlight the relevance of using longer n-grams. Furthermore, they show that transformer-based models are more effective than traditional methods, but at the cost of greater computational complexity and a lack of transparency. Based on our experiments, we conclude that the beginning of a news article carries information that helps the transformers distinguish effectively between left-wing, mainstream, and right-wing orientations.

FakeFlow: Fake News Detection by Modeling the Flow of Affective Information
Bilal Ghanem | Simone Paolo Ponzetto | Paolo Rosso | Francisco Rangel
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Fake news articles often stir the readers’ attention by means of emotional appeals that arouse their feelings. Unlike in short news texts, authors of longer articles can exploit such affective factors to manipulate readers by adding exaggerations or fabricating events, in order to affect the readers’ emotions. To capture this, we propose in this paper to model the flow of affective information in fake news articles using a neural architecture. The proposed model, FakeFlow, learns this flow by combining topic and affective information extracted from text. We evaluate the model’s performance with several experiments on four real-world datasets. The results show that FakeFlow achieves superior results when compared against state-of-the-art methods, thus confirming the importance of capturing the flow of the affective information in news articles.

DebIE: A Platform for Implicit and Explicit Debiasing of Word Embedding Spaces
Niklas Friedrich | Anne Lauscher | Simone Paolo Ponzetto | Goran Glavaš
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

Recent research efforts in NLP have demonstrated that distributional word vector spaces often encode stereotypical human biases, such as racism and sexism. With word representations ubiquitously used in NLP models and pipelines, this raises ethical issues and jeopardizes the fairness of language technologies. While there exists a large body of work on bias measures and debiasing methods, to date, there is no platform that would unify these research efforts and make bias measuring and debiasing of representation spaces widely accessible. In this work, we present DebIE, the first integrated platform for (1) measuring and (2) mitigating bias in word embeddings. Given (i) an embedding space (users can choose between the predefined spaces or upload their own) and (ii) a bias specification (users can choose between existing bias specifications or create their own), DebIE can (1) compute several measures of implicit and explicit bias and (2) modify the embedding space by executing two (mutually composable) debiasing models. DebIE’s functionality can be accessed through four different interfaces: (a) a web application, (b) a desktop application, (c) a REST-ful API, and (d) a command-line application. DebIE is available at: debie.informatik.uni-mannheim.de.

Come hither or go away? Recognising pre-electoral coalition signals in the news
Ines Rehbein | Simone Paolo Ponzetto | Anna Adendorf | Oke Bahnsen | Lukas Stoetzer | Heiner Stuckenschmidt
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In this paper, we introduce the task of political coalition signal prediction from text, that is, the task of recognizing from the news coverage leading up to an election the (un)willingness of political parties to form a government coalition. We decompose our problem into two related, but distinct tasks: (i) predicting whether a reported statement from a politician or a journalist refers to a potential coalition and (ii) predicting the polarity of the signal – namely, whether the speaker is in favour of or against the coalition. For this, we explore the benefits of multi-task learning and investigate which setup and task formulation is best suited for each sub-task. We evaluate our approach, based on hand-coded newspaper articles, covering elections in three countries (Ireland, Germany, Austria) and two languages (English, German). Our results show that the multi-task learning approach can further improve results over a strong monolingual transfer learning baseline.

2020

Word Sense Disambiguation for 158 Languages using Word Embeddings Only
Varvara Logacheva | Denis Teslenko | Artem Shelmanov | Steffen Remus | Dmitry Ustalov | Andrey Kutuzov | Ekaterina Artemova | Chris Biemann | Simone Paolo Ponzetto | Alexander Panchenko
Proceedings of the 12th Language Resources and Evaluation Conference

Disambiguation of word senses in context is easy for humans, but is a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models were developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely unsupervised and knowledge-free approaches to word sense disambiguation (WSD). They are particularly useful for under-resourced languages which do not have any resources for building either supervised or knowledge-based models. In this paper, we present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory, which can be used for disambiguation in context. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages. Models and system are available online.

SemEval-2020 Task 2: Predicting Multilingual and Cross-Lingual (Graded) Lexical Entailment
Goran Glavaš | Ivan Vulić | Anna Korhonen | Simone Paolo Ponzetto
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Lexical entailment (LE) is a fundamental asymmetric lexico-semantic relation, supporting the hierarchies in lexical resources (e.g., WordNet, ConceptNet) and applications like natural language inference and taxonomy induction. Multilingual and cross-lingual NLP applications warrant models for LE detection that go beyond language boundaries. As part of SemEval 2020, we carried out a shared task (Task 2) on multilingual and cross-lingual LE. The shared task spans three dimensions: (1) monolingual vs. cross-lingual LE, (2) binary vs. graded LE, and (3) a set of 6 diverse languages (and 15 corresponding language pairs). We offered two different evaluation tracks: (a) Dist: for unsupervised, fully distributional models that capture LE solely on the basis of unannotated corpora, and (b) Any: for externally informed models, allowed to leverage any resources, including lexico-semantic networks (e.g., WordNet or BabelNet). In the Any track, we received runs that push the state of the art across all languages and language pairs, for both binary LE detection and graded LE prediction.

AraWEAT: Multidimensional Analysis of Biases in Arabic Word Embeddings
Anne Lauscher | Rafik Takieddin | Simone Paolo Ponzetto | Goran Glavaš
Proceedings of the Fifth Arabic Natural Language Processing Workshop

Recent work has shown that distributional word vector spaces often encode human biases like sexism or racism. In this work, we conduct an extensive analysis of biases in Arabic word embeddings by applying a range of recently introduced bias tests on a variety of embedding spaces induced from corpora in Arabic. We measure the presence of biases across several dimensions, namely: embedding models (Skip-Gram, CBOW, and FastText) and vector sizes, types of text (encyclopedic text, and news vs. user-generated content), dialects (Egyptian Arabic vs. Modern Standard Arabic), and time (diachronic analyses over corpora from different time periods). Our analysis yields several interesting findings, e.g., that implicit gender bias in embeddings trained on Arabic news corpora steadily increases over time (between 2007 and 2017). We make the Arabic bias specifications (AraWEAT) publicly available.
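
The abstract above does not list the individual bias tests it applies, but the name AraWEAT points to the WEAT family. As a rough illustration only, the following Python sketch computes the standard WEAT effect size for two target sets X, Y and two attribute sets A, B. The function names, the toy vectors in the usage note, and the choice of the sample standard deviation (n - 1) are assumptions made for this sketch, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to A minus mean similarity of w to B."""
    mean_a = sum(cosine(w, a) for a in A) / len(A)
    mean_b = sum(cosine(w, b) for b in B) / len(B)
    return mean_a - mean_b


def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of X and Y, normalised by the
    standard deviation of the associations over all targets in X and Y.
    Assumption: sample standard deviation (n - 1 in the denominator)."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    pooled = sx + sy
    mean = sum(pooled) / len(pooled)
    std = math.sqrt(sum((s - mean) ** 2 for s in pooled) / (len(pooled) - 1))
    return (sum(sx) / len(sx) - sum(sy) / len(sy)) / std
```

With toy two-dimensional vectors such as X = [(1, 0), (0.9, 0.1)], Y = [(0, 1), (0.1, 0.9)], A = [(1, 0)] and B = [(0, 1)], the effect size is positive, and swapping X and Y flips its sign.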

2019

Policy Preference Detection in Parliamentary Debate Motions
Gavin Abercrombie | Federico Nanni | Riza Batista-Navarro | Simone Paolo Ponzetto
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Debate motions (proposals) tabled in the UK Parliament contain information about the stated policy preferences of the Members of Parliament who propose them, and are key to the analysis of all subsequent speeches given in response to them. We attempt to automatically label debate motions with codes from a pre-existing coding scheme developed by political scientists for the annotation and analysis of political parties’ manifestos. We develop annotation guidelines for the task of applying these codes to debate motions at two levels of granularity and produce a dataset of manually labelled examples. We evaluate the annotation process and the reliability and utility of the labelling scheme, finding that inter-annotator agreement is comparable with that of other studies conducted on manifesto data. Moreover, we test a variety of ways of automatically labelling motions with the codes, ranging from similarity matching to neural classification methods, and evaluate them against the gold standard labels. From these experiments, we note that established supervised baselines are not always able to improve over simple lexical heuristics. At the same time, we detect a clear and evident benefit when employing BERT, a state-of-the-art deep language representation model, even in classification scenarios with over 30 different labels and limited amounts of training data.

HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings
Saba Anwar | Dmitry Ustalov | Nikolay Arefyev | Simone Paolo Ponzetto | Chris Biemann | Alexander Panchenko
Proceedings of the 13th International Workshop on Semantic Evaluation

We present our system for semantic frame induction that showed the best performance in Subtask B.1 and finished as the runner-up in Subtask A of the SemEval 2019 Task 2 on unsupervised semantic frame induction (QasemiZadeh et al., 2019). Our approach separates this task into two independent steps: verb clustering using word and context embeddings, and role labeling by combining these embeddings with syntactic features. A simple combination of these steps shows very competitive results and can be extended to process other datasets and languages.

Multilingual and Cross-Lingual Graded Lexical Entailment
Ivan Vulić | Simone Paolo Ponzetto | Goran Glavaš
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Grounded in cognitive linguistics, graded lexical entailment (GR-LE) is concerned with fine-grained assertions regarding the directional hierarchical relationships between concepts on a continuous scale. In this paper, we present the first work on cross-lingual generalisation of the GR-LE relation. Starting from HyperLex, the only available GR-LE dataset in English, we construct new monolingual GR-LE datasets for three other languages, and combine those to create a set of six cross-lingual GR-LE datasets termed CL-HYPERLEX. We next present a novel method dubbed CLEAR (Cross-Lingual Lexical Entailment Attract-Repel) for effectively capturing graded (and binary) LE, both monolingually in different languages as well as across languages (i.e., on CL-HYPERLEX). Coupled with a bilingual dictionary, CLEAR leverages taxonomic LE knowledge in a resource-rich language (e.g., English) and propagates it to other languages. Supported by cross-lingual LE transfer, CLEAR sets competitive baseline performance on three new monolingual GR-LE datasets and six cross-lingual GR-LE datasets. In addition, we show that CLEAR outperforms the current state of the art on binary cross-lingual LE detection by a wide margin for diverse language pairs.

Computational Analysis of Political Texts: Bridging Research Efforts Across Communities
Goran Glavaš | Federico Nanni | Simone Paolo Ponzetto
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

In the last twenty years, political scientists started adopting and developing natural language processing (NLP) methods more actively in order to exploit text as an additional source of data in their analyses. Over the last decade the usage of computational methods for analysis of political texts has drastically expanded in scope, allowing for a sustained growth of the text-as-data community in political science. In political science, NLP methods have been extensively used for a number of analysis types and tasks, including inferring policy position of actors from textual evidence, detecting topics in political texts, and analyzing stylistic aspects of political texts (e.g., assessing the role of language ambiguity in framing the political agenda). Just like in numerous other domains, much of the work on computational analysis of political texts has been enabled and facilitated by the development of resources such as the topically coded electoral programmes (e.g., the Manifesto Corpus) or topically coded legislative texts (e.g., the Comparative Agenda Project). Political scientists created resources and used available NLP methods to process textual data largely in isolation from the NLP community. At the same time, NLP researchers addressed closely related tasks such as election prediction, ideology classification, and stance detection. In other words, these two communities have been largely agnostic of one another, with NLP researchers mostly unaware of interesting applications in political science and political scientists not applying cutting-edge NLP methodology to their problems. The main goal of this tutorial is to systematize and analyze the body of research work on political texts from both communities. We aim to provide a gentle, all-round introduction to methods and tasks related to computational analysis of political texts. Our vision is to bring the two research communities closer to each other and contribute to faster and more significant developments in this interdisciplinary research area.

Watset: Local-Global Graph Clustering with Applications in Sense and Frame Induction
Dmitry Ustalov | Alexander Panchenko | Chris Biemann | Simone Paolo Ponzetto
Computational Linguistics, Volume 45, Issue 3 - September 2019

We present a detailed theoretical and computational analysis of the Watset meta-algorithm for fuzzy graph clustering, which has been found to be widely applicable in a variety of domains. This algorithm creates an intermediate representation of the input graph, which reflects the “ambiguity” of its nodes. Then, it uses hard clustering to discover clusters in this “disambiguated” intermediate graph. After outlining the approach and analyzing its computational complexity, we demonstrate that Watset shows competitive results in three applications: unsupervised synset induction from a synonymy graph, unsupervised semantic frame induction from dependency triples, and unsupervised semantic class induction from a distributional thesaurus. Our algorithm is generic and can also be applied to other networks of linguistic data.
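
The local-global idea described in the abstract above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: connected components stand in for the hard clustering algorithm at both steps, node contexts are disambiguated by simple set membership instead of a similarity measure, and the names `connected_components` and `watset_sketch` are hypothetical.

```python
def connected_components(nodes, edges):
    """Stand-in hard clustering (assumption): connected components."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters


def watset_sketch(graph):
    """graph: dict mapping each node to the set of its neighbours
    (undirected, so every edge must appear in both directions)."""
    # Local step: cluster each node's neighbourhood (the node itself is
    # left out); every resulting cluster is one "sense" of that node.
    senses = {}  # (node, sense index) -> set of context nodes
    for u, nbrs in graph.items():
        ego_edges = [(a, b) for a in nbrs for b in graph[a]
                     if b in nbrs and a < b]
        for i, ctx in enumerate(connected_components(nbrs, ego_edges)):
            senses[(u, i)] = ctx

    def sense_of(v, u):
        # Simplified disambiguation: the sense of v whose context holds u.
        for (w, i), ctx in senses.items():
            if w == v and u in ctx:
                return (w, i)

    # Intermediate "disambiguated" graph over senses instead of nodes.
    sense_edges = [((u, i), sense_of(v, u))
                   for (u, i), ctx in senses.items() for v in ctx]

    # Global step: hard-cluster the sense graph, then map senses back to
    # nodes; a node with several senses can land in several clusters.
    clusters = connected_components(set(senses), sense_edges)
    return [{node for node, _ in cluster} for cluster in clusters]
```

On a toy synonymy-style graph where "bank" is linked to both {"river", "shore"} and {"money", "finance"}, the sketch yields two overlapping clusters that both contain "bank". Nodes without neighbours receive no sense and are omitted from the output.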

SEAGLE: A Platform for Comparative Evaluation of Semantic Encoders for Information Retrieval
Fabian David Schmidt | Markus Dietsche | Simone Paolo Ponzetto | Goran Glavaš
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

We introduce Seagle, a platform for comparative evaluation of semantic text encoding models on information retrieval (IR) tasks. Seagle implements (1) word embedding aggregators, which represent texts as algebraic aggregations of pretrained word embeddings and (2) pretrained semantic encoders, and allows for their comparative evaluation on arbitrary (monolingual and cross-lingual) IR collections. We benchmark Seagle’s models on monolingual document retrieval and cross-lingual sentence retrieval. Seagle functionality can be exploited via an easy-to-use web interface and its modular backend (micro-service architecture) can easily be extended with additional semantic search models.

2018

Investigating the Role of Argumentation in the Rhetorical Analysis of Scientific Publications with Neural Multi-Task Learning Models
Anne Lauscher | Goran Glavaš | Simone Paolo Ponzetto | Kai Eckert
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Exponential growth in the number of scientific publications yields the need for effective automatic analysis of rhetorical aspects of scientific writing. Acknowledging the argumentative nature of scientific text, in this work we investigate the link between the argumentative structure of scientific publications and rhetorical aspects such as discourse categories or citation contexts. To this end, we (1) augment a corpus of scientific publications annotated with four layers of rhetoric annotations with argumentation annotations and (2) investigate neural multi-task learning architectures combining argument extraction with a set of rhetorical classification tasks. By coupling rhetorical classifiers with the extraction of argumentative components in a joint multi-task learning setting, we obtain significant performance gains for different rhetorical analysis tasks.

Unsupervised Semantic Frame Induction using Triclustering
Dmitry Ustalov | Alexander Panchenko | Andrey Kutuzov | Chris Biemann | Simone Paolo Ponzetto
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We use dependency triples automatically extracted from a Web-scale corpus to perform unsupervised semantic frame induction. We cast the frame induction problem as a triclustering problem that is a generalization of clustering for triadic data. Our replicable benchmarks demonstrate that the proposed graph-based approach, Triframes, shows state-of-the-art results on this task on a FrameNet-derived dataset and performs on par with competitive methods on a verb class clustering task.

Enriching Frame Representations with Distributionally Induced Senses
Stefano Faralli | Alexander Panchenko | Chris Biemann | Simone Paolo Ponzetto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages
Dmitry Ustalov | Denis Teslenko | Alexander Panchenko | Mikhail Chernoskutov | Chris Biemann | Simone Paolo Ponzetto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Improving Hypernymy Extraction with Distributional Semantic Classes
Alexander Panchenko | Dmitry Ustalov | Stefano Faralli | Simone P. Ponzetto | Chris Biemann
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl
Alexander Panchenko | Eugen Ruppert | Stefano Faralli | Simone P. Ponzetto | Chris Biemann
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

MIsA: Multilingual “IsA” Extraction from Corpora
Stefano Faralli | Els Lefever | Simone Paolo Ponzetto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

CATS: A Tool for Customized Alignment of Text Simplification Corpora
Sanja Štajner | Marc Franco-Salvador | Paolo Rosso | Simone Paolo Ponzetto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

An Argument-Annotated Corpus of Scientific Publications
Anne Lauscher | Goran Glavaš | Simone Paolo Ponzetto
Proceedings of the 5th Workshop on Argument Mining

Argumentation is an essential feature of scientific language. We present an annotation study resulting in a corpus of scientific publications annotated with argumentative components and relations. The argumentative annotations have been added to the existing Dr. Inventor Corpus, already annotated for four other rhetorical aspects. We analyze the annotated argumentative structures and investigate the relations between argumentation and other rhetorical aspects of scientific writing, such as discourse roles and citation contexts.

UniMa at SemEval-2018 Task 7: Semantic Relation Extraction and Classification from Scientific Publications
Thorsten Keiper | Zhonghao Lyu | Sara Pooladzadeh | Yuan Xu | Jingyi Zhang | Anne Lauscher | Simone Paolo Ponzetto
Proceedings of The 12th International Workshop on Semantic Evaluation

Large repositories of scientific literature call for the development of robust methods to extract information from scholarly papers. This problem is addressed by the SemEval 2018 Task 7 on extracting and classifying relations found within scientific publications. In this paper, we present a feature-based and a deep learning-based approach to the task and discuss the results of the system runs that we submitted for evaluation.

2017

Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation
Alexander Panchenko | Eugen Ruppert | Stefano Faralli | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

The current trend in NLP is the use of highly opaque models, e.g. neural networks and word embeddings. While these models yield state-of-the-art results on a range of tasks, their drawback is poor interpretability. Using the example of word sense induction and disambiguation (WSID), we show that it is possible to develop an interpretable model that matches the state-of-the-art models in accuracy. Namely, we present an unsupervised, knowledge-free WSID approach, which is interpretable at three levels: word sense inventory, sense feature representations, and disambiguation procedure. Experiments show that our model performs on par with state-of-the-art word sense embeddings and other unsupervised systems while offering the possibility to justify its decisions in human-readable form.

The ContrastMedium Algorithm: Taxonomy Induction From Noisy Knowledge Graphs With Just A Few Links
Stefano Faralli | Alexander Panchenko | Chris Biemann | Simone Paolo Ponzetto
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

In this paper, we present ContrastMedium, an algorithm that transforms noisy semantic networks into full-fledged, clean taxonomies. ContrastMedium is able to identify the embedded taxonomy structure from a noisy knowledge graph without explicit human supervision such as, for instance, a set of manually selected input root and leaf concepts. This is achieved by leveraging structural information from a companion reference taxonomy, to which the input knowledge graph is linked (either automatically or manually). When used in conjunction with methods for hypernym acquisition and knowledge base linking, our methodology provides a complete solution for end-to-end taxonomy induction. We conduct experiments using automatically acquired knowledge graphs, as well as a SemEval benchmark, and show that our method is able to achieve high performance on the task of taxonomy induction.

Improving Neural Knowledge Base Completion with Cross-Lingual Projections
Patrick Klein | Simone Paolo Ponzetto | Goran Glavaš
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

In this paper we present a cross-lingual extension of a neural tensor network model for knowledge base completion. We exploit multilingual synsets from BabelNet to translate English triples to other languages and then augment the reference knowledge base with cross-lingual triples. We project monolingual embeddings of different languages to a shared multilingual space and use them for network initialization (i.e., as initial concept embeddings). We then train the network with triples from the cross-lingually augmented knowledge base. Results on WordNet link prediction show that leveraging cross-lingual information yields significant gains over exploiting only monolingual triples.

Unsupervised Cross-Lingual Scaling of Political Texts
Goran Glavaš | Federico Nanni | Simone Paolo Ponzetto
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Political text scaling aims to linearly order parties and politicians across political dimensions (e.g., left-to-right ideology) based on textual content (e.g., politician speeches or party manifestos). Existing models scale texts based on relative word usage and cannot be used for cross-lingual analyses. Additionally, there is little quantitative evidence that the output of these models correlates with common political dimensions like left-to-right orientation. Experimental results show that the semantically-informed scaling models better predict the party positions than the existing word-based models in two different political dimensions. Furthermore, the proposed models exhibit no drop in performance in the cross-lingual compared to monolingual setting.

Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation
Alexander Panchenko | Stefano Faralli | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications

We introduce a new method for unsupervised knowledge-based word sense disambiguation (WSD) based on a resource that links two types of sense-aware lexical networks: one is induced from a corpus using distributional semantics, the other is manually constructed. The combination of the two networks reduces the sparsity of sense representations used for WSD. We evaluate these enriched representations within two lexical sample sense disambiguation benchmarks. Our results indicate that (1) features extracted from the corpus-based resource help to significantly outperform a model based solely on the lexical resource; (2) our method achieves results comparable to or better than four state-of-the-art unsupervised knowledge-based WSD systems, including three hybrid systems that also rely on text corpora. In contrast to these hybrid methods, our approach does not require access to web search engines, texts mapped to a sense inventory, or machine translation systems.

Cross-Lingual Classification of Topics in Political Texts
Goran Glavaš | Federico Nanni | Simone Paolo Ponzetto
Proceedings of the Second Workshop on NLP and Computational Social Science

In this paper, we propose an approach for cross-lingual topical coding of sentences from electoral manifestos of political parties in different languages. To this end, we exploit continuous semantic text representations and induce a joint multilingual semantic vector space to enable supervised learning using manually-coded sentences across different languages. Our experimental results show that classifiers trained on multilingual data yield performance boosts over monolingual topic classification.

Effects of Lexical Properties on Viewing Time per Word in Autistic and Neurotypical Readers
Sanja Štajner | Victoria Yaneva | Ruslan Mitkov | Simone Paolo Ponzetto
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Eye tracking studies from the past few decades have shaped the way we think of word complexity and cognitive load: words that are long, rare and ambiguous are more difficult to read. However, online processing techniques have been scarcely applied to investigating the reading difficulties of people with autism and what vocabulary is challenging for them. We present parallel gaze data obtained from adult readers with autism and a control group of neurotypical readers and show that the former required higher cognitive effort to comprehend the texts as evidenced by three gaze-based measures. We divide all words into four classes based on their viewing times for both groups and investigate the relationship between longer viewing times and word length, word frequency, and four cognitively-based measures (word concreteness, familiarity, age of acquisition and imageability).

If Sentences Could See: Investigating Visual Information for Semantic Textual Similarity
Goran Glavaš | Ivan Vulić | Simone Paolo Ponzetto
IWCS 2017 - 12th International Conference on Computational Semantics - Long papers

Dual Tensor Model for Detecting Asymmetric Lexico-Semantic Relations
Goran Glavaš | Simone Paolo Ponzetto
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Detection of lexico-semantic relations is one of the central tasks of computational semantics. Although some fundamental relations (e.g., hypernymy) are asymmetric, most existing models account for asymmetry only implicitly and use the same concept representations to support detection of symmetric and asymmetric relations alike. In this work, we propose the Dual Tensor model, a neural architecture with which we explicitly model the asymmetry and capture the translation between unspecialized and specialized word embeddings via a pair of tensors. Although our Dual Tensor model needs only unspecialized embeddings as input, our experiments on hypernymy and meronymy detection suggest that it can outperform more complex and resource-intensive models. We further demonstrate that the model can account for polysemy and that it exhibits stable performance across languages.

Topic-Based Agreement and Disagreement in US Electoral Manifestos
Stefano Menini | Federico Nanni | Simone Paolo Ponzetto | Sara Tonelli
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a topic-based analysis of agreement and disagreement in political manifestos, which relies on a new method for topic detection based on key concept clustering. Our approach outperforms both standard techniques like LDA and a state-of-the-art graph-based method, and provides promising initial results for this new task in computational social science.

pdf bib
Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation
Alexander Panchenko | Fide Marten | Eugen Ruppert | Stefano Faralli | Dmitry Ustalov | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Interpretability of a predictive model is a powerful feature that gains the trust of users in the correctness of the predictions. In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images. We present a WSD system that bridges the gap between these two so far disconnected groups of methods. Namely, our system, providing access to several state-of-the-art WSD models, aims to be interpretable as a knowledge-based system while it remains completely unsupervised and knowledge-free. The presented tool features a Web interface for all-word disambiguation of texts that makes the sense predictions human readable by providing interpretable word sense inventories, sense representations, and disambiguation results. We provide a public API, enabling seamless integration.

pdf bib
Exploring Neural Text Simplification Models
Sergiu Nisioi | Sanja Štajner | Simone Paolo Ponzetto | Liviu P. Dinu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We present the first attempt at using sequence to sequence neural networks to model text simplification (TS). Unlike the previously proposed automated TS systems, our neural text simplification (NTS) systems are able to simultaneously perform lexical simplification and content reduction. An extensive human evaluation of the output has shown that NTS systems achieve almost perfect grammaticality and meaning preservation of output sentences and a higher level of simplification than the state-of-the-art automated TS systems.

pdf bib
Sentence Alignment Methods for Improving Text Simplification Systems
Sanja Štajner | Marc Franco-Salvador | Simone Paolo Ponzetto | Paolo Rosso | Heiner Stuckenschmidt
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We provide several methods for sentence-alignment of texts with different complexity levels. Using the best of them, we sentence-align the Newsela corpora, thus providing large training materials for automatic text simplification (ATS) systems. We show that using this dataset, even the standard phrase-based statistical machine translation models for ATS can outperform the state-of-the-art ATS systems.

2016

pdf bib
A Large DataBase of Hypernymy Relations Extracted from the Web.
Julian Seitner | Christian Bizer | Kai Eckert | Stefano Faralli | Robert Meusel | Heiko Paulheim | Simone Paolo Ponzetto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Hypernymy relations (those where a hyponym term shares an “isa” relationship with its hypernym) play a key role in many Natural Language Processing (NLP) tasks, e.g., ontology learning, automatically building or extending knowledge bases, or word sense disambiguation and induction. In fact, such relations may provide the basis for the construction of more complex structures such as taxonomies, or be used as effective background knowledge for many word understanding applications. We present a publicly available database containing more than 400 million hypernymy relations we extracted from the CommonCrawl web corpus. We describe the infrastructure we developed to iterate over the web corpus for extracting the hypernymy relations and store them effectively in a large database. This collection of relations represents a rich source of knowledge and may be useful for many researchers. We offer the tuple dataset for public download and an Application Programming Interface (API) to help other researchers programmatically query the database.

pdf bib
TAXI at SemEval-2016 Task 13: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling
Alexander Panchenko | Stefano Faralli | Eugen Ruppert | Steffen Remus | Hubert Naets | Cédrick Fairon | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
Unsupervised Text Segmentation Using Semantic Relatedness Graphs
Goran Glavaš | Federico Nanni | Simone Paolo Ponzetto
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

2015

pdf bib
Image with a Message: Towards Detecting Non-Literal Image Usages by Visual Linking
Lydia Weiland | Laura Dietz | Simone Paolo Ponzetto
Proceedings of the Fourth Workshop on Vision and Language

2014

pdf bib
Weakly supervised construction of a repository of iconic images
Lydia Weiland | Wolfgang Effelsberg | Simone Paolo Ponzetto
Proceedings of the Third Workshop on Vision and Language

pdf bib
DBpedia Domains: augmenting DBpedia with domain information
Gregor Titze | Volha Bryl | Cäcilia Zirn | Simone Paolo Ponzetto
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present an approach for augmenting DBpedia, a very large ontology lying at the heart of the Linked Open Data (LOD) cloud, with domain information. Our approach uses the thematic labels provided for DBpedia entities by Wikipedia categories, and groups them based on a kernel-based k-means clustering algorithm. Experiments on gold-standard data show that our approach provides a first solution to the automatic annotation of DBpedia entities with domain labels, thus providing the largest LOD domain-annotated ontology to date.

2013

pdf bib
Exploiting Social Media for Natural Language Processing: Bridging the Gap between Language-centric and Real-world Applications
Simone Paolo Ponzetto | Andrea Zielinski
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Tutorials)

2012

pdf bib
Joining Forces Pays Off: Multilingual Joint Word Sense Disambiguation
Roberto Navigli | Simone Paolo Ponzetto
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Multilingual WSD with Just a Few Lines of Code: the BabelNet API
Roberto Navigli | Simone Paolo Ponzetto
Proceedings of the ACL 2012 System Demonstrations

2010

pdf bib
Extending BART to Provide a Coreference Resolution System for German
Samuel Broscheit | Simone Paolo Ponzetto | Yannick Versley | Massimo Poesio
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present a flexible toolkit-based approach to automatic coreference resolution on German text. We start with our previous work aimed at reimplementing the system from Soon et al. (2001) for English, and extend it to duplicate a version of the state-of-the-art proposal from Klenner and Ailloud (2009). Evaluation performed on a benchmarking dataset, namely the TueBa-D/Z corpus (Hinrichs et al., 2005b), shows that machine-learning-based coreference resolution can be robustly performed in a language other than English.

pdf bib
BabelNet: Building a Very Large Multilingual Semantic Network
Roberto Navigli | Simone Paolo Ponzetto
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
Simone Paolo Ponzetto | Roberto Navigli
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Assessing the Challenge of Fine-Grained Named Entity Recognition and Classification
Asif Ekbal | Eva Sourjikova | Anette Frank | Simone Paolo Ponzetto
Proceedings of the 2010 Named Entities Workshop

pdf bib
BART: A Multilingual Anaphora Resolution System
Samuel Broscheit | Massimo Poesio | Simone Paolo Ponzetto | Kepa Joseba Rodriguez | Lorenza Romano | Olga Uryupina | Yannick Versley | Roberto Zanoli
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
UHD: Cross-Lingual Word Sense Disambiguation Using Multilingual Co-Occurrence Graphs
Carina Silberer | Simone Paolo Ponzetto
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

pdf bib
State-of-the-art NLP Approaches to Coreference Resolution: Theory and Practical Recipes
Simone Paolo Ponzetto | Massimo Poesio
Tutorial Abstracts of ACL-IJCNLP 2009

pdf bib
Extracting World and Linguistic Knowledge from Wikipedia
Simone Paolo Ponzetto | Michael Strube
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts

2008

pdf bib
BART: A Modular Toolkit for Coreference Resolution
Yannick Versley | Simone Paolo Ponzetto | Massimo Poesio | Vladimir Eidelman | Alan Jern | Jason Smith | Xiaofeng Yang | Alessandro Moschitti
Proceedings of the ACL-08: HLT Demo Session

pdf bib
BART: A modular toolkit for coreference resolution
Yannick Versley | Simone Ponzetto | Massimo Poesio | Vladimir Eidelman | Alan Jern | Jason Smith | Xiaofeng Yang | Alessandro Moschitti
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Developing a full coreference system able to run all the way from raw text to semantic interpretation is a considerable engineering effort. Accordingly, there is very limited availability of off-the-shelf tools for researchers whose interests are not primarily in coreference or others who want to concentrate on a specific aspect of the problem. We present BART, a highly modular toolkit for developing coreference applications. In the Johns Hopkins workshop on using lexical and encyclopedic knowledge for entity disambiguation, the toolkit was used to extend a reimplementation of Soon et al.’s proposal with a variety of additional syntactic and knowledge-based features, and experiment with alternative resolution processes, preprocessing tools, and classifiers. BART has been released as open source software and is available from http://www.sfs.uni-tuebingen.de/~versley/BART

2007

pdf bib
Creating a Knowledge Base from a Collaboratively Generated Encyclopedia
Simone Paolo Ponzetto
Proceedings of the NAACL-HLT 2007 Doctoral Consortium

pdf bib
An API for Measuring the Relatedness of Words in Wikipedia
Simone Paolo Ponzetto | Michael Strube
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

pdf bib
Exploiting Semantic Role Labeling, WordNet and Wikipedia for Coreference Resolution
Simone Paolo Ponzetto | Michael Strube
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf bib
Semantic Role Labeling for Coreference Resolution
Simone Paolo Ponzetto | Michael Strube
Demonstrations

2005

pdf bib
Semantic Role Labeling Using Lexical Statistical Information
Simone Paolo Ponzetto | Michael Strube
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)