Simone Paolo Ponzetto - ACL Anthology

Simone Paolo Ponzetto

Also published as: Simone Ponzetto, Simone P. Ponzetto

2025

IRSum: One Model to Rule Summarization and Retrieval
Sotaro Takeshita | Simone Paolo Ponzetto | Kai Eckert
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)

Applications that store a large number of documents often have summarization and retrieval functionalities to help users digest large amounts of information efficiently. Currently, such systems need to run two task-specific models, for summarization and retrieval, redundantly on the same set of documents. An efficient approach to amend this redundancy would be to reuse hidden representations produced during the summary generation for retrieval. However, our experiment shows that existing models, including recent large language models, do not produce retrieval-friendly embeddings during summarization due to a lack of a contrastive objective during their training. To this end, we introduce a simple, cost-effective training strategy which integrates a contrastive objective into standard summarization training without requiring additional annotations. We empirically show that our model can perform on par or even outperform in some cases compared to the combination of two task-specific models while improving throughput and FLOPs by up to 17% and 20%, respectively.

GenGO Ultra: an LLM-powered ACL Paper Explorer
Sotaro Takeshita | Tornike Tsereteli | Simone Paolo Ponzetto
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

The ever-growing number of papers in natural language processing (NLP) poses the challenge of finding relevant papers. In our previous paper, we introduced GenGO, which complements NLP papers with various information, such as aspect-based summaries, to enable efficient paper exploration. While it delivers a better literature search experience, it lacks an interactive interface that dynamically produces information tailored to the user’s needs. To this end, we present an extension to our previous system, dubbed GenGO Ultra, which exploits large language models (LLMs) to dynamically generate responses grounded by published papers. We also conduct multi-granularity experiments to evaluate six text encoders and five LLMs. Our system is designed for transparency – based only on open-weight models, visible system prompts, and an open-source code base – to foster further development and research on top of our system: https://gengo-ultra.sotaro.io/

Moral reckoning: How reliable are dictionary-based methods for examining morality in text?
Ines Rehbein | Lilly Brauner | Florian Ertz | Ines Reinig | Simone Ponzetto
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities

Due to their availability and ease of use, dictionary-based measures of moral values are a popular tool for text-based analyses of morality that examine human attitudes and behaviour across populations and cultures. In this paper, we revisit the construct validity of different dictionary-based measures of morality in text that have been proposed in the literature. We discuss conceptual challenges for text-based measures of morality and present an annotation experiment where we create a new dataset with human annotations of moral rhetoric in German political manifestos. We compare the results of our human annotations with different measures of moral values, showing that none of them is able to capture the trends observed by trained human coders. Our findings have far-reaching implications for the application of moral dictionaries in the digital humanities.

Moral Framing in Politics (MFiP): A new resource and models for moral framing
Ines Rehbein | Ines Reinig | Simone Paolo Ponzetto
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

The construct of morality permeates our entire lives and influences our behavior and how we perceive others. It therefore comes at no surprise that morality also plays an important role in politics, as morally framed arguments are perceived as more appealing and persuasive. Thus, being able to identify moral framing in political communication and to detect subtle differences in politicians’ moral framing can provide the basis for many interesting analyses in the political sciences. In the paper, we release MoralFramingInPolitics (MFiP), a new corpus of German parliamentary debates where the speakers’ moral framing has been coded, using the framework of Moral Foundations Theory (MFT). Our fine-grained annotations distinguish different types of moral frames and also include narrative roles, together with the moral foundations for each frame. We then present models for frame type and moral foundation classification and explore the benefits of data augmentation (DA) and contrastive learning (CL) for the two tasks. All data and code will be made available to the research community.

BabelEdits: A Benchmark and a Modular Approach for Robust Cross-lingual Knowledge Editing of Large Language Models
Tommaso Green | Félix Gaschi | Fabian David Schmidt | Simone Paolo Ponzetto | Goran Glavaš
Findings of the Association for Computational Linguistics: ACL 2025

With Large Language Models (LLMs) becoming increasingly multilingual, effective knowledge editing (KE) needs to propagate edits across languages. Evaluation of the existing methods for cross-lingual knowledge editing (CKE) is limited both w.r.t. edit effectiveness: benchmarks do not account for entity aliases and use faulty entity translations; as well as robustness: existing work fails to report on downstream generation and task-solving abilities of LLMs after editing. In this work, we aim to (i) maximize the effectiveness of CKE while at the same time (ii) minimizing the extent of downstream model collapse due to the edits. To accurately measure the effectiveness of CKE methods, we introduce BabelEdits, a new CKE benchmark covering 60 languages that combines high-quality multilingual synsets from BabelNet with marker-based translation to ensure entity translation quality. Unlike existing CKE benchmarks, BabelEdits accounts for the rich variety of entity aliases within and across languages. We then propose BabelReFT, a modular CKE approach based on representation fine-tuning (ReFT) which learns entity-scope ReFT modules, applying them to all multilingual aliases at inference. Our experimental results show that not only is BabelReFT more effective in CKE than state-of-the-art methods, but, owing to its modular design, much more robust against downstream model collapse when subjected to many sequential edits.

Steering Language Models in Multi-Token Generation: A Case Study on Tense and Aspect
Alina Klerings | Jannik Brinkmann | Daniel Ruffinelli | Simone Paolo Ponzetto
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) are able to generate grammatically well-formed text, but how do they encode their syntactic knowledge internally? While prior work has focused largely on binary grammatical contrasts, in this work, we study the representation and control of two multidimensional hierarchical grammar phenomena—verb tense and aspect—and for each, identify distinct, orthogonal directions in residual space using linear discriminant analysis. Next, we demonstrate causal control over both grammatical features through concept steering across three generation tasks. Then, we use these identified features in a case study to investigate factors influencing effective steering in multi-token generation. We find that steering strength, location, and duration are crucial parameters for reducing undesirable side effects such as topic shift and degeneration. Our findings suggest that models encode tense and aspect in structurally organized, human-like ways, but effective control of such features during generation is sensitive to multiple factors and requires manual tuning or automated optimization.

Do LLMs Authentically Represent Affective Experiences of People with Disabilities on Social Media?
Marco Bombieri | Simone Paolo Ponzetto | Marco Rospocher
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Investigating the effectiveness of Data Augmentation and Contrastive Learning for Named Entity Recognition
Noel Chia | Ines Rehbein | Simone Paolo Ponzetto
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)

Data Augmentation (DA) and Contrastive Learning (CL) are widely used in NLP, but their potential for NER has not yet been investigated in detail. Existing work is mostly limited to zero- and few-shot scenarios where improvements over the baseline are easy to obtain. In this paper, we address this research gap by presenting a systematic evaluation of DA for NER on small, medium-sized and large datasets with coarse and fine-grained labels. We report results for a) DA only, b) DA in combination with supervised contrastive learning, and c) DA with transfer learning. Our results show that DA on its own fails to improve results over the baseline and that supervised CL works better on larger datasets while transfer learning is beneficial if the target dataset is very small. Finally, we investigate how contrastive learning affects the learned representations, based on dimensionality reduction and visualisation techniques, and show that CL mostly helps to separate named entities from non-entities.

Randomly Removing 50% of Dimensions in Text Embeddings has Minimal Impact on Retrieval and Classification Tasks
Sotaro Takeshita | Yurina Takeshita | Daniel Ruffinelli | Simone Paolo Ponzetto
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

In this paper, we study the surprising impact that truncating text embeddings has on downstream performance. We consistently observe across 6 state-of-the-art text encoders and 26 downstream tasks, that randomly removing up to 50% of embedding dimensions results in only a minor drop in performance, less than 10%, in retrieval and classification tasks. Given the benefits of using smaller-sized embeddings, as well as the potential insights about text encoding, we study this phenomenon and find that, contrary to what is suggested in prior work, this is not the result of an ineffective use of representation space. Instead, we find that a large number of uniformly distributed dimensions actually cause an increase in performance when removed. This would explain why, on average, removing a large number of embedding dimensions results in a marginal drop in performance. We make similar observations when truncating the embeddings used by large language models to make next-token predictions on generative tasks, suggesting that this phenomenon is not isolated to classification or retrieval tasks.

2024

Proceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences: Long and short papers
Christopher Klamm | Gabriella Lapesa | Simone Paolo Ponzetto | Ines Rehbein | Indira Sen
Proceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences: Long and short papers

ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications
Sotaro Takeshita | Tommaso Green | Ines Reinig | Kai Eckert | Simone Ponzetto
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Extensive efforts in the past have been directed toward the development of summarization datasets. However, a predominant number of these resources have been (semi)-automatically generated, typically through web data crawling. This resulted in subpar resources for training and evaluating summarization systems, a quality compromise that is arguably due to the substantial costs associated with generating ground-truth summaries, particularly for diverse languages and specialized domains. To address this issue, we present ACLSum, a novel summarization dataset carefully crafted and evaluated by domain experts. In contrast to previous datasets, ACLSum facilitates multi-aspect summarization of scientific papers, covering challenges, approaches, and outcomes in depth. Through extensive experiments, we evaluate the quality of our resource and the performance of models based on pretrained language models (PLMs) and state-of-the-art large language models (LLMs). Additionally, we explore the effectiveness of extract-then-abstract versus abstractive end-to-end summarization within the scholarly domain on the basis of automatically discovered aspects. While the former performs comparably well to the end-to-end approach with pretrained language models regardless of the potential error propagation issue, the prompting-based approach with LLMs shows a limitation in extracting sentences from source documents.

A Survey on Modelling Morality for Text Analysis
Ines Reinig | Maria Becker | Ines Rehbein | Simone Ponzetto
Findings of the Association for Computational Linguistics: ACL 2024

In this survey, we provide a systematic review of recent work on modelling morality in text, an area of research that has garnered increasing attention in recent years. Our survey is motivated by the importance of modelling decisions on the created resources, the models trained on these resources and the analyses that result from the models’ predictions. We review work at the interface of NLP, Computational Social Science and Psychology and give an overview of the different goals and research questions addressed in the papers, their underlying theoretical backgrounds and the methods that have been applied to pursue these goals. We then identify and discuss challenges and research gaps, such as the lack of a theoretical framework underlying the operationalisation of morality in text, the low IAA reported for manyhuman-annotated resulting resources and the lack of validation of newly proposed resources and analyses.

How to Do Politics with Words: Investigating Speech Acts in Parliamentary Debates
Ines Reinig | Ines Rehbein | Simone Paolo Ponzetto
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents a new perspective on framing through the lens of speech acts and investigates how politicians make use of different pragmatic speech act functions in political debates. To that end, we created a new resource of German parliamentary debates, annotated with fine-grained speech act types. Our hierarchical annotation scheme distinguishes between cooperation and conflict communication, further structured into six subtypes, such as informative, declarative or argumentative-critical speech acts, with 14 fine-grained classes at the lowest level. We present classification baselines on our new data and show that the fine-grained classes in our schema can be predicted with an avg. F1 of around 82.0%. We then use our classifier to analyse the use of speech acts in a large corpus of parliamentary debates over a time span from 2003–2023.

Out of the Mouths of MPs: Speaker Attribution in Parliamentary Debates
Ines Rehbein | Josef Ruppenhofer | Annelen Brunner | Simone Paolo Ponzetto
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents GePaDe_SpkAtt , a new corpus for speaker attribution in German parliamentary debates, with more than 7,700 manually annotated events of speech, thought and writing. Our role inventory includes the sources, addressees, messages and topics of the speech event and also two additional roles, medium and evidence. We report baseline results for the automatic prediction of speech events and their roles, with high scores for both, event triggers and roles. Then we apply our model to predict speech events in 20 years of parliamentary debates and investigate the use of factives in the rhetoric of MPs.

GenGO: ACL Paper Explorer with Semantic Features
Sotaro Takeshita | Simone Ponzetto | Kai Eckert
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

We present GenGO, a system for exploring papers published in ACL conferences. Paper data stored in our database is enriched with multi-aspect summaries, extracted named entities, a field of study label, and text embeddings by our data processing pipeline. These metadata are used in our web-based user interface to enable researchers to quickly find papers relevant to their interests, and grasp an overview of papers without reading full-text of papers. To make GenGO to be available online as long as possible, we design GenGO to be simple and efficient to reduce maintenance and financial costs. In addition, the modularity of our data processing pipeline lets developers easily extend it to add new features. We make our code available to foster open development and transparency: https://gengo.sotaro.io.

A new Resource and Baselines for Opinion Role Labelling in German Parliamentary Debates
Ines Rehbein | Simone Paolo Ponzetto
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024

Detecting opinions, their holders and targets in parliamentary debates provides an interesting layer of analysis, for example, to identify frequent targets of opinions for specific topics, actors or parties. In the paper, we present GePaDe-ORL, a new dataset for German parliamentary debates where subjective expressions, their opinion holders and targets have been annotated. We describe the annotation process and report baselines for predicting those annotations in our new dataset.

ROUGE-K: Do Your Summaries Have Keywords?
Sotaro Takeshita | Simone Ponzetto | Kai Eckert
Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)

Keywords, that is, content-relevant words in summaries play an important role in efficient information conveyance, making it critical to assess if system-generated summaries contain such informative words during evaluation. However, existing evaluation metrics for extreme summarization models do not pay explicit attention to keywords in summaries, leaving developers ignorant of their presence. To address this issue, we present a keyword-oriented evaluation metric, dubbed ROUGE-K, which provides a quantitative answer to the question of – How well do summaries include keywords? Through the lens of this keyword-aware metric, we surprisingly find that a current strong baseline model often misses essential information in their summaries. Our analysis reveals that human annotators indeed find the summaries with more keywords to be more relevant to the source documents. This is an important yet previously overlooked aspect in evaluating summarization systems. Finally, to enhance keyword inclusion, we propose four approaches for incorporating word importance into a transformer-based model and experimentally show that it enables guiding models to include more keywords while keeping the overall quality.

2023

Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers
Chia-Chien Hung | Anne Lauscher | Dirk Hovy | Simone Paolo Ponzetto | Goran Glavaš
Findings of the Association for Computational Linguistics: EACL 2023

Demographic factors (e.g., gender or age) shape our language. Previous work showed that incorporating demographic factors can consistently improve performance for various NLP tasks with traditional NLP models. In this work, we investigate whether these previous findings still hold with state-of-the-art pretrained Transformer-based language models (PLMs). We use three common specialization methods proven effective for incorporating external knowledge into pretrained Transformers (e.g., domain-specific or geographic knowledge). We adapt the language representations for the demographic dimensions of gender and age, using continuous language modeling and dynamic multi-task learning for adaptation, where we couple language modeling objectives with the prediction of demographic classes. Our results, when employing a multilingual PLM, show substantial gains in task performance across four languages (English, German, French, and Danish), which is consistent with the results of previous work. However, controlling for confounding factors – primarily domain and language proficiency of Transformer-based PLMs – shows that downstream performance gains from our demographic adaptation do not actually stem from demographic knowledge. Our results indicate that demographic specialization of PLMs, while holding promise for positive societal impact, still represents an unsolved problem for (modern) NLP.

Multi-Source (Pre-)Training for Cross-Domain Measurement, Unit and Context Extraction
Yueling Li | Sebastian Martschat | Simone Paolo Ponzetto
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

We present a cross-domain approach for automated measurement and context extraction based on pre-trained language models. We construct a multi-source, multi-domain corpus and train an end-to-end extraction pipeline. We then apply multi-source task-adaptive pre-training and fine-tuning to benchmark the cross-domain generalization capability of our model. Further, we conceptualize and apply a task-specific error analysis and derive insights for future work. Our results suggest that multi-source training leads to the best overall results, while single-source training yields the best results for the respective individual domain. While our setup is successful at extracting quantity values and units, more research is needed to improve the extraction of contextual entities. We make the cross-domain corpus used in this work available online.

Massively Multilingual Lexical Specialization of Multilingual Transformers
Tommaso Green | Simone Paolo Ponzetto | Goran Glavaš
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While pretrained language models (PLMs) primarily serve as general-purpose text encoders that can be fine-tuned for a wide variety of downstream tasks, recent work has shown that they can also be rewired to produce high-quality word representations (i.e., static word embeddings) and yield good performance in type-level lexical tasks. While existing work primarily focused on the lexical specialization of monolingual PLMs with immense quantities of monolingual constraints, in this work we expose massively multilingual transformers (MMTs, e.g., mBERT or XLM-R) to multilingual lexical knowledge at scale, leveraging BabelNet as the readily available rich source of multilingual and cross-lingual type-level lexical knowledge. Concretely, we use BabelNet’s multilingual synsets to create synonym pairs (or synonym-gloss pairs) across 50 languages and then subject the MMTs (mBERT and XLM-R) to a lexical specialization procedure guided by a contrastive objective. We show that such massively multilingual lexical specialization brings substantial gains in two standard cross-lingual lexical tasks, bilingual lexicon induction and cross-lingual word similarity, as well as in cross-lingual sentence retrieval. Crucially, we observe gains for languages unseen in specialization, indicating that multilingual lexical specialization enables generalization to languages with no lexical constraints. In a series of subsequent controlled experiments, we show that the number of specialization constraints plays a much greater role than the set of languages from which they originate.

Proceedings of the 3rd Workshop on Computational Linguistics for the Political and Social Sciences
Christopher Klamm | Gabriella Lapesa | Valentin Gold | Theresa Gessler | Simone Paolo Ponzetto
Proceedings of the 3rd Workshop on Computational Linguistics for the Political and Social Sciences

Policy Domain Prediction from Party Manifestos with Adapters and Knowledge Enhanced Transformers
Hsiao-Chu Yu | Ines Rehbein | Simone Paolo Ponzetto
Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023)

Vicinal Risk Minimization for Few-Shot Cross-lingual Transfer in Abusive Language Detection
Gretel De la Peña Sarracén | Paolo Rosso | Robert Litschko | Goran Glavaš | Simone Ponzetto
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Cross-lingual transfer learning from high-resource to medium and low-resource languages has shown encouraging results. However, the scarcity of resources in target languages remains a challenge. In this work, we resort to data augmentation and continual pre-training for domain adaptation to improve cross-lingual abusive language detection. For data augmentation, we analyze two existing techniques based on vicinal risk minimization and propose MIXAG, a novel data augmentation method which interpolates pairs of instances based on the angle of their representations. Our experiments involve seven languages typologically distinct from English and three different domains. The results reveal that the data augmentation strategies can enhance few-shot cross-lingual abusive language detection. Specifically, we observe that consistently in all target languages, MIXAG improves significantly in multidomain and multilingual environments. Finally, we show through an error analysis how the domain adaptation can favour the class of abusive texts (reducing false negatives), but at the same time, declines the precision of the abusive language detection model.

Our kind of people? Detecting populist references in political debates
Christopher Klamm | Ines Rehbein | Simone Paolo Ponzetto
Findings of the Association for Computational Linguistics: EACL 2023

This paper investigates the identification of populist rhetoric in text and presents a novel cross-lingual dataset for this task. Our work is based on the definition of populism as a “communication style of political actors that refers to the people” but also includes anti-elitism as another core feature of populism. Accordingly, we annotate references to The People and The Elite in German and English parliamentary debates with a hierarchical scheme. The paper describes our dataset and annotation procedure and reports inter-annotator agreement for this task. Next, we compare and evaluate different transformer-based model architectures on a German dataset and report results for zero-shot learning on a smaller English dataset. We then show that semi-supervised tri-training can improve results in the cross-lingual setting. Our dataset can be used to investigate how political actors talk about The Elite and The People and to study how populist rhetoric is used as a strategic device.

2022

Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog
Chia-Chien Hung | Anne Lauscher | Ivan Vulić | Simone Ponzetto | Goran Glavaš
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Research on (multi-domain) task-oriented dialog (TOD) has predominantly focused on the English language, primarily due to the shortage of robust TOD datasets in other languages, preventing the systematic investigation of cross-lingual transfer for this crucial NLP application area. In this work, we introduce Multi2WOZ, a new multilingual multi-domain TOD dataset, derived from the well-established English dataset MultiWOZ, that spans four typologically diverse languages: Chinese, German, Arabic, and Russian. In contrast to concurrent efforts, Multi2WOZ contains gold-standard dialogs in target languages that are directly comparable with development and test portions of the English dataset, enabling reliable and comparative estimates of cross-lingual transfer performance for TOD. We then introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks. Using such conversational PrLMs specialized for concrete target languages, we systematically benchmark a number of zero-shot and few-shot cross-lingual transfer approaches on two standard TOD tasks: Dialog State Tracking and Response Retrieval. Our experiments show that, in most setups, the best performance entails the combination of (i) conversational specialization in the target language and (ii) few-shot transfer for the concrete TOD task. Most importantly, we show that our conversational specialization in the target language allows for an exceptionally sample-efficient few-shot transfer for downstream TOD tasks.

Improved Opinion Role Labelling in Parliamentary Debates
Laura Bamberg | Ines Rehbein | Simone Ponzetto
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)

DS-TOD: Efficient Domain Specialization for Task-Oriented Dialog
Chia-Chien Hung | Anne Lauscher | Simone Ponzetto | Goran Glavaš
Findings of the Association for Computational Linguistics: ACL 2022

Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented dialog (TOD). These approaches, however, exploit general dialogic corpora (e.g., Reddit) and thus presumably fail to reliably embed domain-specific knowledge useful for concrete downstream TOD domains. In this work, we investigate the effects of domain specialization of pretrained language models (PLMs) for TOD. Within our DS-TOD framework, we first automatically extract salient domain-specific terms, and then use them to construct DomainCC and DomainReddit – resources that we leverage for domain-specific pretraining, based on (i) masked language modeling (MLM) and (ii) response selection (RS) objectives, respectively. We further propose a resource-efficient and modular domain specialization by means of domain adapters – additional parameter-light layers in which we encode the domain knowledge. Our experiments with prominent TOD tasks – dialog state tracking (DST) and response retrieval (RR) – encompassing five domains from the MultiWOZ benchmark demonstrate the effectiveness of DS-TOD. Moreover, we show that the light-weight adapter-based specialization (1) performs comparably to full fine-tuning in single domain setups and (2) is particularly suitable for multi-domain specialization, where besides advantageous computational footprint, it can offer better TOD performance.

FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics
Christopher Klamm | Ines Rehbein | Simone Paolo Ponzetto
Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference

This paper presents a framework for studying second-level political agenda setting in parliamentary debates, based on the selection of policy topics used by political actors to discuss a specific issue on the parliamentary agenda. For example, the COVID-19 pandemic as an agenda item can be contextualised as a health issue or as a civil rights issue, as a matter of macroeconomics or can be discussed in the context of social welfare. Our framework allows us to observe differences regarding how different parties discuss the same agenda item by emphasizing different topical aspects of the item. We apply and evaluate our framework on data from the German Bundestag and discuss the merits and limitations of our approach. In addition, we present a new annotated data set of parliamentary debates, following the coding schema of policy topics developed in the Comparative Agendas Project (CAP), and release models for topic classification in parliamentary debates.

ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System
Chia-Chien Hung | Tommaso Green | Robert Litschko | Tornike Tsereteli | Sotaro Takeshita | Marco Bombieri | Goran Glavaš | Simone Paolo Ponzetto
Proceedings of the Workshop on Multilingual Information Access (MIA)

This paper introduces our proposed system for the MIA Shared Task on Cross-lingual Openretrieval Question Answering (COQA). In this challenging scenario, given an input question the system has to gather evidence documents from a multilingual pool and generate from them an answer in the language of the question. We devised several approaches combining different model variants for three main components: Data Augmentation, Passage Retrieval, and Answer Generation. For passage retrieval, we evaluated the monolingual BM25 ranker against the ensemble of re-rankers based on multilingual pretrained language models (PLMs) and also variants of the shared task baseline, re-training it from scratch using a recently introduced contrastive loss that maintains a strong gradient signal throughout training by means of mixed negative samples. For answer generation, we focused on languageand domain-specialization by means of continued language model (LM) pretraining of existing multilingual encoders. Additionally, for both passage retrieval and answer generation, we augmented the training data provided by the task organizers with automatically generated question-answer pairs created from Wikipedia passages to mitigate the issue of data scarcity, particularly for the low-resource languages for which no training data were provided. Our results show that language- and domain-specialization as well as data augmentation help, especially for low-resource languages.

Overview of the SV-Ident 2022 Shared Task on Survey Variable Identification in Social Science Publications
Tornike Tsereteli | Yavuz Selim Kartal | Simone Paolo Ponzetto | Andrea Zielinski | Kai Eckert | Philipp Mayr
Proceedings of the Third Workshop on Scholarly Document Processing

In this paper, we provide an overview of the SV-Ident shared task as part of the 3rd Workshop on Scholarly Document Processing (SDP) at COLING 2022. In the shared task, participants were provided with a sentence and a vocabulary of variables, and asked to identify which variables, if any, are mentioned in individual sentences from scholarly documents in full text. Two teams made a total of 9 submissions to the shared task leaderboard. While none of the teams improve on the baseline systems, we still draw insights from their submissions. Furthermore, we provide a detailed evaluation. Data and baselines for our shared task are freely available at https://github.com/vadis-project/sv-ident.

The Robotic Surgery Procedural Framebank
Marco Bombieri | Marco Rospocher | Simone Paolo Ponzetto | Paolo Fiorini
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Robot-Assisted minimally invasive robotic surgery is the gold standard for the surgical treatment of many pathological conditions, and several manuals and academic papers describe how to perform these interventions. These high-quality, often peer-reviewed texts are the main study resource for medical personnel and consequently contain essential procedural domain-specific knowledge. The procedural knowledge therein described could be extracted, e.g., on the basis of semantic parsing models, and used to develop clinical decision support systems or even automation methods for some procedure’s steps. However, natural language understanding algorithms such as, for instance, semantic role labelers have lower efficacy and coverage issues when applied to domain others than those they are typically trained on (i.e., newswire text). To overcome this problem, starting from PropBank frames, we propose a new linguistic resource specific to the robotic-surgery domain, named Robotic Surgery Procedural Framebank (RSPF). We extract from robotic-surgical texts verbs and nouns that describe surgical actions and extend PropBank frames by adding any of new lemmas, frames or role sets required to cover missing lemmas, specific frames describing the surgical significance, or new semantic roles used in procedural surgical language. Our resource is publicly available and can be used to annotate corpora in the surgical domain to train and evaluate Semantic Role Labeling (SRL) systems in a challenging fine-grained domain setting.

Journal for Language Technology and Computational Linguistics, Vol. 35 No. 2
Ines Rehbein | Gabriella Lapesa | Goran Glavaš | Simone Paolo Ponzetto
Journal for Language Technology and Computational Linguistics, Vol. 35 No. 2

Fair and Argumentative Language Modeling for Computational Argumentation
Carolin Holtermann | Anne Lauscher | Simone Ponzetto
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Although much work in NLP has focused on measuring and mitigating stereotypical bias in semantic spaces, research addressing bias in computational argumentation is still in its infancy. In this paper, we address this research gap and conduct a thorough investigation of bias in argumentative language models. To this end, we introduce ABBA, a novel resource for bias measurement specifically tailored to argumentation. We employ our resource to assess the effect of argumentative fine-tuning and debiasing on the intrinsic bias found in transformer-based language models using a lightweight adapter-based approach that is more sustainable and parameter-efficient than full fine-tuning. Finally, we analyze the potential impact of language model debiasing on the performance in argument quality prediction, a downstream task of computational argumentation. Our results show that we are able to successfully and sustainably remove bias in general and argumentative language models while preserving (and sometimes improving) model performance in downstream tasks. We make all experimental code and data available at https://github.com/umanlp/FairArgumentativeLM.

2021

DebIE: A Platform for Implicit and Explicit Debiasing of Word Embedding Spaces
Niklas Friedrich | Anne Lauscher | Simone Paolo Ponzetto | Goran Glavaš
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

Recent research efforts in NLP have demonstrated that distributional word vector spaces often encode stereotypical human biases, such as racism and sexism. With word representations ubiquitously used in NLP models and pipelines, this raises ethical issues and jeopardizes the fairness of language technologies. While there exists a large body of work on bias measures and debiasing methods, to date, there is no platform that would unify these research efforts and make bias measuring and debiasing of representation spaces widely accessible. In this work, we present DebIE, the first integrated platform for (1) measuring and (2) mitigating bias in word embeddings. Given an (i) embedding space (users can choose between the predefined spaces or upload their own) and (ii) a bias specification (users can choose between existing bias specifications or create their own), DebIE can (1) compute several measures of implicit and explicit bias and modify the embedding space by executing two (mutually composable) debiasing models. DebIE’s functionality can be accessed through four different interfaces: (a) a web application, (b) a desktop application, (c) a REST-ful API, and (d) as a command-line application. DebIE is available at: debie.informatik.uni-mannheim.de.

Come hither or go away? Recognising pre-electoral coalition signals in the news
Ines Rehbein | Simone Paolo Ponzetto | Anna Adendorf | Oke Bahnsen | Lukas Stoetzer | Heiner Stuckenschmidt
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In this paper, we introduce the task of political coalition signal prediction from text, that is, the task of recognizing from the news coverage leading up to an election the (un)willingness of political parties to form a government coalition. We decompose our problem into two related, but distinct tasks: (i) predicting whether a reported statement from a politician or a journalist refers to a potential coalition and (ii) predicting the polarity of the signal – namely, whether the speaker is in favour of or against the coalition. For this, we explore the benefits of multi-task learning and investigate which setup and task formulation is best suited for each sub-task. We evaluate our approach, based on hand-coded newspaper articles, covering elections in three countries (Ireland, Germany, Austria) and two languages (English, German). Our results show that the multi-task learning approach can further improve results over a strong monolingual transfer learning baseline.

FakeFlow: Fake News Detection by Modeling the Flow of Affective Information
Bilal Ghanem | Simone Paolo Ponzetto | Paolo Rosso | Francisco Rangel
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Fake news articles often stir the readers’ attention by means of emotional appeals that arouse their feelings. Unlike in short news texts, authors of longer articles can exploit such affective factors to manipulate readers by adding exaggerations or fabricating events, in order to affect the readers’ emotions. To capture this, we propose in this paper to model the flow of affective information in fake news articles using a neural architecture. The proposed model, FakeFlow, learns this flow by combining topic and affective information extracted from text. We evaluate the model’s performance with several experiments on four real-world datasets. The results show that FakeFlow achieves superior results when compared against state-of-the-art methods, thus confirming the importance of capturing the flow of the affective information in news articles.

Masking and Transformer-based Models for Hyperpartisanship Detection in News
Javier Sánchez-Junquera | Paolo Rosso | Manuel Montes-y-Gómez | Simone Paolo Ponzetto
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Hyperpartisan news show an extreme manipulation of reality based on an underlying and extreme ideological orientation. Because of its harmful effects at reinforcing one’s bias and the posterior behavior of people, hyperpartisan news detection has become an important task for computational linguists. In this paper, we evaluate two different approaches to detect hyperpartisan news. First, a text masking technique that allows us to compare style vs. topic-related features in a different perspective from previous work. Second, the transformer-based models BERT, XLM-RoBERTa, and M-BERT, known for their ability to capture semantic and syntactic patterns in the same representation. Our results corroborate previous research on this task in that topic-related features yield better results than style-based ones, although they also highlight the relevance of using higher-length n-grams. Furthermore, they show that transformer-based models are more effective than traditional methods, but this at the cost of greater computational complexity and lack of transparency. Based on our experiments, we conclude that the beginning of the news show relevant information for the transformers at distinguishing effectively between left-wing, mainstream, and right-wing orientations.

2020

SemEval-2020 Task 2: Predicting Multilingual and Cross-Lingual (Graded) Lexical Entailment
Goran Glavaš | Ivan Vulić | Anna Korhonen | Simone Paolo Ponzetto
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Lexical entailment (LE) is a fundamental asymmetric lexico-semantic relation, supporting the hierarchies in lexical resources (e.g., WordNet, ConceptNet) and applications like natural language inference and taxonomy induction. Multilingual and cross-lingual NLP applications warrant models for LE detection that go beyond language boundaries. As part of SemEval 2020, we carried out a shared task (Task 2) on multilingual and cross-lingual LE. The shared task spans three dimensions: (1) monolingual vs. cross-lingual LE, (2) binary vs. graded LE, and (3) a set of 6 diverse languages (and 15 corresponding language pairs). We offered two different evaluation tracks: (a) Dist: for unsupervised, fully distributional models that capture LE solely on the basis of unannotated corpora, and (b) Any: for externally informed models, allowed to leverage any resources, including lexico-semantic networks (e.g., WordNet or BabelNet). In the Any track, we recieved runs that push state-of-the-art across all languages and language pairs, for both binary LE detection and graded LE prediction.

AraWEAT: Multidimensional Analysis of Biases in Arabic Word Embeddings
Anne Lauscher | Rafik Takieddin | Simone Paolo Ponzetto | Goran Glavaš
Proceedings of the Fifth Arabic Natural Language Processing Workshop

Recent work has shown that distributional word vector spaces often encode human biases like sexism or racism. In this work, we conduct an extensive analysis of biases in Arabic word embeddings by applying a range of recently introduced bias tests on a variety of embedding spaces induced from corpora in Arabic. We measure the presence of biases across several dimensions, namely: embedding models (Skip-Gram, CBOW, and FastText) and vector sizes, types of text (encyclopedic text, and news vs. user-generated content), dialects (Egyptian Arabic vs. Modern Standard Arabic), and time (diachronic analyses over corpora from different time periods). Our analysis yields several interesting findings, e.g., that implicit gender bias in embeddings trained on Arabic news corpora steadily increases over time (between 2007 and 2017). We make the Arabic bias specifications (AraWEAT) publicly available.

Word Sense Disambiguation for 158 Languages using Word Embeddings Only
Varvara Logacheva | Denis Teslenko | Artem Shelmanov | Steffen Remus | Dmitry Ustalov | Andrey Kutuzov | Ekaterina Artemova | Chris Biemann | Simone Paolo Ponzetto | Alexander Panchenko
Proceedings of the Twelfth Language Resources and Evaluation Conference

Disambiguation of word senses in context is easy for humans, but is a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models were developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely unsupervised and knowledge-free approaches to word sense disambiguation (WSD). They are particularly useful for under-resourced languages which do not have any resources for building either supervised and/or knowledge-based models. In this paper, we present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory, which can be used for disambiguation in context. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al., (2018), enabling WSD in these languages. Models and system are available online.

2019

SEAGLE: A Platform for Comparative Evaluation of Semantic Encoders for Information Retrieval
Fabian David Schmidt | Markus Dietsche | Simone Paolo Ponzetto | Goran Glavaš
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

We introduce Seagle, a platform for comparative evaluation of semantic text encoding models on information retrieval (IR) tasks. Seagle implements (1) word embedding aggregators, which represent texts as algebraic aggregations of pretrained word embeddings and (2) pretrained semantic encoders, and allows for their comparative evaluation on arbitrary (monolingual and cross-lingual) IR collections. We benchmark Seagle’s models on monolingual document retrieval and cross-lingual sentence retrieval. Seagle functionality can be exploited via an easy-to-use web interface and its modular backend (micro-service architecture) can easily be extended with additional semantic search models.

WebIsAGraph: A Very Large Hypernymy Graph from a Web Corpus
Stefano Faralli | Irene Finocchi | Simone Paolo Ponzetto | Paola Velardi
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

Multilingual and Cross-Lingual Graded Lexical Entailment
Ivan Vulić | Simone Paolo Ponzetto | Goran Glavaš
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Grounded in cognitive linguistics, graded lexical entailment (GR-LE) is concerned with fine-grained assertions regarding the directional hierarchical relationships between concepts on a continuous scale. In this paper, we present the first work on cross-lingual generalisation of GR-LE relation. Starting from HyperLex, the only available GR-LE dataset in English, we construct new monolingual GR-LE datasets for three other languages, and combine those to create a set of six cross-lingual GR-LE datasets termed CL-HYPERLEX. We next present a novel method dubbed CLEAR (Cross-Lingual Lexical Entailment Attract-Repel) for effectively capturing graded (and binary) LE, both monolingually in different languages as well as across languages (i.e., on CL-HYPERLEX). Coupled with a bilingual dictionary, CLEAR leverages taxonomic LE knowledge in a resource-rich language (e.g., English) and propagates it to other languages. Supported by cross-lingual LE transfer, CLEAR sets competitive baseline performance on three new monolingual GR-LE datasets and six cross-lingual GR-LE datasets. In addition, we show that CLEAR outperforms current state-of-the-art on binary cross-lingual LE detection by a wide margin for diverse language pairs.

Computational Analysis of Political Texts: Bridging Research Efforts Across Communities
Goran Glavaš | Federico Nanni | Simone Paolo Ponzetto
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

In the last twenty years, political scientists started adopting and developing natural language processing (NLP) methods more actively in order to exploit text as an additional source of data in their analyses. Over the last decade the usage of computational methods for analysis of political texts has drastically expanded in scope, allowing for a sustained growth of the text-as-data community in political science. In political science, NLP methods have been extensively used for a number of analyses types and tasks, including inferring policy position of actors from textual evidence, detecting topics in political texts, and analyzing stylistic aspects of political texts (e.g., assessing the role of language ambiguity in framing the political agenda). Just like in numerous other domains, much of the work on computational analysis of political texts has been enabled and facilitated by the development of resources such as, the topically coded electoral programmes (e.g., the Manifesto Corpus) or topically coded legislative texts (e.g., the Comparative Agenda Project). Political scientists created resources and used available NLP methods to process textual data largely in isolation from the NLP community. At the same time, NLP researchers addressed closely related tasks such as election prediction, ideology classification, and stance detection. In other words, these two communities have been largely agnostic of one another, with NLP researchers mostly unaware of interesting applications in political science and political scientists not applying cutting-edge NLP methodology to their problems. The main goal of this tutorial is to systematize and analyze the body of research work on political texts from both communities. We aim to provide a gentle, all-round introduction to methods and tasks related to computational analysis of political texts. Our vision is to bring the two research communities closer to each other and contribute to faster and more significant developments in this interdisciplinary research area.

Policy Preference Detection in Parliamentary Debate Motions
Gavin Abercrombie | Federico Nanni | Riza Batista-Navarro | Simone Paolo Ponzetto
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Debate motions (proposals) tabled in the UK Parliament contain information about the stated policy preferences of the Members of Parliament who propose them, and are key to the analysis of all subsequent speeches given in response to them. We attempt to automatically label debate motions with codes from a pre-existing coding scheme developed by political scientists for the annotation and analysis of political parties’ manifestos. We develop annotation guidelines for the task of applying these codes to debate motions at two levels of granularity and produce a dataset of manually labelled examples. We evaluate the annotation process and the reliability and utility of the labelling scheme, finding that inter-annotator agreement is comparable with that of other studies conducted on manifesto data. Moreover, we test a variety of ways of automatically labelling motions with the codes, ranging from similarity matching to neural classification methods, and evaluate them against the gold standard labels. From these experiments, we note that established supervised baselines are not always able to improve over simple lexical heuristics. At the same time, we detect a clear and evident benefit when employing BERT, a state-of-the-art deep language representation model, even in classification scenarios with over 30 different labels and limited amounts of training data.

Watset: Local-Global Graph Clustering with Applications in Sense and Frame Induction
Dmitry Ustalov | Alexander Panchenko | Chris Biemann | Simone Paolo Ponzetto
Computational Linguistics, Volume 45, Issue 3 - September 2019

We present a detailed theoretical and computational analysis of the Watset meta-algorithm for fuzzy graph clustering, which has been found to be widely applicable in a variety of domains. This algorithm creates an intermediate representation of the input graph, which reflects the “ambiguity” of its nodes. Then, it uses hard clustering to discover clusters in this “disambiguated” intermediate graph. After outlining the approach and analyzing its computational complexity, we demonstrate that Watset shows competitive results in three applications: unsupervised synset induction from a synonymy graph, unsupervised semantic frame induction from dependency triples, and unsupervised semantic class induction from a distributional thesaurus. Our algorithm is generic and can also be applied to other networks of linguistic data.

HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings
Saba Anwar | Dmitry Ustalov | Nikolay Arefyev | Simone Paolo Ponzetto | Chris Biemann | Alexander Panchenko
Proceedings of the 13th International Workshop on Semantic Evaluation

We present our system for semantic frame induction that showed the best performance in Subtask B.1 and finished as the runner-up in Subtask A of the SemEval 2019 Task 2 on unsupervised semantic frame induction (Qasem-iZadeh et al., 2019). Our approach separates this task into two independent steps: verb clustering using word and their context embeddings and role labeling by combining these embeddings with syntactical features. A simple combination of these steps shows very competitive results and can be extended to process other datasets and languages.

2018

An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages
Dmitry Ustalov | Denis Teslenko | Alexander Panchenko | Mikhail Chernoskutov | Chris Biemann | Simone Paolo Ponzetto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

CATS: A Tool for Customized Alignment of Text Simplification Corpora
Sanja Štajner | Marc Franco-Salvador | Paolo Rosso | Simone Paolo Ponzetto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Improving Hypernymy Extraction with Distributional Semantic Classes
Alexander Panchenko | Dmitry Ustalov | Stefano Faralli | Simone P. Ponzetto | Chris Biemann
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

MIsA: Multilingual “IsA” Extraction from Corpora
Stefano Faralli | Els Lefever | Simone Paolo Ponzetto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Investigating the Role of Argumentation in the Rhetorical Analysis of Scientific Publications with Neural Multi-Task Learning Models
Anne Lauscher | Goran Glavaš | Simone Paolo Ponzetto | Kai Eckert
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Exponential growth in the number of scientific publications yields the need for effective automatic analysis of rhetorical aspects of scientific writing. Acknowledging the argumentative nature of scientific text, in this work we investigate the link between the argumentative structure of scientific publications and rhetorical aspects such as discourse categories or citation contexts. To this end, we (1) augment a corpus of scientific publications annotated with four layers of rhetoric annotations with argumentation annotations and (2) investigate neural multi-task learning architectures combining argument extraction with a set of rhetorical classification tasks. By coupling rhetorical classifiers with the extraction of argumentative components in a joint multi-task learning setting, we obtain significant performance gains for different rhetorical analysis tasks.

Unsupervised Semantic Frame Induction using Triclustering
Dmitry Ustalov | Alexander Panchenko | Andrey Kutuzov | Chris Biemann | Simone Paolo Ponzetto
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We use dependency triples automatically extracted from a Web-scale corpus to perform unsupervised semantic frame induction. We cast the frame induction problem as a triclustering problem that is a generalization of clustering for triadic data. Our replicable benchmarks demonstrate that the proposed graph-based approach, Triframes, shows state-of-the art results on this task on a FrameNet-derived dataset and performing on par with competitive methods on a verb class clustering task.

Enriching Frame Representations with Distributionally Induced Senses
Stefano Faralli | Alexander Panchenko | Chris Biemann | Simone Paolo Ponzetto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl
Alexander Panchenko | Eugen Ruppert | Stefano Faralli | Simone P. Ponzetto | Chris Biemann
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

An Argument-Annotated Corpus of Scientific Publications
Anne Lauscher | Goran Glavaš | Simone Paolo Ponzetto
Proceedings of the 5th Workshop on Argument Mining

Argumentation is an essential feature of scientific language. We present an annotation study resulting in a corpus of scientific publications annotated with argumentative components and relations. The argumentative annotations have been added to the existing Dr. Inventor Corpus, already annotated for four other rhetorical aspects. We analyze the annotated argumentative structures and investigate the relations between argumentation and other rhetorical aspects of scientific writing, such as discourse roles and citation contexts.

UniMa at SemEval-2018 Task 7: Semantic Relation Extraction and Classification from Scientific Publications
Thorsten Keiper | Zhonghao Lyu | Sara Pooladzadeh | Yuan Xu | Jingyi Zhang | Anne Lauscher | Simone Paolo Ponzetto
Proceedings of the 12th International Workshop on Semantic Evaluation

Large repositories of scientific literature call for the development of robust methods to extract information from scholarly papers. This problem is addressed by the SemEval 2018 Task 7 on extracting and classifying relations found within scientific publications. In this paper, we present a feature-based and a deep learning-based approach to the task and discuss the results of the system runs that we submitted for evaluation.

2017

Topic-Based Agreement and Disagreement in US Electoral Manifestos
Stefano Menini | Federico Nanni | Simone Paolo Ponzetto | Sara Tonelli
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a topic-based analysis of agreement and disagreement in political manifestos, which relies on a new method for topic detection based on key concept clustering. Our approach outperforms both standard techniques like LDA and a state-of-the-art graph-based method, and provides promising initial results for this new task in computational social science.

Exploring Neural Text Simplification Models
Sergiu Nisioi | Sanja Štajner | Simone Paolo Ponzetto | Liviu P. Dinu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We present the first attempt at using sequence to sequence neural networks to model text simplification (TS). Unlike the previously proposed automated TS systems, our neural text simplification (NTS) systems are able to simultaneously perform lexical simplification and content reduction. An extensive human evaluation of the output has shown that NTS systems achieve almost perfect grammaticality and meaning preservation of output sentences and higher level of simplification than the state-of-the-art automated TS systems

Effects of Lexical Properties on Viewing Time per Word in Autistic and Neurotypical Readers
Sanja Štajner | Victoria Yaneva | Ruslan Mitkov | Simone Paolo Ponzetto
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Eye tracking studies from the past few decades have shaped the way we think of word complexity and cognitive load: words that are long, rare and ambiguous are more difficult to read. However, online processing techniques have been scarcely applied to investigating the reading difficulties of people with autism and what vocabulary is challenging for them. We present parallel gaze data obtained from adult readers with autism and a control group of neurotypical readers and show that the former required higher cognitive effort to comprehend the texts as evidenced by three gaze-based measures. We divide all words into four classes based on their viewing times for both groups and investigate the relationship between longer viewing times and word length, word frequency, and four cognitively-based measures (word concreteness, familiarity, age of acquisition and imagability).

If Sentences Could See: Investigating Visual Information for Semantic Textual Similarity
Goran Glavaš | Ivan Vulić | Simone Paolo Ponzetto
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Long papers

Cross-Lingual Classification of Topics in Political Texts
Goran Glavaš | Federico Nanni | Simone Paolo Ponzetto
Proceedings of the Second Workshop on NLP and Computational Social Science

In this paper, we propose an approach for cross-lingual topical coding of sentences from electoral manifestos of political parties in different languages. To this end, we exploit continuous semantic text representations and induce a joint multilingual semantic vector spaces to enable supervised learning using manually-coded sentences across different languages. Our experimental results show that classifiers trained on multilingual data yield performance boosts over monolingual topic classification.

Sentence Alignment Methods for Improving Text Simplification Systems
Sanja Štajner | Marc Franco-Salvador | Simone Paolo Ponzetto | Paolo Rosso | Heiner Stuckenschmidt
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We provide several methods for sentence-alignment of texts with different complexity levels. Using the best of them, we sentence-align the Newsela corpora, thus providing large training materials for automatic text simplification (ATS) systems. We show that using this dataset, even the standard phrase-based statistical machine translation models for ATS can outperform the state-of-the-art ATS systems.

Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation
Alexander Panchenko | Fide Marten | Eugen Ruppert | Stefano Faralli | Dmitry Ustalov | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Interpretability of a predictive model is a powerful feature that gains the trust of users in the correctness of the predictions. In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images. We present a WSD system that bridges the gap between these two so far disconnected groups of methods. Namely, our system, providing access to several state-of-the-art WSD models, aims to be interpretable as a knowledge-based system while it remains completely unsupervised and knowledge-free. The presented tool features a Web interface for all-word disambiguation of texts that makes the sense predictions human readable by providing interpretable word sense inventories, sense representations, and disambiguation results. We provide a public API, enabling seamless integration.

Improving Neural Knowledge Base Completion with Cross-Lingual Projections
Patrick Klein | Simone Paolo Ponzetto | Goran Glavaš
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

In this paper we present a cross-lingual extension of a neural tensor network model for knowledge base completion. We exploit multilingual synsets from BabelNet to translate English triples to other languages and then augment the reference knowledge base with cross-lingual triples. We project monolingual embeddings of different languages to a shared multilingual space and use them for network initialization (i.e., as initial concept embeddings). We then train the network with triples from the cross-lingually augmented knowledge base. Results on WordNet link prediction show that leveraging cross-lingual information yields significant gains over exploiting only monolingual triples.

Domain-specific Named Entity Disambiguation in Historical Memoirs
Marco Rovera | Federico Nanni | Simone Paolo Ponzetto | Anna Goy
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

Unsupervised Cross-Lingual Scaling of Political Texts
Goran Glavaš | Federico Nanni | Simone Paolo Ponzetto
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Political text scaling aims to linearly order parties and politicians across political dimensions (e.g., left-to-right ideology) based on textual content (e.g., politician speeches or party manifestos). Existing models scale texts based on relative word usage and cannot be used for cross-lingual analyses. Additionally, there is little quantitative evidence that the output of these models correlates with common political dimensions like left-to-right orientation. Experimental results show that the semantically-informed scaling models better predict the party positions than the existing word-based models in two different political dimensions. Furthermore, the proposed models exhibit no drop in performance in the cross-lingual compared to monolingual setting.

The ContrastMedium Algorithm: Taxonomy Induction From Noisy Knowledge Graphs With Just A Few Links
Stefano Faralli | Alexander Panchenko | Chris Biemann | Simone Paolo Ponzetto
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

In this paper, we present ContrastMedium, an algorithm that transforms noisy semantic networks into full-fledged, clean taxonomies. ContrastMedium is able to identify the embedded taxonomy structure from a noisy knowledge graph without explicit human supervision such as, for instance, a set of manually selected input root and leaf concepts. This is achieved by leveraging structural information from a companion reference taxonomy, to which the input knowledge graph is linked (either automatically or manually). When used in conjunction with methods for hypernym acquisition and knowledge base linking, our methodology provides a complete solution for end-to-end taxonomy induction. We conduct experiments using automatically acquired knowledge graphs, as well as a SemEval benchmark, and show that our method is able to achieve high performance on the task of taxonomy induction.

Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation
Alexander Panchenko | Eugen Ruppert | Stefano Faralli | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

The current trend in NLP is the use of highly opaque models, e.g. neural networks and word embeddings. While these models yield state-of-the-art results on a range of tasks, their drawback is poor interpretability. On the example of word sense induction and disambiguation (WSID), we show that it is possible to develop an interpretable model that matches the state-of-the-art models in accuracy. Namely, we present an unsupervised, knowledge-free WSID approach, which is interpretable at three levels: word sense inventory, sense feature representations, and disambiguation procedure. Experiments show that our model performs on par with state-of-the-art word sense embeddings and other unsupervised systems while offering the possibility to justify its decisions in human-readable form.

Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation
Alexander Panchenko | Stefano Faralli | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications

We introduce a new method for unsupervised knowledge-based word sense disambiguation (WSD) based on a resource that links two types of sense-aware lexical networks: one is induced from a corpus using distributional semantics, the other is manually constructed. The combination of two networks reduces the sparsity of sense representations used for WSD. We evaluate these enriched representations within two lexical sample sense disambiguation benchmarks. Our results indicate that (1) features extracted from the corpus-based resource help to significantly outperform a model based solely on the lexical resource; (2) our method achieves results comparable or better to four state-of-the-art unsupervised knowledge-based WSD systems including three hybrid systems that also rely on text corpora. In contrast to these hybrid methods, our approach does not require access to web search engines, texts mapped to a sense inventory, or machine translation systems.

Dual Tensor Model for Detecting Asymmetric Lexico-Semantic Relations
Goran Glavaš | Simone Paolo Ponzetto
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Detection of lexico-semantic relations is one of the central tasks of computational semantics. Although some fundamental relations (e.g., hypernymy) are asymmetric, most existing models account for asymmetry only implicitly and use the same concept representations to support detection of symmetric and asymmetric relations alike. In this work, we propose the Dual Tensor model, a neural architecture with which we explicitly model the asymmetry and capture the translation between unspecialized and specialized word embeddings via a pair of tensors. Although our Dual Tensor model needs only unspecialized embeddings as input, our experiments on hypernymy and meronymy detection suggest that it can outperform more complex and resource-intensive models. We further demonstrate that the model can account for polysemy and that it exhibits stable performance across languages.

2016

A Large DataBase of Hypernymy Relations Extracted from the Web.
Julian Seitner | Christian Bizer | Kai Eckert | Stefano Faralli | Robert Meusel | Heiko Paulheim | Simone Paolo Ponzetto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Hypernymy relations (those where an hyponym term shares a “isa” relationship with his hypernym) play a key role for many Natural Language Processing (NLP) tasks, e.g. ontology learning, automatically building or extending knowledge bases, or word sense disambiguation and induction. In fact, such relations may provide the basis for the construction of more complex structures such as taxonomies, or be used as effective background knowledge for many word understanding applications. We present a publicly available database containing more than 400 million hypernymy relations we extracted from the CommonCrawl web corpus. We describe the infrastructure we developed to iterate over the web corpus for extracting the hypernymy relations and store them effectively into a large database. This collection of relations represents a rich source of knowledge and may be useful for many researchers. We offer the tuple dataset for public download and an Application Programming Interface (API) to help other researchers programmatically query the database.

Unsupervised Text Segmentation Using Semantic Relatedness Graphs
Goran Glavaš | Federico Nanni | Simone Paolo Ponzetto
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

TAXI at SemEval-2016 Task 13: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling
Alexander Panchenko | Stefano Faralli | Eugen Ruppert | Steffen Remus | Hubert Naets | Cédrick Fairon | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

Image with a Message: Towards Detecting Non-Literal Image Usages by Visual Linking
Lydia Weiland | Laura Dietz | Simone Paolo Ponzetto
Proceedings of the Fourth Workshop on Vision and Language

2014

DBpedia Domains: augmenting DBpedia with domain information
Gregor Titze | Volha Bryl | Cäcilia Zirn | Simone Paolo Ponzetto
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present an approach for augmenting DBpedia, a very large ontology lying at the heart of the Linked Open Data (LOD) cloud, with domain information. Our approach uses the thematic labels provided for DBpedia entities by Wikipedia categories, and groups them based on a kernel based k-means clustering algorithm. Experiments on gold-standard data show that our approach provides a first solution to the automatic annotation of DBpedia entities with domain labels, thus providing the largest LOD domain-annotated ontology to date.

Weakly supervised construction of a repository of iconic images
Lydia Weiland | Wolfgang Effelsberg | Simone Paolo Ponzetto
Proceedings of the Third Workshop on Vision and Language

2013

Exploiting Social Media for Natural Language Processing: Bridging the Gap between Language-centric and Real-world Applications
Simone Paolo Ponzetto | Andrea Zielinski
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Tutorials)

2012

Multilingual WSD with Just a Few Lines of Code: the BabelNet API
Roberto Navigli | Simone Paolo Ponzetto
Proceedings of the ACL 2012 System Demonstrations

Joining Forces Pays Off: Multilingual Joint Word Sense Disambiguation
Roberto Navigli | Simone Paolo Ponzetto
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2010

BabelNet: Building a Very Large Multilingual Semantic Network
Roberto Navigli | Simone Paolo Ponzetto
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

BART: A Multilingual Anaphora Resolution System
Samuel Broscheit | Massimo Poesio | Simone Paolo Ponzetto | Kepa Joseba Rodriguez | Lorenza Romano | Olga Uryupina | Yannick Versley | Roberto Zanoli
Proceedings of the 5th International Workshop on Semantic Evaluation

Assessing the Challenge of Fine-Grained Named Entity Recognition and Classification
Asif Ekbal | Eva Sourjikova | Anette Frank | Simone Paolo Ponzetto
Proceedings of the 2010 Named Entities Workshop

Extending BART to Provide a Coreference Resolution System for German
Samuel Broscheit | Simone Paolo Ponzetto | Yannick Versley | Massimo Poesio
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present a flexible toolkit-based approach to automatic coreference resolution on German text. We start with our previous work aimed at reimplementing the system from Soon et al. (2001) for English, and extend it to duplicate a version of the state-of-the-art proposal from Klenner and Ailloud (2009). Evaluation performed on a benchmarking dataset, namely the TueBa-D/Z corpus (Hinrichs et al., 2005b), shows that machine learning based coreference resolution can be robustly performed in a language other than English.

UHD: Cross-Lingual Word Sense Disambiguation Using Multilingual Co-Occurrence Graphs
Carina Silberer | Simone Paolo Ponzetto
Proceedings of the 5th International Workshop on Semantic Evaluation

Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
Simone Paolo Ponzetto | Roberto Navigli
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

Extracting World and Linguistic Knowledge from Wikipedia
Simone Paolo Ponzetto | Michael Strube
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts

State-of-the-art NLP Approaches to Coreference Resolution: Theory and Practical Recipes
Simone Paolo Ponzetto | Massimo Poesio
Tutorial Abstracts of ACL-IJCNLP 2009

2008

BART: A modular toolkit for coreference resolution
Yannick Versley | Simone Ponzetto | Massimo Poesio | Vladimir Eidelman | Alan Jern | Jason Smith | Xiaofeng Yang | Alessandro Moschitti
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Developing a full coreference system able to run all the way from raw text to semantic interpretation is a considerable engineering effort. Accordingly, there is very limited availability of off-the shelf tools for researchers whose interests are not primarily in coreference or others who want to concentrate on a specific aspect of the problem. We present BART, a highly modular toolkit for developing coreference applications. In the Johns Hopkins workshop on using lexical and encyclopedic knowledge for entity disambiguation, the toolkit was used to extend a reimplementation of Soon et al.s proposal with a variety of additional syntactic and knowledge-based features, and experiment with alternative resolution processes, preprocessing tools, and classifiers. BART has been released as open source software and is available from http://www.sfs.uni-tuebingen.de/~versley/BART

BART: A Modular Toolkit for Coreference Resolution
Yannick Versley | Simone Paolo Ponzetto | Massimo Poesio | Vladimir Eidelman | Alan Jern | Jason Smith | Xiaofeng Yang | Alessandro Moschitti
Proceedings of the ACL-08: HLT Demo Session

2007

Creating a Knowledge Base from a Collaboratively Generated Encyclopedia
Simone Paolo Ponzetto
Proceedings of the NAACL-HLT 2007 Doctoral Consortium

An API for Measuring the Relatedness of Words in Wikipedia
Simone Paolo Ponzetto | Michael Strube
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

Exploiting Semantic Role Labeling, WordNet and Wikipedia for Coreference Resolution
Simone Paolo Ponzetto | Michael Strube
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

Semantic Role Labeling for Coreference Resolution
Simone Paolo Ponzetto | Michael Strube
Demonstrations

2005

Semantic Role Labeling Using Lexical Statistical Information
Simone Paolo Ponzetto | Michael Strube
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

Co-authors

Anne Lauscher 9

Federico Nanni 7

Sotaro Takeshita 7

Dmitry Ustalov 7

Massimo Poesio 5

Michael Strube 5

Tommaso Green 4

Chia-Chien Hung 4

Christopher Klamm 4

Roberto Navigli 4

Eugen Ruppert 4

Yannick Versley 4

Sanja Štajner 4

Marco Bombieri 3

Gabriella Lapesa 3

Tornike Tsereteli 3

Samuel Broscheit 2

Vladimir Eidelman 2

Marc Franco-Salvador 2

Andrey Kutuzov 2

Robert Litschko 2

Alessandro Moschitti 2

Steffen Remus 2

Marco Rospocher 2

Daniel Ruffinelli 2

Fabian David Schmidt 2

Heiner Stuckenschmidt 2

Denis Teslenko 2

Lydia Weiland 2

Xiaofeng Yang 2

Andrea Zielinski 2

Gavin Abercrombie 1

Anna Adendorf 1

Nikolay Arefyev 1

Ekaterina Artemova 1

Laura Bamberg 1

Riza Theresa Batista-Navarro 1

Christian Bizer 1

Lilly Brauner 1

Jannik Brinkmann 1

Annelen Brunner 1

Mikhail Chernoskutov 1

Gretel De la Peña Sarracén 1

Markus Dietsche 1

Liviu P. Dinu 1

Wolfgang Effelsberg 1

Cédrick Fairon 1

Irene Finocchi 1

Paolo Fiorini 1

Niklas Friedrich 1

Theresa Gessler 1

Valentin Gold 1

Carolin Holtermann 1

Yavuz Selim Kartal 1

Thorsten Keiper 1

Patrick Klein 1

Alina Klerings 1

Anna Korhonen 1

Varvara Logacheva 1

Sebastian Martschat 1

Stefano Menini 1

Robert Meusel 1

Ruslan Mitkov 1

Manuel Montes 1

Sergiu Nisioi 1

Heiko Paulheim 1

Sara Pooladzadeh 1

Francisco Rangel 1

Kepa Joseba Rodriguez 1

Lorenza Romano 1

Josef Ruppenhofer 1

Julian Seitner 1

Artem Shelmanov 1

Carina Silberer 1

Eva Sourjikova 1

Lukas Stoetzer 1

Javier Sánchez-Junquera 1

Yurina Takeshita 1

Rafik Takieddin 1

Olga Uryupina 1

Paola Velardi 1

Victoria Yaneva 1

Roberto Zanoli 1

Cäcilia Zirn 1

Venues