Emmanuele Chersoni


pdf bib
NADE: A Benchmark for Robust Adverse Drug Events Extraction in Face of Negations
Simone Scaboro | Beatrice Portelli | Emmanuele Chersoni | Enrico Santus | Giuseppe Serra
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Adverse Drug Event (ADE) extraction models can rapidly examine large collections of social media texts, detecting mentions of drug-related adverse reactions and trigger medical investigations. However, despite the recent advances in NLP, it is currently unknown if such models are robust in face of negation, which is pervasive across language varieties. In this paper we evaluate three state-of-the-art systems, showing their fragility against negation, and then we introduce two possible strategies to increase the robustness of these models: a pipeline approach, relying on a specific component for negation detection; an augmentation of an ADE extraction dataset to artificially create negated samples and further train the models. We show that both strategies bring significant increases in performance, lowering the number of spurious entities predicted by the models. Our dataset and code will be publicly released to encourage research on the topic.

pdf bib
Modeling the Influence of Verb Aspect on the Activation of Typical Event Locations with BERT
Won Ik Cho | Emmanuele Chersoni | Yu-Yin Hsu | Chu-Ren Huang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Exploring a Unified Sequence-To-Sequence Transformer for Medical Product Safety Monitoring in Social Media
Shivam Raval | Hooman Sedghamiz | Enrico Santus | Tuka Alhanai | Mohammad Ghassemi | Emmanuele Chersoni
Findings of the Association for Computational Linguistics: EMNLP 2021

Adverse Events (AE) are harmful events resulting from the use of medical products. Although social media may be crucial for early AE detection, the sheer scale of this data makes it logistically intractable to analyze using human agents, with NLP representing the only low-cost and scalable alternative. In this paper, we frame AE Detection and Extraction as a sequence-to-sequence problem using the T5 model architecture and achieve strong performance improvements over the baselines on several English benchmarks (F1 = 0.71, 12.7% relative improvement for AE Detection; Strict F1 = 0.713, 12.4% relative improvement for AE Extraction). Motivated by the strong commonalities between AE tasks, the class imbalance in AE benchmarks, and the linguistic and structural variety typical of social media texts, we propose a new strategy for multi-task training that accounts, at the same time, for task and dataset characteristics. Our approach increases model robustness, leading to further performance gains. Finally, our framework shows some language transfer capabilities, obtaining higher performance than Multilingual BERT in zero-shot learning on French data.

pdf bib
BERT Prescriptions to Avoid Unwanted Headaches: A Comparison of Transformer Architectures for Adverse Drug Event Detection
Beatrice Portelli | Edoardo Lenzi | Emmanuele Chersoni | Giuseppe Serra | Enrico Santus
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Pretrained transformer-based models, such as BERT and its variants, have become a common choice to obtain state-of-the-art performances in NLP tasks. In the identification of Adverse Drug Events (ADE) from social media texts, for example, BERT architectures rank first in the leaderboard. However, a systematic comparison between these models has not yet been done. In this paper, we aim at shedding light on the differences between their performance analyzing the results of 12 models, tested on two standard benchmarks. SpanBERT and PubMedBERT emerged as the best models in our evaluation: this result clearly shows that span-based pretraining gives a decisive advantage in the precise recognition of ADEs, and that in-domain language pretraining is particularly useful when the transformer model is trained just on biomedical text from scratch.

pdf bib
Is Domain Adaptation Worth Your Investment? Comparing BERT and FinBERT on Financial Tasks
Bo Peng | Emmanuele Chersoni | Yu-Yin Hsu | Chu-Ren Huang
Proceedings of the Third Workshop on Economics and Natural Language Processing

With the recent rise in popularity of Transformer models in Natural Language Processing, research efforts have been dedicated to the development of domain-adapted versions of BERT-like architectures. In this study, we focus on FinBERT, a Transformer model trained on text from the financial domain. By comparing its performances with the original BERT on a wide variety of financial text processing tasks, we found continual pretraining from the original model to be the more beneficial option. Domain-specific pretraining from scratch, conversely, seems to be less effective.

pdf bib
PolyU CBS-Comp at SemEval-2021 Task 1: Lexical Complexity Prediction (LCP)
Rong Xiang | Jinghang Gu | Emmanuele Chersoni | Wenjie Li | Qin Lu | Chu-Ren Huang
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

In this contribution, we describe the system presented by the PolyU CBS-Comp Team at the Task 1 of SemEval 2021, where the goal was the estimation of the complexity of words in a given sentence context. Our top system, based on a combination of lexical, syntactic, word embeddings and Transformers-derived features and on a Gradient Boosting Regressor, achieves a top correlation score of 0.754 on the subtask 1 for single words and 0.659 on the subtask 2 for multiword expressions.

pdf bib
Looking for a Role for Word Embeddings in Eye-Tracking Features Prediction: Does Semantic Similarity Help?
Lavinia Salicchi | Alessandro Lenci | Emmanuele Chersoni
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

Eye-tracking psycholinguistic studies have suggested that context-word semantic coherence and predictability influence language processing during the reading activity. In this study, we investigate the correlation between the cosine similarities computed with word embedding models (both static and contextualized) and eye-tracking data from two naturalistic reading corpora. We also studied the correlations of surprisal scores computed with three state-of-the-art language models. Our results show strong correlation for the scores computed with BERT and GloVe, suggesting that similarity can play an important role in modeling reading times.

pdf bib
Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge
Paolo Pedinotti | Giulia Rambelli | Emmanuele Chersoni | Enrico Santus | Alessandro Lenci | Philippe Blache
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics

Prior research has explored the ability of computational models to predict a word semantic fit with a given predicate. While much work has been devoted to modeling the typicality relation between verbs and arguments in isolation, in this paper we take a broader perspective by assessing whether and to what extent computational approaches have access to the information about the typicality of entire events and situations described in language (Generalized Event Knowledge). Given the recent success of Transformers Language Models (TLMs), we decided to test them on a benchmark for the dynamic estimation of thematic fit. The evaluation of these models was performed in comparison with SDM, a framework specifically designed to integrate events in sentence meaning representations, and we conducted a detailed error analysis to investigate which factors affect their behavior. Our results show that TLMs can reach performances that are comparable to those achieved by SDM. However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge, and their predictions often depend on surface linguistic features, such as frequent words, collocations and syntactic patterns, thereby showing sub-optimal generalization abilities.

pdf bib
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Emmanuele Chersoni | Nora Hollenstein | Cassandra Jacobs | Yohei Oseki | Laurent Prévot | Enrico Santus
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

pdf bib
CMCL 2021 Shared Task on Eye-Tracking Prediction
Nora Hollenstein | Emmanuele Chersoni | Cassandra L. Jacobs | Yohei Oseki | Laurent Prévot | Enrico Santus
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Eye-tracking data from reading represent an important resource for both linguistics and natural language processing. The ability to accurately model gaze features is crucial to advance our understanding of language processing. This paper describes the Shared Task on Eye-Tracking Data Prediction, jointly organized with the eleventh edition of the Work- shop on Cognitive Modeling and Computational Linguistics (CMCL 2021). The goal of the task is to predict 5 different token- level eye-tracking metrics of the Zurich Cognitive Language Processing Corpus (ZuCo). Eye-tracking data were recorded during natural reading of English sentences. In total, we received submissions from 13 registered teams, whose systems include boosting algorithms with handcrafted features, neural models leveraging transformer language models, or hybrid approaches. The winning system used a range of linguistic and psychometric features in a gradient boosting framework.


pdf bib
Comparing Probabilistic, Distributional and Transformer-Based Models on Logical Metonymy Interpretation
Giulia Rambelli | Emmanuele Chersoni | Alessandro Lenci | Philippe Blache | Chu-Ren Huang
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

In linguistics and cognitive science, Logical metonymies are defined as type clashes between an event-selecting verb and an entity-denoting noun (e.g. The editor finished the article), which are typically interpreted by inferring a hidden event (e.g. reading) on the basis of contextual cues. This paper tackles the problem of logical metonymy interpretation, that is, the retrieval of the covert event via computational methods. We compare different types of models, including the probabilistic and the distributional ones previously introduced in the literature on the topic. For the first time, we also tested on this task some of the recent Transformer-based models, such as BERT, RoBERTa, XLNet, and GPT-2. Our results show a complex scenario, in which the best Transformer-based models and some traditional distributional models perform very similarly. However, the low performance on some of the testing datasets suggests that logical metonymy is still a challenging phenomenon for computational modeling.

pdf bib
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon
Michael Zock | Emmanuele Chersoni | Alessandro Lenci | Enrico Santus
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon

pdf bib
The CogALex Shared Task on Monolingual and Multilingual Identification of Semantic Relations
Rong Xiang | Emmanuele Chersoni | Luca Iacoponi | Enrico Santus
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon

The shared task of the CogALex-VI workshop focuses on the monolingual and multilingual identification of semantic relations. We provided training and validation data for the following languages: English, German and Chinese. Given a word pair, systems had to be trained to identify which relation holds between them, with possible choices being synonymy, antonymy, hypernymy and no relation at all. Two test sets were released for evaluating the participating systems. One containing pairs for each of the training languages (systems were evaluated in a monolingual fashion) and the other proposing a surprise language to test the crosslingual transfer capabilities of the systems. Among the submitted systems, top performance was achieved by a transformer-based model in both the monolingual and in the multilingual setting, for all the tested languages, proving the potentials of this recently-introduced neural architecture. The shared task description and the results are available at https://sites.google.com/site/cogalexvisharedtask/.

pdf bib
Proceedings of the Second Workshop on Linguistic and Neurocognitive Resources
Emmanuele Chersoni | Barry Devereux | Chu-Ren Huang
Proceedings of the Second Workshop on Linguistic and Neurocognitive Resources

pdf bib
Using Conceptual Norms for Metaphor Detection
Mingyu Wan | Kathleen Ahrens | Emmanuele Chersoni | Menghan Jiang | Qi Su | Rong Xiang | Chu-Ren Huang
Proceedings of the Second Workshop on Figurative Language Processing

This paper reports a linguistically-enriched method of detecting token-level metaphors for the second shared task on Metaphor Detection. We participate in all four phases of competition with both datasets, i.e. Verbs and AllPOS on the VUA and the TOFEL datasets. We use the modality exclusivity and embodiment norms for constructing a conceptual representation of the nodes and the context. Our system obtains an F-score of 0.652 for the VUA Verbs track, which is 5% higher than the strong baselines. The experimental results across models and datasets indicate the salient contribution of using modality exclusivity and modality shift information for predicting metaphoricity.

pdf bib
Automatic Learning of Modality Exclusivity Norms with Crosslingual Word Embeddings
Emmanuele Chersoni | Rong Xiang | Qin Lu | Chu-Ren Huang
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

Collecting modality exclusivity norms for lexical items has recently become a common practice in psycholinguistics and cognitive research. However, these norms are available only for a relatively small number of languages and often involve a costly and time-consuming collection of ratings. In this work, we aim at learning a mapping between word embeddings and modality norms. Our experiments focused on crosslingual word embeddings, in order to predict modality association scores by training on a high-resource language and testing on a low-resource one. We ran two experiments, one in a monolingual and the other one in a crosslingual setting. Results show that modality prediction using off-the-shelf crosslingual embeddings indeed has moderate-to-high correlations with human ratings even when regression algorithms are trained on an English resource and tested on a completely unseen language.

pdf bib
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Emmanuele Chersoni | Cassandra Jacobs | Yohei Oseki | Laurent Prévot | Enrico Santus
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

pdf bib
Are Word Embeddings Really a Bad Fit for the Estimation of Thematic Fit?
Emmanuele Chersoni | Ludovica Pannitto | Enrico Santus | Alessandro Lenci | Chu-Ren Huang
Proceedings of the 12th Language Resources and Evaluation Conference

While neural embeddings represent a popular choice for word representation in a wide variety of NLP tasks, their usage for thematic fit modeling has been limited, as they have been reported to lag behind syntax-based count models. In this paper, we propose a complete evaluation of count models and word embeddings on thematic fit estimation, by taking into account a larger number of parameters and verb roles and introducing also dependency-based embeddings in the comparison. Our results show a complex scenario, where a determinant factor for the performance seems to be the availability to the model of reliable syntactic information for building the distributional representations of the roles.

pdf bib
Ciron: a New Benchmark Dataset for Chinese Irony Detection
Rong Xiang | Xuefeng Gao | Yunfei Long | Anran Li | Emmanuele Chersoni | Qin Lu | Chu-Ren Huang
Proceedings of the 12th Language Resources and Evaluation Conference

Automatic Chinese irony detection is a challenging task, and it has a strong impact on linguistic research. However, Chinese irony detection often lacks labeled benchmark datasets. In this paper, we introduce Ciron, the first Chinese benchmark dataset available for irony detection for machine learning models. Ciron includes more than 8.7K posts, collected from Weibo, a micro blogging platform. Most importantly, Ciron is collected with no pre-conditions to ensure a much wider coverage. Evaluation on seven different machine learning classifiers proves the usefulness of Ciron as an important resource for Chinese irony detection.


pdf bib
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Emmanuele Chersoni | Cassandra Jacobs | Alessandro Lenci | Tal Linzen | Laurent Prévot | Enrico Santus
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

pdf bib
Distributional Semantics Meets Construction Grammar. towards a Unified Usage-Based Model of Grammar and Meaning
Giulia Rambelli | Emmanuele Chersoni | Philippe Blache | Chu-Ren Huang | Alessandro Lenci
Proceedings of the First International Workshop on Designing Meaning Representations

In this paper, we propose a new type of semantic representation of Construction Grammar that combines constructions with the vector representations used in Distributional Semantics. We introduce a new framework, Distributional Construction Grammar, where grammar and meaning are systematically modeled from language use, and finally, we discuss the kind of contributions that distributional models can provide to CxG representation from a linguistic and cognitive perspective.

pdf bib
PolyU_CBS-CFA at the FinSBD Task: Sentence Boundary Detection of Financial Data with Domain Knowledge Enhancement and Bilingual Training
Mingyu Wan | Rong Xiang | Emmanuele Chersoni | Natalia Klyueva | Kathleen Ahrens | Bin Miao | David Broadstock | Jian Kang | Amos Yung | Chu-Ren Huang
Proceedings of the First Workshop on Financial Technology and Natural Language Processing


pdf bib
A Rank-Based Similarity Metric for Word Embeddings
Enrico Santus | Hongmin Wang | Emmanuele Chersoni | Yue Zhang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Word Embeddings have recently imposed themselves as a standard for representing word meaning in NLP. Semantic similarity between word pairs has become the most common evaluation benchmark for these representations, with vector cosine being typically used as the only similarity metric. In this paper, we report experiments with a rank-based metric for WE, which performs comparably to vector cosine in similarity estimation and outperforms it in the recently-introduced and challenging task of outlier detection, thus suggesting that rank-based measures can improve clustering quality.

pdf bib
Modeling Violations of Selectional Restrictions with Distributional Semantics
Emmanuele Chersoni | Adrià Torrens Urrutia | Philippe Blache | Alessandro Lenci
Proceedings of the Workshop on Linguistic Complexity and Natural Language Processing

Distributional Semantic Models have been successfully used for modeling selectional preferences in a variety of scenarios, since distributional similarity naturally provides an estimate of the degree to which an argument satisfies the requirement of a given predicate. However, we argue that the performance of such models on rare verb-argument combinations has received relatively little attention: it is not clear whether they are able to distinguish the combinations that are simply atypical, or implausible, from the semantically anomalous ones, and in particular, they have never been tested on the task of modeling their differences in processing complexity. In this paper, we compare two different models of thematic fit by testing their ability of identifying violations of selectional restrictions in two datasets from the experimental studies.

pdf bib
BomJi at SemEval-2018 Task 10: Combining Vector-, Pattern- and Graph-based Information to Identify Discriminative Attributes
Enrico Santus | Chris Biemann | Emmanuele Chersoni
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes BomJi, a supervised system for capturing discriminative attributes in word pairs (e.g. yellow as discriminative for banana over watermelon). The system relies on an XGB classifier trained on carefully engineered graph-, pattern- and word embedding-based features. It participated in the SemEval-2018 Task 10 on Capturing Discriminative Attributes, achieving an F1 score of 0.73 and ranking 2nd out of 26 participant systems.


pdf bib
Measuring Thematic Fit with Distributional Feature Overlap
Enrico Santus | Emmanuele Chersoni | Alessandro Lenci | Philippe Blache
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper, we introduce a new distributional method for modeling predicate-argument thematic fit judgments. We use a syntax-based DSM to build a prototypical representation of verb-specific roles: for every verb, we extract the most salient second order contexts for each of its roles (i.e. the most salient dimensions of typical role fillers), and then we compute thematic fit as a weighted overlap between the top features of candidate fillers and role prototypes. Our experiments show that our method consistently outperforms a baseline re-implementing a state-of-the-art system, and achieves better or comparable results to those reported in the literature for the other unsupervised systems. Moreover, it provides an explicit representation of the features characterizing verb-specific semantic roles.

pdf bib
Is Structure Necessary for Modeling Argument Expectations in Distributional Semantics?
Emmanuele Chersoni | Enrico Santus | Philippe Blache | Alessandro Lenci
IWCS 2017 - 12th International Conference on Computational Semantics - Long papers

pdf bib
Logical Metonymy in a Distributional Model of Sentence Comprehension
Emmanuele Chersoni | Alessandro Lenci | Philippe Blache
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

In theoretical linguistics, logical metonymy is defined as the combination of an event-subcategorizing verb with an entity-denoting direct object (e.g., The author began the book), so that the interpretation of the VP requires the retrieval of a covert event (e.g., writing). Psycholinguistic studies have revealed extra processing costs for logical metonymy, a phenomenon generally explained with the introduction of new semantic structure. In this paper, we present a general distributional model for sentence comprehension inspired by the Memory, Unification and Control model by Hagoort (2013,2016). We show that our distributional framework can account for the extra processing costs of logical metonymy and can identify the covert event in a classification task.


pdf bib
Towards a Distributional Model of Semantic Complexity
Emmanuele Chersoni | Philippe Blache | Alessandro Lenci
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

In this paper, we introduce for the first time a Distributional Model for computing semantic complexity, inspired by the general principles of the Memory, Unification and Control framework(Hagoort, 2013; Hagoort, 2016). We argue that sentence comprehension is an incremental process driven by the goal of constructing a coherent representation of the event represented by the sentence. The composition cost of a sentence depends on the semantic coherence of the event being constructed and on the activation degree of the linguistic constructions. We also report the results of a first evaluation of the model on the Bicknell dataset (Bicknell et al., 2010).

pdf bib
CogALex-V Shared Task: ROOT18
Emmanuele Chersoni | Giulia Rambelli | Enrico Santus
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

In this paper, we describe ROOT 18, a classifier using the scores of several unsupervised distributional measures as features to discriminate between semantically related and unrelated words, and then to classify the related pairs according to their semantic relation (i.e. synonymy, antonymy, hypernymy, part-whole meronymy). Our classifier participated in the CogALex-V Shared Task, showing a solid performance on the first subtask, but a poor performance on the second subtask. The low scores reported on the second subtask suggest that distributional measures are not sufficient to discriminate between multiple semantic relations at once.

pdf bib
Representing Verbs with Rich Contexts: an Evaluation on Verb Similarity
Emmanuele Chersoni | Enrico Santus | Alessandro Lenci | Philippe Blache | Chu-Ren Huang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Testing APSyn against Vector Cosine on Similarity Estimation
Enrico Santus | Emmanuele Chersoni | Alessandro Lenci | Chu-Ren Huang | Philippe Blache
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers