Marianna Apidianaki

2025

pdf bib
Proceedings of the 31st International Conference on Computational Linguistics
Owen Rambow | Leo Wanner | Marianna Apidianaki | Hend Al-Khalifa | Barbara Di Eugenio | Steven Schockaert
Proceedings of the 31st International Conference on Computational Linguistics

pdf bib abs
Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution
Milad Alshomary | Narutatsu Ri | Marianna Apidianaki | Ajay Patel | Smaranda Muresan | Kathleen McKeown
Proceedings of the 31st International Conference on Computational Linguistics

Recent state-of-the-art authorship attribution methods learn authorship representations of text in a latent, uninterpretable space, which hinders their usability in real-world applications. We propose a novel approach for interpreting learned embeddings by identifying representative points in the latent space and leveraging large language models to generate informative natural language descriptions of the writing style associated with each point. We evaluate the alignment between our interpretable and latent spaces and demonstrate superior prediction agreement over baseline methods. Additionally, we conduct a human evaluation to assess the quality of these style descriptions and validate their utility in explaining the latent space. Finally, we show that human performance on the challenging authorship attribution task improves by +20% on average when aided with explanations from our method.

pdf bib abs
GenAI Content Detection Task 3: Cross-Domain Machine Generated Text Detection Challenge
Liam Dugan | Andrew Zhu | Firoj Alam | Preslav Nakov | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect)

Recently there have been many shared tasks targeting the detection of generated text from Large Language Models (LLMs). However, these shared tasks tend to focus either on cases where text is limited to one particular domain or cases where text can be from many domains, some of which may not be seen during test time. In this shared task, using the newly released RAID benchmark, we aim to answer whether or not models can detect generated text from a large, yet fixed, number of domains and LLMs, all of which are seen during training. Over the course of three months, our task was attempted by 9 teams with 23 detector submissions. We find that multiple participants were able to obtain accuracies of over 99% on machine-generated text from RAID while maintaining a 5% False Positive Rate—suggesting that detectors are able to robustly detect text from many domains and models simultaneously. We discuss potential interpretations of this result and provide directions for future research.

pdf bib abs
StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples
Ajay Patel | Jiacheng Zhu | Justin Qiu | Zachary Horvitz | Marianna Apidianaki | Kathleen McKeown | Chris Callison-Burch
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Style representations aim to embed texts with similar writing styles closely and texts with different styles far apart, regardless of content. However, the contrastive triplets often used for training these representations may vary in both style and content, leading to potential content leakage in the representations. We introduce StyleDistance, a novel approach to training stronger content-independent style embeddings. We use a large language model to create a synthetic dataset of near-exact paraphrases with controlled style variations, and produce positive and negative examples across 40 distinct style features for precise contrastive learning. We assess the quality of our synthetic data and embeddings through human and automatic evaluations. StyleDistance enhances the content-independence of style embeddings, which generalize to real-world benchmarks and outperform leading style representations in downstream applications.

2024

pdf bib abs
Towards Faithful Model Explanation in NLP: A Survey
Qing Lyu | Marianna Apidianaki | Chris Callison-Burch
Computational Linguistics, Volume 50, Issue 2 - June 2023

End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand. This has given rise to numerous efforts towards model explainability in recent years. One desideratum of model explanation is faithfulness, that is, an explanation should accurately represent the reasoning process behind the model’s prediction. In this survey, we review over 110 model explanation methods in NLP through the lens of faithfulness. We first discuss the definition and evaluation of faithfulness, as well as its significance for explainability. We then introduce recent advances in faithful explanation, grouping existing approaches into five categories: similarity-based methods, analysis of model-internal structures, backpropagation-based methods, counterfactual intervention, and self-explanatory models. For each category, we synthesize its representative studies, strengths, and weaknesses. Finally, we summarize their common virtues and remaining challenges, and reflect on future work directions towards faithful explainability in NLP.

pdf bib abs
Language Learning, Representation, and Processing in Humans and Machines: Introduction to the Special Issue
Marianna Apidianaki | Abdellah Fourtassi | Sebastian Padó
Computational Linguistics, Volume 50, Issue 4 - December 2024

Large Language Models (LLMs) and humans acquire knowledge about language without direct supervision. LLMs do so by means of specific training objectives, while humans rely on sensory experience and social interaction. This parallelism has created a feeling in NLP and cognitive science that a systematic understanding of how LLMs acquire and use the encoded knowledge could provide useful insights for studying human cognition. Conversely, methods and findings from the field of cognitive science have occasionally inspired language model development. Yet, the differences in the way that language is processed by machines and humans—in terms of learning mechanisms, amounts of data used, grounding and access to different modalities—make a direct translation of insights challenging. The aim of this edited volume has been to create a forum of exchange and debate along this line of research, inviting contributions that further elucidate similarities and differences between humans and LLMs.

pdf bib abs
Adjusting Interpretable Dimensions in Embedding Space with Human Judgments
Katrin Erk | Marianna Apidianaki
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Embedding spaces contain interpretable dimensions indicating gender, formality in style, or even object properties. This has been observed multiple times. Such interpretable dimensions are becoming valuable tools in different areas of study, from social science to neuroscience. The standard way to compute these dimensions uses contrasting seed words and computes difference vectors over them. This is simple but does not always work well. We combine seed-based vectors with guidance from human ratings of where words fall along a specific dimension, and evaluate on predicting both object properties like size and danger, and the stylistic properties of formality and complexity. We obtain interpretable dimensions with markedly better performance especially in cases where seed-based dimensions do not work well.

This paper presents the results of the SHROOM, a shared task focused on detecting hallucinations: outputs from natural language generation (NLG) systems that are fluent, yet inaccurate. Such cases of overgeneration put in jeopardy many NLG applications, where correctness is often mission-critical. The shared task was conducted with a newly constructed dataset of 4000 model outputs labeled by 5 annotators each, spanning 3 NLP tasks: machine translation, paraphrase generation and definition modeling.The shared task was tackled by a total of 58 different users grouped in 42 teams, out of which 26 elected to write a system description paper; collectively, they submitted over 300 prediction sets on both tracks of the shared task. We observe a number of key trends in how this approach was tackled—many participants rely on a handful of model, and often rely either on synthetic data for fine-tuning or zero-shot prompting strategies. While a majority of the teams did outperform our proposed baseline system, the performances of top-scoring systems are still consistent with a random handling of the more challenging items.

2023

pdf bib abs
Explanation-based Finetuning Makes Models More Robust to Spurious Cues
Josh Magnus Ludan | Yixuan Meng | Tai Nguyen | Saurabh Shah | Qing Lyu | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) are so powerful that they sometimes learn correlations between labels and features that are irrelevant to the task, leading to poor generalization on out-of-distribution data. We propose explanation-based finetuning as a general approach to mitigate LLMs’ reliance on spurious correlations. Unlike standard finetuning where the model only predicts the answer given the input, we finetune the model to additionally generate a free-text explanation supporting its answer. To evaluate our method, we finetune the model on artificially constructed training sets containing different types of spurious cues, and test it on a test set without these cues. Compared to standard finetuning, our method makes GPT-3 (davinci) remarkably more robust against spurious cues in terms of accuracy drop across four classification tasks: ComVE (+1.2), CREAK (+9.1), e-SNLI (+15.4), and SBIC (+6.5). The efficacy generalizes across multiple model families and scales, with greater gains for larger models. Finally, our method also works well with explanations generated by the model, implying its applicability to more datasets without human-written explanations.

pdf bib abs
From Word Types to Tokens and Back: A Survey of Approaches to Word Meaning Representation and Interpretation
Marianna Apidianaki
Computational Linguistics, Volume 49, Issue 2 - June 2023

Vector-based word representation paradigms situate lexical meaning at different levels of abstraction. Distributional and static embedding models generate a single vector per word type, which is an aggregate across the instances of the word in a corpus. Contextual language models, on the contrary, directly capture the meaning of individual word instances. The goal of this survey is to provide an overview of word meaning representation methods, and of the strategies that have been proposed for improving the quality of the generated vectors. These often involve injecting external knowledge about lexical semantic relationships, or refining the vectors to describe different senses. The survey also covers recent approaches for obtaining word type-level representations from token-level ones, and for combining static and contextualized representations. Special focus is given to probing and interpretation studies aimed at discovering the lexical semantic knowledge that is encoded in contextualized representations. The challenges posed by this exploration have motivated the interest towards static embedding derivation from contextualized embeddings, and for methods aimed at improving the similarity estimates that can be drawn from the space of contextual language models.

Visual metaphors are powerful rhetorical devices used to persuade or communicate creative ideas through images. Similar to linguistic metaphors, they convey meaning implicitly through symbolism and juxtaposition of the symbols. We propose a new task of generating visual metaphors from linguistic metaphors. This is a challenging task for diffusion-based text-to-image models, such as DALL⋅E 2, since it requires the ability to model implicit meaning and compositionality. We propose to solve the task through the collaboration between Large Language Models (LLMs) and Diffusion Models: Instruct GPT-3 (davinci-002) with Chain-of-Thought prompting generates text that represents a visual elaboration of the linguistic metaphor containing the implicit meaning and relevant objects, which is then used as input to the diffusion-based text-to-image models. Using a human-AI collaboration framework, where humans interact both with the LLM and the top-performing diffusion model, we create a high-quality dataset containing 6,476 visual metaphors for 1,540 linguistic metaphors and their associated visual elaborations. Evaluation by professional illustrators shows the promise of LLM-Diffusion Model collaboration for this task.To evaluate the utility of our Human-AI collaboration framework and the quality of our dataset, we perform both an intrinsic human-based evaluation and an extrinsic evaluation using visual entailment as a downstream task.

pdf bib
Faithful Chain-of-Thought Reasoning
Qing Lyu | Shreya Havaldar | Adam Stein | Li Zhang | Delip Rao | Eric Wong | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib abs
Representation of Lexical Stylistic Features in Language Models’ Embedding Space
Qing Lyu | Marianna Apidianaki | Chris Callison-burch
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

The representation space of pretrained Language Models (LMs) encodes rich information about words and their relationships (e.g., similarity, hypernymy, polysemy) as well as abstract semantic notions (e.g., intensity). In this paper, we demonstrate that lexical stylistic notions such as complexity, formality, and figurativeness, can also be identified in this space. We show that it is possible to derive a vector representation for each of these stylistic notions from only a small number of seed pairs. Using these vectors, we can characterize new texts in terms of these dimensions by performing simple calculations in the corresponding embedding space. We conduct experiments on five datasets and find that static embeddings encode these features more accurately at the level of words and phrases, whereas contextualized LMs perform better on sentences. The lower performance of contextualized representations at the word level is partially attributable to the anisotropy of their vector space, which can be corrected to some extent using techniques like standardization.

2022

pdf bib
Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures
Eneko Agirre | Marianna Apidianaki | Ivan Vulić
Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

pdf bib abs
Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction
Yue Yang | Artemis Panagopoulou | Marianna Apidianaki | Mark Yatskar | Chris Callison-Burch
Findings of the Association for Computational Linguistics: EMNLP 2022

Neural language models encode rich knowledge about entities and their relationships which can be extracted from their representations using probing. Common properties of nouns (e.g., red strawberries, small ant) are, however, more challenging to extract compared to other types of knowledge because they are rarely explicitly stated in texts.We hypothesize this to mainly be the case for perceptual properties which are obvious to the participants in the communication. We propose to extract these properties from images and use them in an ensemble model, in order to complement the information that is extracted from language models. We consider perceptual properties to be more concrete than abstract properties (e.g., interesting, flawless). We propose to use the adjectives’ concreteness score as a lever to calibrate the contribution of each source (text vs. images). We evaluate our ensemble model in a ranking task where the actual properties of a noun need to be ranked higher than other non-relevant properties. Our results show that the proposed combination of text and images greatly improves noun property prediction compared to powerful text-based language models.

pdf bib abs
Is “My Favorite New Movie” My Favorite Movie? Probing the Understanding of Recursive Noun Phrases
Qing Lyu | Zheng Hua | Daoxin Li | Li Zhang | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recursive noun phrases (NPs) have interesting semantic properties. For example, “my favorite new movie” is not necessarily my favorite movie, whereas “my new favorite movie” is. This is common sense to humans, yet it is unknown whether language models have such knowledge. We introduce the Recursive Noun Phrase Challenge (RNPC), a dataset of three textual inference tasks involving textual entailment and event plausibility comparison, precisely targeting the understanding of recursive NPs. When evaluated on RNPC, state-of-the-art Transformer models only perform around chance. Still, we show that such knowledge is learnable with appropriate data. We further probe the models for relevant linguistic features that can be learned from our tasks, including modifier semantic category and modifier scope. Finally, models trained on RNPC achieve strong zero-shot performance on an extrinsic Harm Detection evaluation task, showing the usefulness of the understanding of recursive NPs in downstream applications.

pdf bib abs
How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE datasets
Aarne Talman | Marianna Apidianaki | Stergios Chatzikyriakidis | Jörg Tiedemann
Proceedings of the 11th Joint Conference on Lexical and Computational Semantics

A central question in natural language understanding (NLU) research is whether high performance demonstrates the models’ strong reasoning capabilities. We present an extensive series of controlled experiments where pre-trained language models are exposed to data that have undergone specific corruption transformations. These involve removing instances of specific word classes and often lead to non-sensical sentences. Our results show that performance remains high on most GLUE tasks when the models are fine-tuned or tested on corrupted data, suggesting that they leverage other cues for prediction even in non-sensical contexts. Our proposed data transformations can be used to assess the extent to which a specific dataset constitutes a proper testbed for evaluating models’ language understanding capabilities.

2021

pdf bib abs
ALL Dolphins Are Intelligent and SOME Are Friendly: Probing BERT for Nouns’ Semantic Properties and their Prototypicality
Marianna Apidianaki | Aina Garí Soler
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Large scale language models encode rich commonsense knowledge acquired through exposure to massive data during pre-training, but their understanding of entities and their semantic properties is unclear. We probe BERT (Devlin et al., 2019) for the properties of English nouns as expressed by adjectives that do not restrict the reference scope of the noun they modify (as in “red car”), but instead emphasise some inherent aspect (“red strawberry”). We base our study on psycholinguistics datasets that capture the association strength between nouns and their semantic features. We probe BERT using cloze tasks and in a classification setting, and show that the model has marginal knowledge of these features and their prevalence as expressed in these datasets. We discuss factors that make evaluation challenging and impede drawing general conclusions about the models’ knowledge of noun properties. Finally, we show that when tested in a fine-tuning setting addressing entailment, BERT successfully leverages the information needed for reasoning about the meaning of adjective-noun constructions outperforming previous methods.

pdf bib
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures
Eneko Agirre | Marianna Apidianaki | Ivan Vulić
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

pdf bib abs
Scalar Adjective Identification and Multilingual Ranking
Aina Garí Soler | Marianna Apidianaki
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The intensity relationship that holds between scalar adjectives (e.g., nice < great < wonderful) is highly relevant for natural language inference and common-sense reasoning. Previous research on scalar adjective ranking has focused on English, mainly due to the availability of datasets for evaluation. We introduce a new multilingual dataset in order to promote research on scalar adjectives in new languages. We perform a series of experiments and set performance baselines on this dataset, using monolingual and multilingual contextual language models. Additionally, we introduce a new binary classification task for English scalar adjective identification which examines the models’ ability to distinguish scalar from relational adjectives. We probe contextualised representations and report baseline results for future comparison on this task.

pdf bib abs
NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance
Aarne Talman | Marianna Apidianaki | Stergios Chatzikyriakidis | Jörg Tiedemann
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Pre-trained neural language models give high performance on natural language inference (NLI) tasks. But whether they actually understand the meaning of the processed sequences is still unclear. We propose a new diagnostics test suite which allows to assess whether a dataset constitutes a good testbed for evaluating the models’ meaning understanding capabilities. We specifically apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI), which involve removing entire word classes and often lead to non-sensical sentence pairs. If model accuracy on the corrupted data remains high, then the dataset is likely to contain statistical biases and artefacts that guide prediction. Inversely, a large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models’ reasoning capabilities. Hence, our proposed controls can serve as a crash test for developing high quality data for NLI tasks.

pdf bib abs
Let’s Play Mono-Poly: BERT Can Reveal Words’ Polysemy Level and Partitionability into Senses
Aina Garí Soler | Marianna Apidianaki
Transactions of the Association for Computational Linguistics, Volume 9

Pre-trained language models (LMs) encode rich information about linguistic structure but their knowledge about lexical polysemy remains unclear. We propose a novel experimental setup for analyzing this knowledge in LMs specifically trained for different languages (English, French, Spanish, and Greek) and in multilingual BERT. We perform our analysis on datasets carefully designed to reflect different sense distributions, and control for parameters that are highly correlated with polysemy such as frequency and grammatical category. We demonstrate that BERT-derived representations reflect words’ polysemy level and their partitionability into senses. Polysemy-related information is more clearly present in English BERT embeddings, but models in other languages also manage to establish relevant distinctions between words at different polysemy levels. Our results contribute to a better understanding of the knowledge encoded in contextualized representations and open up new avenues for multilingual lexical semantics research.

2020

pdf bib abs
Controlling the Imprint of Passivization and Negation in Contextualized Representations
Hande Celikkanat | Sami Virpioja | Jörg Tiedemann | Marianna Apidianaki
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Contextualized word representations encode rich information about syntax and semantics, alongside specificities of each context of use. While contextual variation does not always reflect actual meaning shifts, it can still reduce the similarity of embeddings for word instances having the same meaning. We explore the imprint of two specific linguistic alternations, namely passivization and negation, on the representations generated by neural models trained with two different objectives: masked language modeling and translation. Our exploration methodology is inspired by an approach previously proposed for removing societal biases from word vectors. We show that passivization and negation leave their traces on the representations, and that neutralizing this information leads to more similar embeddings for words that should preserve their meaning in the transformation. We also find clear differences in how the respective features generalize across datasets.

pdf bib
Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures
Eneko Agirre | Marianna Apidianaki | Ivan Vulić
Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

pdf bib abs
BERT Knows Punta Cana is not just beautiful, it’s gorgeous: Ranking Scalar Adjectives with Contextualised Representations
Aina Garí Soler | Marianna Apidianaki
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Adjectives like pretty, beautiful and gorgeous describe positive properties of the nouns they modify but with different intensity. These differences are important for natural language understanding and reasoning. We propose a novel BERT-based approach to intensity detection for scalar adjectives. We model intensity by vectors directly derived from contextualised representations and show they can successfully rank scalar adjectives. We evaluate our models both intrinsically, on gold standard datasets, and on an Indirect Question Answering task. Our results demonstrate that BERT encodes rich knowledge about the semantics of scalar adjectives, and is able to provide better quality intensity rankings than static embeddings and previous models with access to dedicated resources.

pdf bib abs
MULTISEM at SemEval-2020 Task 3: Fine-tuning BERT for Lexical Meaning
Aina Garí Soler | Marianna Apidianaki
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We present the MULTISEM systems submitted to SemEval 2020 Task 3: Graded Word Similarity in Context (GWSC). We experiment with injecting semantic knowledge into pre-trained BERT models through fine-tuning on lexical semantic tasks related to GWSC. We use existing semantically annotated datasets, and propose to approximate similarity through automatically generated lexical substitutes in context. We participate in both GWSC subtasks and address two languages, English and Finnish. Our best English models occupy the third and fourth positions in the ranking for the two subtasks. Performance is lower for the Finnish models which are mid-ranked in the respective subtasks, highlighting the important role of data availability for fine-tuning.

pdf bib
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics
Iryna Gurevych | Marianna Apidianaki | Manaal Faruqui
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

2019

pdf bib abs
SUM-QE: a BERT-based Summary Quality Estimation Model
Stratos Xenouleas | Prodromos Malakasiotis | Marianna Apidianaki | Ion Androutsopoulos
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose SUM-QE, a novel Quality Estimation model for summarization based on BERT. The model addresses linguistic quality aspects that are only indirectly captured by content-based approaches to summary evaluation, without involving comparison with human references. SUM-QE achieves very high correlations with human ratings, outperforming simpler models addressing these linguistic aspects. Predictions of the SUM-QE model can be used for system development, and to inform users of the quality of automatically produced summaries and other types of generated text.

pdf bib abs
Complexity-Weighted Loss and Diverse Reranking for Sentence Simplification
Reno Kriz | João Sedoc | Marianna Apidianaki | Carolina Zheng | Gaurav Kumar | Eleni Miltsakaki | Chris Callison-Burch
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Sentence simplification is the task of rewriting texts so they are easier to understand. Recent research has applied sequence-to-sequence (Seq2Seq) models to this task, focusing largely on training-time improvements via reinforcement learning and memory augmentation. One of the main problems with applying generic Seq2Seq models for simplification is that these models tend to copy directly from the original sentence, resulting in outputs that are relatively long and complex. We aim to alleviate this issue through the use of two main techniques. First, we incorporate content word complexities, as predicted with a leveled word complexity model, into our loss function during training. Second, we generate a large set of diverse candidate simplifications at test time, and rerank these to promote fluency, adequacy, and simplicity. Here, we measure simplicity through a novel sentence complexity model. These extensions allow our models to perform competitively with state-of-the-art systems while generating simpler sentences. We report standard automatic and human evaluation metrics.

pdf bib abs
Word Usage Similarity Estimation with Sentence Representations and Automatic Substitutes
Aina Garí Soler | Marianna Apidianaki | Alexandre Allauzen
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Usage similarity estimation addresses the semantic proximity of word instances in different contexts. We apply contextualized (ELMo and BERT) word and sentence embeddings to this task, and propose supervised models that leverage these representations for prediction. Our models are further assisted by lexical substitute annotations automatically assigned to word instances by context2vec, a neural model that relies on a bidirectional LSTM. We perform an extensive comparison of existing word and sentence representations on benchmark datasets addressing both graded and binary similarity. The best performing models outperform previous methods in both settings.

pdf bib abs
A Comparison of Context-sensitive Models for Lexical Substitution
Aina Garí Soler | Anne Cocos | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

Word embedding representations provide good estimates of word meaning and give state-of-the art performance in semantic tasks. Embedding approaches differ as to whether and how they account for the context surrounding a word. We present a comparison of different word and context representations on the task of proposing substitutes for a target word in context (lexical substitution). We also experiment with tuning contextualized word embeddings on a dataset of sense-specific instances for each target word. We show that powerful contextualized word representations, which give high performance in several semantics-related tasks, deal less well with the subtle in-context similarity relationships needed for substitution. This is better handled by models trained with this objective in mind, where the inter-dependence between word and context representations is explicitly modeled during training.

pdf bib abs
Embedding Biomedical Ontologies by Jointly Encoding Network Structure and Textual Node Descriptors
Sotiris Kotitsas | Dimitris Pappas | Ion Androutsopoulos | Ryan McDonald | Marianna Apidianaki
Proceedings of the 18th BioNLP Workshop and Shared Task

Network Embedding (NE) methods, which map network nodes to low-dimensional feature vectors, have wide applications in network analysis and bioinformatics. Many existing NE methods rely only on network structure, overlooking other information associated with the nodes, e.g., text describing the nodes. Recent attempts to combine the two sources of information only consider local network structure. We extend NODE2VEC, a well-known NE method that considers broader network structure, to also consider textual node descriptors using recurrent neural encoders. Our method is evaluated on link prediction in two networks derived from UMLS. Experimental results demonstrate the effectiveness of the proposed approach compared to previous work.

pdf bib
LIMSI-MULTISEM at the IJCAI SemDeep-5 WiC Challenge: Context Representations for Word Usage Similarity Estimation
Aina Garí Soler | Marianna Apidianaki | Alexandre Allauzen
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)

2018

pdf bib abs
A comparative study of word embeddings and other features for lexical complexity detection in French
Aina Garí Soler | Marianna Apidianaki | Alexandre Allauzen
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Lexical complexity detection is an important step for automatic text simplification which serves to make informed lexical substitutions. In this study, we experiment with word embeddings for measuring the complexity of French words and combine them with other features that have been shown to be well-suited for complexity prediction. Our results on a synonym ranking task show that embeddings perform better than other features in isolation, but do not outperform frequency-based systems in this language.

pdf bib abs
Learning Scalar Adjective Intensity from Paraphrases
Anne Cocos | Skyler Wharton | Ellie Pavlick | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Adjectives like “warm”, “hot”, and “scalding” all describe temperature but differ in intensity. Understanding these differences between adjectives is a necessary part of reasoning about natural language. We propose a new paraphrase-based method to automatically learn the relative intensity relation that holds between a pair of scalar adjectives. Our approach analyzes over 36k adjectival pairs from the Paraphrase Database under the assumption that, for example, paraphrase pair “really hot” <–> “scalding” suggests that “hot” < “scalding”. We show that combining this paraphrase evidence with existing, complementary pattern- and lexicon-based approaches improves the quality of systems for automatically ordering sets of scalar adjectives and inferring the polarity of indirect answers to “yes/no” questions.

pdf bib abs
Magnitude: A Fast, Efficient Universal Vector Embedding Utility Package
Ajay Patel | Alexander Sands | Chris Callison-Burch | Marianna Apidianaki
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Vector space embedding models like word2vec, GloVe, and fastText are extremely popular representations in natural language processing (NLP) applications. We present Magnitude, a fast, lightweight tool for utilizing and processing embeddings. Magnitude is an open source Python package with a compact vector storage file format that allows for efficient manipulation of huge numbers of embeddings. Magnitude performs common operations up to 60 to 6,000 times faster than Gensim. Magnitude introduces several novel features for improved robustness like out-of-vocabulary lookups.

pdf bib abs
Simplification Using Paraphrases and Context-Based Lexical Substitution
Reno Kriz | Eleni Miltsakaki | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Lexical simplification involves identifying complex words or phrases that need to be simplified, and recommending simpler meaning-preserving substitutes that can be more easily understood. We propose a complex word identification (CWI) model that exploits both lexical and contextual features, and a simplification mechanism which relies on a word-embedding lexical substitution model to replace the detected complex words with simpler paraphrases. We compare our CWI and lexical simplification models to several baselines, and evaluate the performance of our simplification system against human judgments. The results show that our models are able to detect complex words with higher accuracy than other commonly used methods, and propose good simplification substitutes in context. They also highlight the limited contribution of context features for CWI, which nonetheless improve simplification compared to context-unaware models.

pdf bib abs
Comparing Constraints for Taxonomic Organization
Anne Cocos | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Building a taxonomy from the ground up involves several sub-tasks: selecting terms to include, predicting semantic relations between terms, and selecting a subset of relational instances to keep, given constraints on the taxonomy graph. Methods for this final step – taxonomic organization – vary both in terms of the constraints they impose, and whether they enable discovery of synonymous terms. It is hard to isolate the impact of these factors on the quality of the resulting taxonomy because organization methods are rarely compared directly. In this paper, we present a head-to-head comparison of six taxonomic organization algorithms that vary with respect to their structural and transitivity constraints, and treatment of synonymy. We find that while transitive algorithms out-perform their non-transitive counterparts, the top-performing transitive algorithm is prohibitively slow for taxonomies with as few as 50 entities. We propose a simple modification to a non-transitive optimum branching algorithm to explicitly incorporate synonymy, resulting in a method that is substantially faster than the best transitive algorithm while giving complementary performance.

pdf bib abs
Automated Paraphrase Lattice Creation for HyTER Machine Translation Evaluation
Marianna Apidianaki | Guillaume Wisniewski | Anne Cocos | Chris Callison-Burch
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We propose a variant of a well-known machine translation (MT) evaluation metric, HyTER (Dreyer and Marcu, 2012), which exploits reference translations enriched with meaning equivalent expressions. The original HyTER metric relied on hand-crafted paraphrase networks which restricted its applicability to new data. We test, for the first time, HyTER with automatically built paraphrase lattices. We show that although the metric obtains good results on small and carefully curated data with both manually and automatically selected substitutes, it achieves medium performance on much larger and noisier datasets, demonstrating the limits of the metric for tuning and evaluation of current MT systems.

2017

Bilingual Lexicon Induction is the task of learning word translations without bilingual parallel corpora. We model this task as a matrix completion problem, and present an effective and extendable framework for completing the matrix. This method harnesses diverse bilingual and monolingual signals, each of which may be incomplete or noisy. Our model achieves state-of-the-art performance for both high and low resource languages.

Semantic relation knowledge is crucial for natural language understanding. We introduce “KnowYourNyms?”, a web-based game for learning semantic relations. While providing users with an engaging experience, the application collects large amounts of data that can be used to improve semantic relation classifiers. The data also broadly informs us of how people perceive the relationships between words, providing useful insights for research in psychology and linguistics.

pdf bib abs
Learning Antonyms with Paraphrases and a Morphology-Aware Neural Network
Sneha Rajana | Chris Callison-Burch | Marianna Apidianaki | Vered Shwartz
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Recognizing and distinguishing antonyms from other types of semantic relations is an essential part of language understanding systems. In this paper, we present a novel method for deriving antonym pairs using paraphrase pairs containing negation markers. We further propose a neural network model, AntNET, that integrates morphological features indicative of antonymy into a path-based relation detection algorithm. We demonstrate that our model outperforms state-of-the-art models in distinguishing antonyms from other semantic relations and is capable of efficiently handling multi-word expressions.

pdf bib abs
Mapping the Paraphrase Database to WordNet
Anne Cocos | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

WordNet has facilitated important research in natural language processing but its usefulness is somewhat limited by its relatively small lexical coverage. The Paraphrase Database (PPDB) covers 650 times more words, but lacks the semantic structure of WordNet that would make it more directly useful for downstream tasks. We present a method for mapping words from PPDB to WordNet synsets with 89% accuracy. The mapping also lays important groundwork for incorporating WordNet’s relations into PPDB so as to increase its utility for semantic reasoning in applications.

pdf bib
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
Steven Bethard | Marine Carpuat | Marianna Apidianaki | Saif M. Mohammad | Daniel Cer | David Jurgens
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

pdf bib abs
Word Sense Filtering Improves Embedding-Based Lexical Substitution
Anne Cocos | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications

The role of word sense disambiguation in lexical substitution has been questioned due to the high performance of vector space models which propose good substitutes without explicitly accounting for sense. We show that a filtering mechanism based on a sense inventory optimized for substitutability can improve the results of these models. Our sense inventory is constructed using a clustering method which generates paraphrase clusters that are congruent with lexical substitution annotations in a development set. The results show that lexical substitution can still benefit from senses which can improve the output of vector space paraphrase ranking models.

2016

pdf bib abs
Lecture bilingue augmentée par des alignements multi-niveaux (Augmenting bilingual reading with alignment information)
François Yvon | Yong Xu | Marianna Apidianaki | Clément Pillias | Cubaud Pierre
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 5 : Démonstrations

Le travail qui a conduit à cette démonstration combine des outils de traitement des langues multilingues, en particulier l’alignement automatique, avec des techniques de visualisation et d’interaction. Il vise à proposer des pistes pour le développement d’outils permettant de lire simultanément les différentes versions d’un texte disponible en plusieurs langues, avec des applications en lecture de loisir ou en lecture professionnelle.

pdf bib
Vector-space models for PPDB paraphrase ranking in context
Marianna Apidianaki
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Word Sense Clustering and Clusterability
Diana McCarthy | Marianna Apidianaki | Katrin Erk
Computational Linguistics, Volume 42, Issue 2 - June 2016

pdf bib abs
Datasets for Aspect-Based Sentiment Analysis in French
Marianna Apidianaki | Xavier Tannier | Cécile Richart
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Aspect Based Sentiment Analysis (ABSA) is the task of mining and summarizing opinions from text about specific entities and their aspects. This article describes two datasets for the development and testing of ABSA systems for French which comprise user reviews annotated with relevant entities, aspects and polarity values. The first dataset contains 457 restaurant reviews (2365 sentences) for training and testing ABSA systems, while the second contains 162 museum reviews (655 sentences) dedicated to out-of-domain evaluation. Both datasets were built as part of SemEval-2016 Task 5 “Aspect-Based Sentiment Analysis” where seven different languages were represented, and are publicly available for research purposes.

pdf bib
TransRead: Designing a Bilingual Reading Experience with Machine Translation Technologies
François Yvon | Yong Xu | Marianna Apidianaki | Clément Pillias | Pierre Cubaud
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

pdf bib
Proceedings of ACL-2016 System Demonstrations
Sameer Pradhan | Marianna Apidianaki
Proceedings of ACL-2016 System Demonstrations

2015

pdf bib
LIMSI: Translations as Source of Indirect Supervision for Multilingual All-Words Sense Disambiguation and Entity Linking
Marianna Apidianaki | Li Gong
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
METEOR-WSD: Improved Sense Matching in MT Evaluation
Marianna Apidianaki | Benjamin Marie
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf bib
Alignment-based sense selection in METEOR and the RATATOUILLE recipe
Benjamin Marie | Marianna Apidianaki
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

pdf bib
Global Methods for Cross-lingual Semantic Role and Predicate Labelling
Lonneke van der Plas | Marianna Apidianaki | Chenhua Chen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Cross-lingual Word Sense Disambiguation for Predicate Labelling of French
Lonneke van der Plas | Marianna Apidianaki
Proceedings of TALN 2014 (Volume 1: Long Papers)

pdf bib abs
Semantic Clustering of Pivot Paraphrases
Marianna Apidianaki | Emilia Verzeni | Diana McCarthy
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Paraphrases extracted from parallel corpora by the pivot method (Bannard and Callison-Burch, 2005) constitute a valuable resource for multilingual NLP applications. In this study, we analyse the semantics of unigram pivot paraphrases and use a graph-based sense induction approach to unveil hidden sense distinctions in the paraphrase sets. The comparison of the acquired senses to gold data from the Lexical Substitution shared task (McCarthy and Navigli, 2007) demonstrates that sense distinctions exist in the paraphrase sets and highlights the need for a disambiguation step in applications using this resource.

2013

pdf bib
LIMSI : Cross-lingual Word Sense Disambiguation using Translation Sense Clustering
Marianna Apidianaki
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf bib
Cross-lingual WSD for Translation Extraction from Comparable Corpora
Marianna Apidianaki | Nikola Ljubešić | Darja Fišer
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora

2012

pdf bib
Measuring the Adequacy of Cross-Lingual Paraphrases in a Machine Translation Setting
Marianna Apidianaki
Proceedings of COLING 2012: Posters

pdf bib abs
Applying cross-lingual WSD to wordnet development
Marianna Apidianaki | Benoît Sagot
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The automatic development of semantic resources constitutes an important challenge in the NLP community. The methods used generally exploit existing large-scale resources, such as Princeton WordNet, often combined with information extracted from multilingual resources and parallel corpora. In this paper we show how Cross-Lingual Word Sense Disambiguation can be applied to wordnet development. We apply the proposed method to WOLF, a free wordnet for French still under construction, in order to fill synsets that did not contain any literal yet and increase its coverage.

pdf bib abs
Boosting the Coverage of a Semantic Lexicon by Automatically Extracted Event Nominalizations
Kata Gábor | Marianna Apidianaki | Benoît Sagot | Éric Villemonte de La Clergerie
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this article, we present a distributional analysis method for extracting nominalization relations from monolingual corpora. The acquisition method makes use of distributional and morphological information to select nominalization candidates. We explain how the learning is performed on a dependency annotated corpus and describe the nominalization results. Furthermore, we show how these results served to enrich an existing lexical resource, the WOLF (Wordnet Libre du FrancÂ¸ais). We present the techniques that we developed in order to integrate the new information into WOLF, based on both its structure and content. Finally, we evaluate the validity of the automatically obtained information and the correctness of its integration into the semantic resource. The method proved to be useful for boosting the coverage of WOLF and presents the advantage of filling verbal synsets, which are particularly difficult to handle due to the high level of verbal polysemy.

pdf bib
Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages
Marianna Apidianaki | Ido Dagan | Jennifer Foster | Yuval Marton | Djamé Seddah | Reut Tsarfaty
Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages

pdf bib
WSD for n-best reranking and local language modeling in SMT
Marianna Apidianaki | Guillaume Wisniewski | Artem Sokolov | Aurélien Max | François Yvon
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

L’étape de la désambiguïsation lexicale est souvent esquivée dans les systèmes de Traduction Automatique Statistique (Statistical Machine Translation (SMT)) car considérée comme non nécessaire à la sélection de traductions correctes. Le débat autour de cette nécessité est actuellement assez vif. Dans cet article, nous présentons les principales positions sur le sujet. Nous analysons les avantages et les inconvénients de la conception actuelle de la désambiguïsation dans le cadre de la SMT, d’après laquelle les sens des mots correspondent à leurs traductions dans des corpus parallèles. Ensuite, nous présentons des arguments en faveur d’une analyse plus poussée des informations sémantiques induites à partir de corpus parallèles et nous expliquons comment les résultats d’une telle analyse pourraient être exploités pour une évaluation plus flexible et concluante de l’impact de la désambiguïsation dans la SMT.

pdf bib
Data-Driven Semantic Analysis for Multilingual WSD and Lexical Selection in Translation
Marianna Apidianaki
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf bib
Capturing Lexical Variation in MT Evaluation Using Automatically Built Sense-Cluster Inventories
Marianna Apidianaki | Yifan He | Andy Way
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

2008

pdf bib abs
Translation-oriented Word Sense Induction Based on Parallel Corpora
Marianna Apidianaki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Word Sense Disambiguation (WSD) is an intermediate task that serves as a means to an end defined by the application in which it is to be used. However, different applications have varying disambiguation needs which should have an impact on the choice of the method and of the sense inventory used. The tendency towards application-oriented WSD becomes more and more evident, mostly because of the inadequacy of predefined sense inventories and the inefficacy of application-independent methods in accomplishing specific tasks. In this article, we present a data-driven method of sense induction, which combines contextual and translation information coming from a bilingual parallel training corpus. It consists of an unsupervised method that clusters semantically similar translation equivalents of source language (SL) polysemous words. The created clusters are projected on the SL words revealing their sense distinctions. Clustered equivalents describing a sense of a polysemous word can be considered as more or less commutable translations for an instance of the word carrying this sense. The resulting sense clusters can thus be used for WSD and sense annotation, as well as for lexical selection in translation applications.

2007

pdf bib abs
Repérage de sens et désambiguïsation dans un contexte bilingue
Marianna Apidianaki
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Les besoins de désambiguïsation varient dans les différentes applications du Traitement Automatique des Langues (TAL). Dans cet article, nous proposons une méthode de désambiguïsation lexicale opératoire dans un contexte bilingue et, par conséquent, adéquate pour la désambiguïsation au sein d’applications relatives à la traduction. Il s’agit d’une méthode contextuelle, qui combine des informations de cooccurrence avec des informations traductionnelles venant d’un bitexte. L’objectif est l’établissement de correspondances de traduction au niveau sémantique entre les mots de deux langues. Cette méthode étend les conséquences de l’hypothèse contextuelle du sens dans un contexte bilingue, tout en admettant l’existence d’une relation de similarité sémantique entre les mots de deux langues en relation de traduction. La modélisation de ces correspondances de granularité fine permet la désambiguïsation lexicale de nouvelles occurrences des mots polysémiques de la langue source ainsi que la prédiction de la traduction la plus adéquate pour ces occurrences.

2006

pdf bib abs
Traitement de la polysémie lexicale dans un but de traduction
Marianna Apidianaki
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

La désambiguïsation lexicale a une place centrale dans les applications de Traitement Automatique des Langues relatives à la traduction. Le travail présenté ici fait partie d’une étude sur les recouvrements et les divergences entre les espaces sémantiques occupés par des unités polysémiques de deux langues. Les correspondances entre ces unités sont rarement biunivoques et l’étude de ces correspondances aide à tirer des conclusions sur les possibilités et les limites d’utilisation d’une autre langue pour la désambiguïsation des unités d’une langue source. Le but de ce travail est l’établissement de correspondances d’une granularité optimale entre les unités de deux langues entretenant des relations de traduction. Ces correspondances seraient utilisables pour la prédiction des équivalents de traduction les plus adéquats de nouvelles occurrences des éléments polysémiques.