Diego Frassinelli - ACL Anthology

Diego Frassinelli

2025

GerMedIQ: A Resource for Simulated and Synthesized Anamnesis Interview Responses in German
Justin Hofenbitzer | Sebastian Schöning | Sebastian Belle | Jacqueline Lammert | Luise Modersohn | Martin Boeker | Diego Frassinelli
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Due to strict privacy regulations, text corpora in non-English clinical contexts are scarce. Consequently, synthetic data generation using Large Language Models (LLMs) emerges as a promising strategy to address this data gap. To evaluate the ability of LLMs in generating synthetic data, we applied them to our novel German Medical Interview Questions Corpus (GerMedIQ), which consists of 4,524 unique, simulated question-response pairs in German. We augmented our corpus by prompting 18 different LLMs to generate responses to the same questions. Structural and semantic evaluations of the generated responses revealed that large-sized language models produced responses comparable to those provided by humans. Additionally, an LLM-as-a-judge study, combined with a human baseline experiment assessing response acceptability, demonstrated that human raters preferred the responses generated by Mistral (124B) over those produced by humans. Nonetheless, our findings indicate that using LLMs for data augmentation in non-English clinical contexts requires caution.

Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?
Andreas Säuberli | Diego Frassinelli | Barbara Plank
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

Knowing how test takers answer items in educational assessments is essential for test development, to evaluate item quality, and to improve test validity. However, this process usually requires extensive pilot studies with human participants. If large language models (LLMs) exhibit human-like response behavior to test items, this could open up the possibility of using them as pilot participants to accelerate test development. In this paper, we evaluate the human-likeness or psychometric plausibility of responses from 18 instruction-tuned LLMs with two publicly available datasets of multiple-choice test items across three subjects: reading, U.S. history, and economics. Our methodology builds on two theoretical frameworks from psychometrics which are commonly used in educational assessment, classical test theory and item response theory. The results show that while larger models are excessively confident, their response distributions can be more human-like when calibrated with temperature scaling. In addition, we find that LLMs tend to correlate better with humans in reading comprehension items compared to other subjects. However, the correlations are not very strong overall, indicating that LLMs should not be used for piloting educational assessments in a zero-shot setting.

AbsVis – Benchmarking How Humans and Vision-Language Models “See” Abstract Concepts in Images
Tarun Tater | Diego Frassinelli | Sabine Schulte im Walde
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Abstract concepts like mercy and peace often lack clear visual grounding, and thus challenge humans and models to provide suitable image representations. To address this challenge, we introduce AbsVis – a dataset of 675 images annotated with 14,175 concept–explanation attributions from humans and two Vision-Language Models (VLMs: Qwen and LLaVA), where each concept is accompanied by a textual explanation. We compare human and VLM attributions in terms of diversity, abstractness, and alignment, and find that humans attribute more varied concepts. AbsVis also includes 2,680 human preference judgments evaluating the quality of a subset of these annotations, showing that overlapping concepts (attributed by both humans and VLMs) are most preferred. Explanations clarify and strengthen the perceived attributions, both from humans and VLMs. Explanations clarify and strengthen the perceived attributions, both from human and VLMs. Finally, we show that VLMs can approximate human preferences and use them to fine-tune VLMs via Direct Preference Optimization (DPO), yielding improved alignments with preferred concept–explanation pairs.

Disentangling Subjectivity and Uncertainty for Hate Speech Annotation and Modeling using Gaze
Özge Alacam | Sanne Hoeken | Andreas Säuberli | Hannes Gröner | Diego Frassinelli | Sina Zarrieß | Barbara Plank
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Variation is inherent in opinion-based annotation tasks like sentiment or hate speech analysis. It does not only arise from errors, fatigue, or sentence ambiguity but also from genuine differences in opinion shaped by background, experience, and culture. In this paper, first, we show how annotators’ confidence ratings can be great use for disentangling subjective variation from uncertainty, without relying on specific features present in the data (text, gaze, etc.). Our goal is to establish distinctive dimensions of variation which are often not clearly separated in existing work on modeling annotator variation. We illustrate our approach through a hate speech detection task, demonstrating that models are affected differently by instances of uncertainty and subjectivity. In addition, we show that human gaze patterns offer valuable indicators of subjective evaluation and uncertainty. Disclaimer: This paper contains sentences that may be offensive.

A Couch Potato is not a Potato on a Couch: Prompting Strategies, Image Generation, and Compositionality Prediction for Noun Compounds
Sinan Kurtyigit | Diego Frassinelli | Carina Silberer | Sabine Schulte Im Walde
Findings of the Association for Computational Linguistics: ACL 2025

We explore the role of the visual modality and of vision transformers in predicting the compositionality of English noun compounds. Crucially, we contribute a framework to address the challenge of obtaining adequate images that represent non-compositional compounds (such as “couch potato”), making it relevant for any image-based approach targeting figurative language. Our method uses prompting strategies and diffusion models to generate images. Comparing and combining our approach with a state-of-the-art text-based approach reveals complementary contributions regarding features as well as degrees of abstractness in compounds.

Make Every Letter Count: Building Dialect Variation Dictionaries from Monolingual Corpora
Robert Litschko | Verena Blaschke | Diana Burkhardt | Barbara Plank | Diego Frassinelli
Findings of the Association for Computational Linguistics: EMNLP 2025

Dialects exhibit a substantial degree of variation due to the lack of a standard orthography. At the same time, the ability of Large Language Models (LLMs) to process dialects remains largely understudied. To address this gap, we use Bavarian as a case study and investigate the lexical dialect understanding capability of LLMs by examining how well they recognize and translate dialectal terms across different parts-of-speech. To this end, we introduce DiaLemma, a novel annotation framework for creating dialect variation dictionaries from monolingual data only, and use it to compile a ground truth dataset consisting of 100K human-annotated German-Bavarian word pairs. We evaluate how well nine state-of-the-art LLMs can judge Bavarian terms as dialect translations, inflected variants, or unrelated forms of a given German lemma. Our results show that LLMs perform best on nouns and lexically similar word pairs, and struggle most in distinguishing between direct translations and inflected variants. Interestingly, providing additional context in the form of example usages improves the translation performance, but reduces their ability to recognize dialect variants. This study highlights the limitations of LLMs in dealing with orthographic dialect variation and emphasizes the need for future work on adapting LLMs to dialects.

The Difficult Case of Intended and Perceived Sarcasm: a Challenge for Humans and Large Language Models
Hyewon Jang | Diego Frassinelli
Proceedings of the 16th International Conference on Computational Semantics

We examine the cases of failed communication in sarcasm, defined as ‘the discrepancy between what speakers and observers perceive as sarcasm’. We identify factors that are associated with such failures, and how those difficult instances affect the detection performance of encoder-only and decoder-only generative models. We find that speakers’ incongruity between their felt annoyance and sarcasm in their utterance is highly correlated with sarcasm that fails to be communicated to human observers. This factor also relates to the drop of classification performance of large language models (LLMs). Additionally, disagreement among multiple observers about sarcasm is correlated with poorer performance of LLMs. Finally, we find that generative models produce better results with ground-truth labels from speakers than from observers, in contrast to encoder-only models, which suggests a general tendency by generative models to identify with speakers’ perspective by default.

Linguistic Features in German BERT: The Role of Morphology, Syntax, and Semantics in Multi-Class Text Classification
Henrike Beyer | Diego Frassinelli
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

Most studies on the linguistic information encoded by BERT primarily focus on English. Our study examines a monolingual German BERT model using a semantic classification task on newspaper articles, analysing the linguistic features influencing classification decisions through SHAP values. We use the TüBa-D/Z corpus, a resource with gold-standard annotations for a set of linguistic features, including POS, inflectional morphology, phrasal, clausal, and dependency structures. Semantic features of nouns are evaluated via the GermaNet ontology using shared hypernyms. Our results indicate that the features identified in English also affect classification in German but suggests important language- and task-specific features as well.

LeWiDi-2025 at NLPerspectives: Third Edition of the Learning with Disagreements Shared Task
Elisa Leonardelli | Silvia Casola | Siyao Peng | Giulia Rizzi | Valerio Basile | Elisabetta Fersini | Diego Frassinelli | Hyewon Jang | Maja Pavlovic | Barbara Plank | Massimo Poesio
Proceedings of the The 4th Workshop on Perspectivist Approaches to NLP

Many researchers have reached the conclusion that ai models should be trained to be aware of the possibility of variation and disagreement in human judgments, and evaluated as per their ability to recognize such variation. The LeWiDi series of shared tasks on Learning With Disagreements was established to promote this approach to training and evaluating ai models, by making suitable datasets more accessible and by developing evaluation methods. The third edition of the task builds on this goal by extending the LeWiDi benchmark to four datasets spanning paraphrase identification, irony detection, sarcasm detection, and natural language inference, with labeling schemes that include not only categorical judgments as in previous editions, but ordinal judgments as well. Another novelty is that we adopt two complementary paradigms to evaluate disagreement-aware systems: the soft-label approach, in which models predict population-level distributions of judgments, and the perspectivist approach, in which models predict the interpretations of individual annotators. Crucially, we moved beyond standard metrics such as cross-entropy, and tested new evaluation metrics for the two paradigms. The task attracted diverse participation, and the results provide insights into the strengths and limitations of methods to modeling variation. Together, these contributions strengthen LeWiDi as a framework and provide new resources, benchmarks, and findings to support the development of disagreement-aware technologies.

Evaluating Textual and Visual Semantic Neighborhoods of Abstract and Concrete Concepts
Sven Naber | Diego Frassinelli | Sabine Schulte Im Walde
Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)

This paper presents a systematic evaluation of nearest neighbors across semantic representation spaces in both textual and visual modalities. We focus on nominal concepts with varying concreteness levels, and apply a neighborhood overlap measure to compare these target concepts differing in their linguistic and perceptual nature. We find that alignment is primarily determined by modality, and additionally by level of concreteness: Models from the same modality show stronger alignment than cross-modal models, and spaces of concrete concepts show stronger alignment than those of abstract ones. Overall, larger neighborhood size strengthens the alignment between spaces.

2024

Evaluating Semantic Relations in Predicting Textual Labels for Images of Abstract and Concrete Concepts
Tarun Tater | Sabine Schulte Im Walde | Diego Frassinelli
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

This study investigates the performance of SigLIP, a state-of-the-art Vision-Language Model (VLM), in predicting labels for images depicting 1,278 concepts. Our analysis across 300 images per concept shows that the model frequently predicts the exact user-tagged labels, but similarly, it often predicts labels that are semantically related to the exact labels in various ways: synonyms, hypernyms, co-hyponyms, and associated words, particularly for abstract concepts. We then zoom into the diversity of the user tags of images and word associations for abstract versus concrete concepts. Surprisingly, not only abstract but also concrete concepts exhibit significant variability, thus challenging the traditional view that representations of concrete concepts are less diverse.

Language Complexity in Populist Rhetoric
Sergio E. Zanotto | Diego Frassinelli | Miriam Butt
Proceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences: Long and short papers

Research suggests that politicians labeled as populists tend to use simpler language than their mainstream opponents. Yet, the metrics traditionally employed to assess the complexity of their language do not show consistent and generalizable results across different datasets and languages. This inconsistencies raise questions about the claimed simplicity of populist discourse, suggesting that the issue may be more nuanced than it initially seemed. To address this topic, we analyze the linguistic profile of IMPAQTS, a dataset of transcribed Italian political speeches, to identify linguistic features differentiating populist and non-populist parties. Our methodology ensures comparability of political texts and combines various statistical analyses to reliably identify key linguistic characteristics to test our case study. Results show that the “simplistic” language features previously described in the literature are not robust predictors of populism. This suggests that the characteristics defining populist statements are highly dependent on the specific dataset and the language being analysed, thus limiting the conclusions drawn in previous research. In our study, various linguistic features statistically differentiate between populist and mainstream parties, indicating that populists tend to employ specific well-known rhetorical strategies more frequently; however, none of them strongly indicate that populist parties use simpler language.

Unveiling the mystery of visual attributes of concrete and abstract concepts: Variability, nearest neighbors, and challenging categories
Tarun Tater | Sabine Schulte Im Walde | Diego Frassinelli
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

The visual representation of a concept varies significantly depending on its meaning and the context where it occurs; this poses multiple challenges both for vision and multimodal models. Our study focuses on concreteness, a well-researched lexical-semantic variable, using it as a case study to examine the variability in visual representations. We rely on images associated with approximately 1,000 abstract and concrete concepts extracted from two different datasets: Bing and YFCC. Our goals are: (i) evaluate whether visual diversity in the depiction of concepts can reliably distinguish between concrete and abstract concepts; (ii) analyze the variability of visual features across multiple images of the same concept through a nearest neighbor analysis; and (iii) identify challenging factors contributing to this variability by categorizing and annotating images. Our findings indicate that for classifying images of abstract versus concrete concepts, a combination of basic visual features such as color and texture is more effective than features extracted by more complex models like Vision Transformer (ViT). However, ViTs show better performances in the nearest neighbor analysis, emphasizing the need for a careful selection of visual features when analyzing conceptual variables through modalities other than text.

Context vs. Human Disagreement in Sarcasm Detection
Hyewon Jang | Moritz Jakob | Diego Frassinelli
Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024)

Prior work has highlighted the importance of context in the identification of sarcasm by humans and language models. This work examines how much context is required for a better identification of sarcasm by both parties. We collect textual responses to dialogical prompts and sarcasm judgment to the responses placed after long contexts, short contexts, and no contexts. We find that both for humans and language models, the presence of context is generally important in identifying sarcasm in the response. But increasing the amount of context provides no added benefit to humans (long = short > none). This is the same for language models, but only on easily agreed-upon sentences; for sentences with disagreement among human evaluators, different models show different behavior. We also show how sarcasm detection patterns stay consistent as the amount of context is manipulated despite the low agreement in human evaluation.

Comparison of Image Generation Models for Abstract and Concrete Event Descriptions
Mohammed Khaliq | Diego Frassinelli | Sabine Schulte Im Walde
Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024)

With the advent of diffusion-based image generation models such as DALL-E, Stable Diffusion and Midjourney, high quality images can be easily generated using textual inputs. It is unclear, however, to what extent the generated images resemble human mental representations, especially regarding abstract event knowledge. We analyse the capability of four state-of-the-art models in generating images of verb-object event pairs when we systematically manipulate the degrees of abstractness of both the verbs and the object nouns. Human judgements assess the generated images and demonstrate that DALL-E is strongest for event pairs with concrete nouns (e.g., “pour water”; “believe person”), while Midjourney is preferred for event pairs with abstract nouns (e.g., “raise awareness”; “remain mystery”), irrespective of the concreteness of the verb. Across models, humans were most unsatisfied with images of events pairs that combined concrete verbs with abstract direct-object nouns (e.g., “speak truth”), and an additional ad-hoc annotation contributes this to its potential for figurative language.

To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity
Anastasiia Sedova | Robert Litschko | Diego Frassinelli | Benjamin Roth | Barbara Plank
Findings of the Association for Computational Linguistics: EMNLP 2024

One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 ambiguous entities. Our experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts. The results also reveal systematic discrepancies in LLM behavior, showing that while the models may possess knowledge, they struggle to apply it consistently, exhibit biases toward preferred readings, and display self-inconsistencies. This highlights the need to address entity ambiguity in the future for more trustworthy LLMs.

GRIT: A Dataset of Group Reference Recognition in Italian
Sergio E. Zanotto | Qi Yu | Miriam Butt | Diego Frassinelli
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

For the analysis of political discourse a reliable identification of group references, i.e., linguistic components that refer to individuals or groups of people, is useful. However, the task of automatically recognizing group references has not yet gained much attention within NLP. To address this gap, we introduce GRIT (Group Reference for Italian), a large-scale, multi-domain manually annotated dataset for group reference recognition in Italian. GRIT represents a new resource for automatic and generalizable recognition of group references. With this dataset, we aim to establish group reference recognition as a valid classification task, which extends the domain of Named Entity Recognition by expanding its focus to literal and figurative mentions of social groups. We verify the potential of achieving automated group reference recognition for Italian through an experiment employing a fine-tuned BERT model. Our experimental results substantiate the validity of the task, implying a huge potential for applying automated systems to multiple fields of analysis, such as political text or social media analysis.

Generalizable Sarcasm Detection is Just Around the Corner, of Course!
Hyewon Jang | Diego Frassinelli
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

We tested the robustness of sarcasm detection models by examining their behavior when fine-tuned on four sarcasm datasets containing varying characteristics of sarcasm: label source (authors vs. third-party), domain (social media/online vs. offline conversations/dialogues), style (aggressive vs. humorous mocking). We tested their prediction performance on the same dataset (intra-dataset) and across different datasets (cross-dataset). For intra-dataset predictions, models consistently performed better when fine-tuned with third-party labels rather than with author labels. For cross-dataset predictions, most models failed to generalize well to the other datasets, implying that one type of dataset cannot represent all sorts of sarcasm with different styles and domains. Compared to the existing datasets, models fine-tuned on the new dataset we release in this work showed the highest generalizability to other datasets. With a manual inspection of the datasets and post-hoc analysis, we attributed the difficulty in generalization to the fact that sarcasm actually comes in different domains and styles. We argue that future sarcasm research should take the broad scope of sarcasm into account.

2023

Investigating the Nature of Disagreements on Mid-Scale Ratings: A Case Study on the Abstractness-Concreteness Continuum
Urban Knupleš | Diego Frassinelli | Sabine Schulte im Walde
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)

Humans tend to strongly agree on ratings on a scale for extreme cases (e.g., a CAT is judged as very concrete), but judgements on mid-scale words exhibit more disagreement. Yet, collected rating norms are heavily exploited across disciplines. Our study focuses on concreteness ratings and (i) implements correlations and supervised classification to identify salient multi-modal characteristics of mid-scale words, and (ii) applies a hard clustering to identify patterns of systematic disagreement across raters. Our results suggest to either fine-tune or filter mid-scale target words before utilising them.

Figurative Language Processing: A Linguistically Informed Feature Analysis of the Behavior of Language Models and Humans
Hyewon Jang | Qi Yu | Diego Frassinelli
Findings of the Association for Computational Linguistics: ACL 2023

Recent years have witnessed a growing interest in investigating what Transformer-based language models (TLMs) actually learn from the training data. This is especially relevant for complex tasks such as the understanding of non-literal meaning. In this work, we probe the performance of three black-box TLMs and two intrinsically transparent white-box models on figurative language classification of sarcasm, similes, idioms, and metaphors. We conduct two studies on the classification results to provide insights into the inner workings of such models. With our first analysis on feature importance, we identify crucial differences in model behavior. With our second analysis using an online experiment with human participants, we inspect different linguistic characteristics of the four figurative language types.

2022

Concreteness vs. Abstractness: A Selectional Preference Perspective
Tarun Tater | Diego Frassinelli | Sabine Schulte im Walde
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop

Concrete words refer to concepts that are strongly experienced through human senses (banana, chair, salt, etc.), whereas abstract concepts are less perceptually salient (idea, glory, justice, etc.). A clear definition of abstractness is crucial for the understanding of human cognitive processes and for the development of natural language applications such as figurative language detection. In this study, we investigate selectional preferences as a criterion to distinguish between concrete and abstract concepts and words: we hypothesise that abstract and concrete verbs and nouns differ regarding the semantic classes of their arguments. Our study uses a collection of 5,438 nouns and 1,275 verbs to exploit selectional preferences as a salient characteristic in classifying English abstract vs. concrete words, and in predicting their concreteness scores. We achieve an f1-score of 0.84 for nouns and 0.71 for verbs in classification, and Spearman’s ρ correlation of 0.86 for nouns and 0.59 for verbs.

2021

KonTra at CMCL 2021 Shared Task: Predicting Eye Movements by Combining BERT with Surface, Linguistic and Behavioral Information
Qi Yu | Aikaterini-Lida Kalouli | Diego Frassinelli
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

This paper describes the submission of the team KonTra to the CMCL 2021 Shared Task on eye-tracking prediction. Our system combines the embeddings extracted from a fine-tuned BERT model with surface, linguistic and behavioral features, resulting in an average mean absolute error of 4.22 across all 5 eye-tracking measures. We show that word length and features representing the expectedness of a word are consistently the strongest predictors across all 5 eye-tracking measures.

Regression Analysis of Lexical and Morpho-Syntactic Properties of Kiezdeutsch
Diego Frassinelli | Gabriella Lapesa | Reem Alatrash | Dominik Schlechtweg | Sabine Schulte im Walde
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects

Kiezdeutsch is a variety of German predominantly spoken by teenagers from multi-ethnic urban neighborhoods in casual conversations with their peers. In recent years, the popularity of Kiezdeutsch has increased among young people, independently of their socio-economic origin, and has spread in social media, too. While previous studies have extensively investigated this language variety from a linguistic and qualitative perspective, not much has been done from a quantitative point of view. We perform the first large-scale data-driven analysis of the lexical and morpho-syntactic properties of Kiezdeutsch in comparison with standard German. At the level of results, we confirm predictions of previous qualitative analyses and integrate them with further observations on specific linguistic phenomena such as slang and self-centered speaker attitude. At the methodological level, we provide logistic regression as a framework to perform bottom-up feature selection in order to quantify differences across language varieties.

2020

Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension
Ekta Sood | Simon Tannert | Diego Frassinelli | Andreas Bulling | Ngoc Thang Vu
Proceedings of the 24th Conference on Computational Natural Language Learning

While neural networks with attention mechanisms have achieved superior performance on many natural language processing tasks, it remains unclear to which extent learned attention resembles human visual attention. In this paper, we propose a new method that leverages eye-tracking data to investigate the relationship between human visual attention and neural attention in machine reading comprehension. To this end, we introduce a novel 23 participant eye tracking dataset - MQA-RC, in which participants read movie plots and answered pre-defined questions. We compare state of the art networks based on long short-term memory (LSTM), convolutional neural models (CNN) and XLNet Transformer architectures. We find that higher similarity to human attention and performance significantly correlates to the LSTM and CNN models. However, we show this relationship does not hold true for the XLNet models – despite the fact that the XLNet performs best on this challenging task. Our results suggest that different architectures seem to learn rather different neural attention strategies and similarity of neural to human attention does not guarantee best performance.

2019

Distributional Interaction of Concreteness and Abstractness in Verb–Noun Subcategorisation
Diego Frassinelli | Sabine Schulte im Walde
Proceedings of the 13th International Conference on Computational Semantics - Short Papers

In recent years, both cognitive and computational research has provided empirical analyses of contextual co-occurrence of concrete and abstract words, partially resulting in inconsistent pictures. In this work we provide a more fine-grained description of the distributional nature in the corpus-based interaction of verbs and nouns within subcategorisation, by investigating the concreteness of verbs and nouns that are in a specific syntactic relationship with each other, i.e., subject, direct object, and prepositional object. Overall, our experiments show consistent patterns in the distributional representation of subcategorising and subcategorised concrete and abstract words. At the same time, the studies reveal empirical evidence why contextual abstractness represents a valuable indicator for automatic non-literal language identification.

2018

Quantitative Semantic Variation in the Contexts of Concrete and Abstract Words
Daniela Naumann | Diego Frassinelli | Sabine Schulte im Walde
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Across disciplines, researchers are eager to gain insight into empirical features of abstract vs. concrete concepts. In this work, we provide a detailed characterisation of the distributional nature of abstract and concrete words across 16,620 English nouns, verbs and adjectives. Specifically, we investigate the following questions: (1) What is the distribution of concreteness in the contexts of concrete and abstract target words? (2) What are the differences between concrete and abstract words in terms of contextual semantic diversity? (3) How does the entropy of concrete and abstract word contexts differ? Overall, our studies show consistent differences in the distributional representation of concrete and abstract words, thus challenging existing theories of cognition and providing a more fine-grained description of their nature.

2017

Contextual Characteristics of Concrete and Abstract Words
Diego Frassinelli | Daniela Naumann | Jason Utt | Sabine Schulte m Walde
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Short papers

Exploring Multi-Modal Text+Image Models to Distinguish between Abstract and Concrete Nouns
Sai Abishek Bhaskar | Maximilian Köper | Sabine Schulte Im Walde | Diego Frassinelli
Proceedings of the IWCS workshop on Foundations of Situated and Multimodal Communication

Co-authors

Venues

nlperspectives1