Filip Miletić

Also published as: Filip Miletic

2025

Multi-word Measures: Modeling Semantic Change in Compound Nouns
Chris Jenkins | Filip Miletić | Sabine Schulte Im Walde
Findings of the Association for Computational Linguistics: ACL 2025

Compound words (e.g. shower thought) provide a multifaceted challenge for diachronic models of semantic change. Datasets describing noun compound semantics tend to describe only the predominant sense of a compound, which is limiting, especially in diachronic settings where senses may shift over time. We create a novel dataset of relatedness judgements of noun compounds in English and German, the first to capture diachronic meaning changes for multi-word expressions without prematurely condensing individual senses into an aggregate value. Furthermore, we introduce a novel, sense-targeting approach for noun compounds that evaluates two contrasting vector representations in their ability to cluster example sentence pairs. Our clustering approach targets both noun compounds and their constituent parts, to model the interdependence of these terms over time. We calculate time-delineated distributions of these clusters and compare them against measures of semantic change aggregated from the human relatedness annotations.

pdf bib

Probing BERT for German Compound Semantics
Filip Miletić | Aaron Schmid | Sabine Schulte im Walde
Proceedings of the 10th edition of the Swiss Text Analytics Conference

pdf bib abs

Spanish Dialect Classification: A Comparative Study of Linguistically Tailored Features, Unigrams and BERT Embeddings
Laura Zeidler | Chris Jenkins | Filip Miletić | Sabine Schulte Im Walde
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

The task of automatic dialect classification is typically tackled using traditional machine-learning models with bag-of-words unigram features. We explore two alternative methods for distinguishing dialects across 20 Spanish-speaking countries:(i) Support vector machine and decision tree models were trained on dialectal features tailored to the Spanish dialects, combined with standard unigrams. (ii) A pre-trained BERT model was fine-tuned on the task.Results show that the tailored features generally did not have a positive impact on traditional model performance, but provide a salient way of representing dialects in a content-agnostic manner. The BERT model wins over traditional models but with only a tiny margin, while sacrificing explainability and interpretability.

pdf bib abs

Modeling the Evolution of English Noun Compounds with Feature-Rich Diachronic Compositionality Prediction
Filip Miletić | Sabine Schulte im Walde
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We analyze the evolution of English noun compounds, which we represent as vectors of time-specific values. We implement a wide array of methods to create a rich set of features, using them to classify compounds for present-day compositionality and to assess the informativeness of the corresponding linguistic patterns. Our best results use BERT – reflecting the similarity of compounds and sentence contexts – and we further capture relevant and complementary information across approaches. Leveraging these feature differences, we find that the development of low-compositional meanings is reflected by a parallel drop in compositionality and sustained semantic change. The same distinction is echoed in transformer processing: compositionality estimates require far less contextualization than semantic change estimates.

2024

pdf bib abs

VarDial Evaluation Campaign 2024: Commonsense Reasoning in Dialects and Multi-Label Similar Language Identification
Adrian-Gabriel Chifu | Goran Glavaš | Radu Tudor Ionescu | Nikola Ljubešić | Aleksandra Miletić | Filip Miletić | Yves Scherrer | Ivan Vulić
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)

This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2024. The campaign is part of the eleventh workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2024. Two shared tasks were included this year: dialectal causal commonsense reasoning (DIALECT-COPA), and Multi-label classification of similar languages (DSL-ML). Both tasks were organized for the first time this year, but DSL-ML partially overlaps with the DSL-TL task organized in 2023.

pdf bib abs

A Gold Standard with Silver Linings: Scaling Up Annotation for Distinguishing Bosnian, Croatian, Montenegrin and Serbian
Aleksandra Miletić | Filip Miletić
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024

Bosnian, Croatian, Montenegrin and Serbian are the official standard linguistic varieties in Bosnia and Herzegovina, Croatia, Montenegro, and Serbia, respectively. When these four countries were part of the former Yugoslavia, the varieties were considered to share a single linguistic standard. After the individual countries were established, the national standards emerged. Today, a central question about these varieties remains the following: How different are they from each other? How hard is it to distinguish them? While this has been addressed in NLP as part of the task on Distinguishing Between Similar Languages (DSL), little is known about human performance, making it difficult to contextualize system results. We tackle this question by reannotating the existing BCMS dataset for DSL with annotators from all target regions. We release a new gold standard, replacing the original single-annotator, single-label annotation by a multi-annotator, multi-label one, thus improving annotation reliability and explicitly coding the existence of ambiguous instances. We reassess a previously proposed DSL system on the new gold standard and establish the human upper bound on the task. Finally, we identify sources of annotation difficulties and provide linguistic insights into the BCMS dialect continuum, with multiple indicators highlighting an intermediate position of Bosnian and Montenegrin.

pdf bib abs

Gender Identity in Pretrained Language Models: An Inclusive Approach to Data Creation and Probing
Urban Knupleš | Agnieszka Falenska | Filip Miletić
Findings of the Association for Computational Linguistics: EMNLP 2024

Pretrained language models (PLMs) have been shown to encode binary gender information of text authors, raising the risk of skewed representations and downstream harms. This effect is yet to be examined for transgender and non-binary identities, whose frequent marginalization may exacerbate harmful system behaviors. Addressing this gap, we first create TRANsCRIPT, a corpus of YouTube transcripts from transgender, cisgender, and non-binary speakers. Using this dataset, we probe various PLMs to assess if they encode the gender identity information, examining both frozen and fine-tuned representations as well as representations for inputs with author-specific words removed. Our findings reveal that PLM representations encode information for all gender identities but to different extents. The divergence is most pronounced for cis women and non-binary individuals, underscoring the critical need for gender-inclusive approaches to NLP systems.

pdf bib abs

RAGAR, Your Falsehood Radar: RAG-Augmented Reasoning for Political Fact-Checking using Multimodal Large Language Models
Mohammed Abdul Khaliq | Paul Yu-Chun Chang | Mingyang Ma | Bernhard Pflugfelder | Filip Miletić
Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER)

The escalating challenge of misinformation, particularly in political discourse, requires advanced fact-checking solutions; this is even clearer in the more complex scenario of multimodal claims. We tackle this issue using a multimodal large language model in conjunction with retrieval-augmented generation (RAG), and introduce two novel reasoning techniques: Chain of RAG (CoRAG) and Tree of RAG (ToRAG). They fact-check multimodal claims by extracting both textual and image content, retrieving external information, and reasoning subsequent questions to be answered based on prior evidence. We achieve a weighted F1-score of 0.85, surpassing a baseline reasoning technique by 0.14 points. Human evaluation confirms that the vast majority of our generated fact-check explanations contain all information from gold standard data.

pdf bib abs

Semantics of Multiword Expressions in Transformer-Based Models: A Survey
Filip Miletić | Sabine Schulte im Walde
Transactions of the Association for Computational Linguistics, Volume 12

Multiword expressions (MWEs) are composed of multiple words and exhibit variable degrees of compositionality. As such, their meanings are notoriously difficult to model, and it is unclear to what extent this issue affects transformer architectures. Addressing this gap, we provide the first in-depth survey of MWE processing with transformer models. We overall find that they capture MWE semantics inconsistently, as shown by reliance on surface patterns and memorized information. MWE meaning is also strongly localized, predominantly in early layers of the architecture. Representations benefit from specific linguistic properties, such as lower semantic idiosyncrasy and ambiguity of target expressions. Our findings overall question the ability of transformer models to robustly capture fine-grained semantics. Furthermore, we highlight the need for more directly comparable evaluation setups.

pdf bib abs

Visualising Changes in Semantic Neighbourhoods of English Noun Compounds over Time
Malak Rassem | Myrto Tsigkouli | Chris W. Jenkins | Filip Miletić | Sabine Schulte im Walde
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities

This paper provides a framework and tool set for computing and visualising dynamic, time- specific semantic neighbourhoods of English noun-noun compounds and their constituents over time. Our framework not only identifies salient vector-space dimensions and neighbours in notoriously sparse data: we specifically bring together changes in meaning aspects and degrees of (non-)compositionality.

pdf bib abs

What Can Diachronic Contexts and Topics Tell Us about the Present-Day Compositionality of English Noun Compounds?
Samin Mahdizadeh Sani | Malak Rassem | Chris Jenkins | Filip Miletić | Sabine Schulte im Walde
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Predicting the compositionality of noun compounds such as climate change and tennis elbow is a vital component in natural language understanding. While most previous computational methods that automatically determine the semantic relatedness between compounds and their constituents have applied a synchronic perspective, the current study investigates what diachronic changes in contexts and semantic topics of compounds and constituents reveal about the compounds’ present-day degrees of compositionality. We define a binary classification task that utilizes two diachronic vector spaces based on contextual co-occurrences and semantic topics, and demonstrate that diachronic changes in cosine similarities – measured over context or topic distributions – uncover patterns that distinguish between compounds with low and high present-day compositionality. Despite fewer dimensions in the topic models, the topic space performs on par with the co-occurrence space and captures rather similar information. Temporal similarities between compounds and modifiers as well as between compounds and their prepositional paraphrases predict the compounds’ present-day compositionality with accuracy >0.7.

2023

pdf bib abs

A Systematic Search for Compound Semantics in Pretrained BERT Architectures
Filip Miletic | Sabine Schulte im Walde
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

To date, transformer-based models such as BERT have been less successful in predicting compositionality of noun compounds than static word embeddings. This is likely related to a suboptimal use of the encoded information, reflecting an incomplete grasp of how the models represent the meanings of complex linguistic structures. This paper investigates variants of semantic knowledge derived from pretrained BERT when predicting the degrees of compositionality for 280 English noun compounds associated with human compositionality ratings. Our performance strongly improves on earlier unsupervised implementations of pretrained BERT and highlights beneficial decisions in data preprocessing, embedding computation, and compositionality estimation. The distinct linguistic roles of heads and modifiers are reflected by differences in BERT-derived representations, with empirical properties such as frequency, productivity, and ambiguity affecting model performance. The most relevant representational information is concentrated in the initial layers of the model architecture.

pdf bib

Classifying Noun Compounds for Present-Day Compositionality: Contributions of Diachronic Frequency and Productivity Patterns
Maximilian Maurer | Chris Jenkins | Filip Miletić | Sabine Schulte im Walde
Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023)

pdf bib abs

Understanding Computational Models of Semantic Change: New Insights from the Speech Community
Filip Miletić | Anne Przewozny-Desriaux | Ludovic Tanguy
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

We investigate the descriptive relevance of widely used semantic change models in linguistic descriptions of present-day speech communities. We focus on the sociolinguistic issue of contact-induced semantic shifts in Quebec English, and analyze 40 target words using type-level and token-level word embeddings, empirical linguistic properties, and – crucially – acceptability ratings and qualitative remarks by 15 speakers from Montreal. Our results confirm the overall relevance of the computational approaches, but also highlight practical issues and the complementary nature of different semantic change estimates. To our knowledge, this is the first study to substantively engage with the speech community being described using semantic change models.

pdf bib abs

To Split or Not to Split: Composing Compounds in Contextual Vector Spaces
Chris Jenkins | Filip Miletic | Sabine Schulte im Walde
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

We investigate the effect of sub-word tokenization on representations of German noun compounds: single orthographic words which are composed of two or more constituents but often tokenized into units that are not morphologically motivated or meaningful. Using variants of BERT models and tokenization strategies on domain-specific restricted diachronic data, we introduce a suite of evaluations relying on the masked language modelling task and compositionality prediction. We obtain the most consistent improvements by pre-splitting compounds into constituents.

2021

pdf bib abs

Detecting Contact-Induced Semantic Shifts: What Can Embedding-Based Methods Do in Practice?
Filip Miletic | Anne Przewozny-Desriaux | Ludovic Tanguy
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

This study investigates the applicability of semantic change detection methods in descriptively oriented linguistic research. It specifically focuses on contact-induced semantic shifts in Quebec English. We contrast synchronic data from different regions in order to identify the meanings that are specific to Quebec and potentially related to language contact. Type-level embeddings are used to detect new semantic shifts, and token-level embeddings to isolate regionally specific occurrences. We introduce a new 80-item test set and conduct both quantitative and qualitative evaluations. We demonstrate that diachronic word embedding methods can be applied to contact-induced semantic shifts observed in synchrony, obtaining results comparable to the state of the art on similar tasks in diachrony. However, we show that encouraging evaluation results do not translate to practical value in detecting new semantic shifts. Finally, our application of token-level embeddings accelerates manual data exploration and provides an efficient way of scaling up sociolinguistic analyses.

2020

pdf bib abs

Collecting Tweets to Investigate Regional Variation in Canadian English
Filip Miletic | Anne Przewozny-Desriaux | Ludovic Tanguy
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present a 78.8-million-tweet, 1.3-billion-word corpus aimed at studying regional variation in Canadian English with a specific focus on the dialect regions of Toronto, Montreal, and Vancouver. Our data collection and filtering pipeline reflects complex design criteria, which aim to allow for both data-intensive modeling methods and user-level variationist sociolinguistic analysis. It specifically consists in identifying Twitter users from the three cities, crawling their entire timelines, filtering the collected data in terms of user location and tweet language, and automatically excluding near-duplicate content. The resulting corpus mirrors national and regional specificities of Canadian English, it provides sufficient aggregate and user-level data, and it maintains a reasonably balanced distribution of content across regions and users. The utility of this dataset is illustrated by two example applications: the detection of regional lexical and topical variation, and the identification of contact-induced semantic shifts using vector space models. In accordance with Twitter’s developer policy, the corpus will be publicly released in the form of tweet IDs.

Filip Miletić

2025

2024

2023

2021

2020

Co-authors

Venues