Marie Francine Moens

Also published as: Marie-Francine Moens, Marie-francine Moens

2025

pdf bib abs
Language Fusion for Parameter-Efficient Cross-lingual Transfer
Philipp Borchert | Ivan Vulić | Marie-Francine Moens | Jochen De Weerdt
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Limited availability of multilingual text corpora for training language models often leads to poor performance on downstream tasks due to undertrained representation spaces for languages other than English. This ‘under-representation’ has motivated recent cross-lingual transfer methods to leverage the English representation space by e.g. mixing English and ‘non-English’ tokens at the input level or extending model parameters to accommodate new languages. However, these approaches often come at the cost of increased computational complexity. We propose Fusion for Language Representations (FLARE) in adapters, a novel method that enhances representation quality and downstream performance for languages other than English while maintaining parameter efficiency. FLARE integrates source and target language representations within low-rank (LoRA) adapters using lightweight linear transformations, maintaining parameter efficiency while improving transfer performance. A series of experiments across representative cross-lingual natural language understanding tasks, including natural language inference, question-answering and sentiment analysis, demonstrate FLARE’s effectiveness. FLARE achieves performance improvements of 4.9% for Llama 3.1 and 2.2% for Gemma 2 compared to standard LoRA fine-tuning on question-answering tasks, as measured by the exact match metric.

pdf bib abs
Mitigating Negative Interference in Multilingual Knowledge Editing through Null-Space Constraints
Wei Sun | Tingyu Qu | Mingxiao Li | Jesse Davis | Marie-Francine Moens
Findings of the Association for Computational Linguistics: ACL 2025

Efficiently updating multilingual knowledge in large language models (LLMs) without disrupting coherent factual representations across languages remains a significant challenge. While deploying separate editing systems for each language might seem viable, this approach incurs substantial costs due to the need to manage multiple models. A more efficient solution involves integrating knowledge updates across all languages into a unified model. However, sequential edits across languages often lead to destructive parameter interference, significantly degrading multilingual generalization and the accuracy of injected knowledge. To address this issue, we propose LangEdit, a novel null-space constrained framework designed to precisely isolate language-specific knowledge updates. The core innovation of LangEdit lies in its ability to project parameter updates for each language onto the orthogonal complement of other languages’ subspaces. This approach mathematically guarantees update independence while preserving multilingual generalization capabilities. We conduct a comprehensive evaluation across three model architectures, six languages, and four downstream tasks, demonstrating that LangEdit effectively mitigates parameter interference and outperforms existing state-of-the-art editing methods. Our results highlight its potential for enabling efficient and accurate multilingual knowledge updates in LLMs.

pdf bib abs
Fast-and-Frugal Text-Graph Transformers are Effective Link Predictors
Andrei Catalin Coman | Christos Theodoropoulos | Marie-Francine Moens | James Henderson
Findings of the Association for Computational Linguistics: ACL 2025

We propose Fast-and-Frugal Text-Graph (FnF-TG) Transformers, a Transformer-based framework that unifies textual and structural information for inductive link prediction in text-attributed knowledge graphs. We demonstrate that, by effectively encoding ego-graphs (1-hop neighbourhoods), we can reduce the reliance on resource-intensive textual encoders. This makes the model both fast at training and inference time, as well as frugal in terms of cost. We perform a comprehensive evaluation on three popular datasets and show that FnF-TG can achieve superior performance compared to previous state-of-the-art methods. We also extend inductive learning to a fully inductive setting, where relations don’t rely on transductive (fixed) representations, as in previous work, but are a function of their textual description. Additionally, we introduce new variants of existing datasets, specifically designed to test the performance of models on unseen relations at inference time, thus offering a new test-bench for fully inductive link prediction.

2024

pdf bib abs
Computational Linguistics for Brain Encoding and Decoding: Principles, Practices and Beyond
Jingyuan Sun | Shaonan Wang | Zijiao Chen | Jixing Li | Marie-Francine Moens
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 5: Tutorial Abstracts)

Computational linguistics (CL) has witnessed tremendous advancements in recent years, with models such as large language models demonstrating exceptional performance in various natural language processing tasks. These advancements highlight their potential to help understand brain language processing, especially through the lens of brain encoding and decoding. Brain encoding involves the mapping of linguistic stimuli to brain activity, while brain decoding is the process of reconstructing linguistic stimuli from observed brain activities. CL models that excel at capturing and manipulating linguistic features are crucial for mapping linguistic stimuli to brain activities and vice versa. Brain encoding and decoding have vast applications, from enhancing human-computer interaction to developing assistive technologies for individuals with communication impairments. This tutorial will focus on elucidating how computational linguistics can facilitate brain encoding and decoding. We will delve into the principles and practices of using computational linguistics methods for brain encoding and decoding. We will also discuss the challenges and future directions of brain encoding and decoding. Through this tutorial, we aim to provide a comprehensive and informative overview of the intersection between computational linguistics and cognitive neuroscience, inspiring future research in this exciting and rapidly evolving field.

pdf bib abs
Unraveling Clinical Insights: A Lightweight and Interpretable Approach for Multimodal and Multilingual Knowledge Integration
Kanimozhi Uma | Marie-Francine Moens
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024

In recent years, the analysis of clinical texts has evolved significantly, driven by the emergence of language models like BERT such as PubMedBERT, and ClinicalBERT, which have been tailored for the (bio)medical domain that rely on extensive archives of medical documents. While they boast high accuracy, their lack of interpretability and language transfer limitations restrict their clinical utility. To address this, we propose a new, lightweight graph-based embedding method designed specifically for radiology reports. This approach considers the report’s structure and content, connecting medical terms through the multilingual SNOMED Clinical Terms knowledge base. The resulting graph embedding reveals intricate relationships among clinical terms, enhancing both clinician comprehension and clinical accuracy without the need for large pre-training datasets. Demonstrating the versatility of our method, we apply this embedding to two tasks: disease and image classification in X-ray reports. In disease classification, our model competes effectively with BERT-based approaches, yet it is significantly smaller and requires less training data. Additionally, in image classification, we illustrate the efficacy of the graph embedding by leveraging cross-modal knowledge transfer, highlighting its applicability across diverse languages.

pdf bib abs
ERC Advanced Grant Project CALCULUS: Extending the Boundary of Machine Translation
Jingyuan Sun | Mingxiao Li | Ruben Cartuyvels | Marie-Francine Moens
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 2)

The CALCULUS project, drawing on human capabilities of imagination and commonsense for natural language understanding (NLU), aims to advance machine-based NLU by integrating traditional AI concepts with contemporary machine learning techniques. It focuses on developing anticipatory event representations from both textual and visual data, connecting language structure to visual spatial organization and incorporating broad knowledge domains. Through testing these models in NLU tasks and evaluating their ability to predict untrained spatial and temporal details using real-world metrics, CALCULUS employs machine learning methods, including Bayesian techniques and neural networks, especially in data-sparse scenarios. The project’s culmination involves creating demonstrators that transform written stories into dynamic videos, showcasing the interdisciplinary expertise of the project leader in natural language processing, language and visual data analysis, information retrieval, and machine learning, all vital for the project’s achievements. In the CALCULUS project, our exploration of machine translation extends beyond the conventional text-to-text framework. We are broadening the horizons of machine translation by delving into the essence of transforming the formats of data distribution while keeping the meaning. This innovative approach involves converting information from one modality into another, transcending traditional linguistic boundaries. Our project includes novel work on translating text into images and videos, brain signals into images and videos.

pdf bib abs
A Generic Method for Fine-grained Category Discovery in Natural Language Texts
Chang Tian | Matthew B. Blaschko | Wenpeng Yin | Mingzhe Xing | Yinliang Yue | Marie-Francine Moens
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Fine-grained category discovery using only coarse-grained supervision is a cost-effective yet challenging task. Previous training methods focus on aligning query samples with positive samples and distancing them from negatives. They often neglect intra-category and inter-category semantic similarities of fine-grained categories when navigating sample distributions in the embedding space. Furthermore, some evaluation techniques that rely on pre-collected test samples are inadequate for real-time applications. To address these shortcomings, we introduce a method that successfully detects fine-grained clusters of semantically similar texts guided by a novel objective function. The method uses semantic similarities in a logarithmic space to guide sample distributions in the Euclidean space and to form distinct clusters that represent fine-grained categories. We also propose a centroid inference mechanism to support real-time applications. The efficacy of the method is both theoretically justified and empirically confirmed on three benchmark tasks. The proposed objective function is integrated in multiple contrastive learning based neural models. Its results surpass existing state-of-the-art approaches in terms of Accuracy, Adjusted Rand Index and Normalized Mutual Information of the detected fine-grained categories. Code and data are publicly available at https://github.com/changtianluckyforever/F-grained-STAR.

pdf bib abs
“Image, Tell me your story!” Predicting the original meta-context of visual misinformation
Jonathan Tonglet | Marie-Francine Moens | Iryna Gurevych
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

To assist human fact-checkers, researchers have developed automated approaches for visual misinformation detection. These methods assign veracity scores by identifying inconsistencies between the image and its caption, or by detecting forgeries in the image. However, they neglect a crucial point of the human fact-checking process: identifying the original meta-context of the image. By explaining what is actually true about the image, fact-checkers can better detect misinformation, focus their efforts on check-worthy visual content, engage in counter-messaging before misinformation spreads widely, and make their explanation more convincing. Here, we fill this gap by introducing the task of automated image contextualization. We create 5Pils, a dataset of 1,676 fact-checked images with question-answer pairs about their original meta-context. Annotations are based on the 5 Pillars fact-checking framework. We implement a first baseline that grounds the image in its original meta-context using the content of the image and textual evidence retrieved from the open web. Our experiments show promising results while highlighting several open challenges in retrieval and reasoning.

pdf bib abs
Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models
Vladimir Araujo | Marie-Francine Moens | Tinne Tuytelaars
Findings of the Association for Computational Linguistics: EMNLP 2024

Parameter-efficient fine-tuning (PEFT) methods are increasingly used with pre-trained language models (PLMs) for continual learning (CL). These methods typically involve training a PEFT module for each new task and employing similarity-based selection to route modules during inference. However, they face two major limitations: 1) interference during module training with already learned modules and 2) suboptimal routing when composing modules. In this paper, we present L2R, a method that isolates the training of new PEFT modules to ensure their task specialization. L2R then learns to compose the learned modules by training a network of routers that leverages a small memory containing examples of previously seen tasks. We evaluate our method in two CL setups using various benchmarks. Our results demonstrate that L2R provides an effective composition of PEFT modules, leading to improved generalization and performance compared to other methods.

pdf bib abs
MSNER: A Multilingual Speech Dataset for Named Entity Recognition
Quentin Meeus | Marie-Francine Moens | Hugo Van hamme
Proceedings of the 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024

While extensively explored in text-based tasks, Named Entity Recognition (NER) remains largely neglected in spoken language understanding. Existing resources are limited to a single, English-only dataset. This paper addresses this gap by introducing MSNER, a freely available, multilingual speech corpus annotated with named entities. It provides annotations to the VoxPopuli dataset in four languages (Dutch, French, German, and Spanish). We have also releasing an efficient annotation tool that leverages automatic pre-annotations for faster manual refinement. This results in 590 and 15 hours of silver-annotated speech for training and validation, alongside a 17-hour, manually-annotated evaluation set. We further provide an analysis comparing silver and gold annotations. Finally, we present baseline NER models to stimulate further research on this newly available dataset.

pdf bib abs
GADePo: Graph-Assisted Declarative Pooling Transformers for Document-Level Relation Extraction
Andrei Coman | Christos Theodoropoulos | Marie-Francine Moens | James Henderson
Proceedings of the 3rd Workshop on Knowledge Augmented Methods for NLP

Document-level relation extraction typically relies on text-based encoders and hand-coded pooling heuristics to aggregate information learned by the encoder. In this paper, we leverage the intrinsic graph processing capabilities of the Transformer model and propose replacing hand-coded pooling methods with new tokens in the input, which are designed to aggregate information via explicit graph relations in the computation of attention weights. We introduce a joint text-graph Transformer model and a graph-assisted declarative pooling (GADePo) specification of the input, which provides explicit and high-level instructions for information aggregation. GADePo allows the pooling process to be guided by domain-specific knowledge or desired outcomes but still learned by the Transformer, leading to more flexible and customisable pooling strategies. We evaluate our method across diverse datasets and models and show that our approach yields promising results that are consistently better than those achieved by the hand-coded pooling functions.

pdf bib abs
DMON: A Simple Yet Effective Approach for Argument Structure Learning
Sun Wei | Mingxiao Li | Jingyuan Sun | Jesse Davis | Marie-Francine Moens
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Argument structure learning (ASL) entails predicting relations between arguments. Because it can structure a document to facilitate its understanding, it has been widely applied in many fields (medical, commercial, and scientific domains). Despite its broad utilization, ASL remains a challenging task because it involves examining the complex relationships between the sentences in a potentially unstructured discourse. To resolve this problem, we have developed a simple yet effective approach called Dual-tower Multi-scale cOnvolution neural Network (DMON) for the ASL task. Specifically, we organize arguments into a relationship matrix that together with the argument embeddings forms a relationship tensor and design a mechanism to capture relations with contextual arguments. Experimental results on three different-domain argument mining datasets demonstrate that our framework outperforms state-of-the-art models. We will release the code after paper acceptance.

pdf bib abs
Generating Multiple-choice Questions for Medical Question Answering with Distractors and Cue-masking
Damien Sileo | Kanimozhi Uma | Marie-Francine Moens
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Medical multiple-choice question answering (MCQA) is a challenging evaluation for medical natural language processing and a helpful task in itself. Medical questions may describe patient symptoms and ask for the correct diagnosis, which requires domain knowledge and complex reasoning. Standard language modeling pretraining alone is not sufficient to achieve the best results with BERT-base size (Devlin et al., 2019) encoders. Jin et al. (2020) showed that focusing masked language modeling on disease name prediction when using medical encyclopedic paragraphs as input leads to considerable MCQA accuracy improvement. In this work, we show that (1) fine-tuning on generated MCQA dataset outperforms the masked language modeling based objective and (2) correctly masking the cues to the answers is critical for good performance. We release new pretraining datasets and achieve state-of-the-art results on 4 MCQA datasets, notably +5.7% with base-size model on MedQA-USMLE.

pdf bib abs
Sequence-to-Sequence Spanish Pre-trained Language Models
Vladimir Araujo | Maria Mihaela Trusca | Rodrigo Tufiño | Marie-Francine Moens
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In recent years, significant advancements in pre-trained language models have driven the creation of numerous non-English language variants, with a particular emphasis on encoder-only and decoder-only architectures. While Spanish language models based on BERT and GPT have demonstrated proficiency in natural language understanding and generation, there remains a noticeable scarcity of encoder-decoder models explicitly designed for sequence-to-sequence tasks, which aim to map input sequences to generate output sequences conditionally. This paper breaks new ground by introducing the implementation and evaluation of renowned encoder-decoder architectures exclusively pre-trained on Spanish corpora. Specifically, we present Spanish versions of BART, T5, and BERT2BERT-style models and subject them to a comprehensive assessment across various sequence-to-sequence tasks, including summarization, question answering, split-and-rephrase, dialogue, and translation. Our findings underscore the competitive performance of all models, with the BART- and T5-based models emerging as top performers across all tasks. We have made all models publicly available to the research community to foster future explorations and advancements in Spanish NLP: https://github.com/vgaraujov/Seq2Seq-Spanish-PLMs.

pdf bib abs
Visually-Aware Context Modeling for News Image Captioning
Tingyu Qu | Tinne Tuytelaars | Marie-Francine Moens
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

News Image Captioning aims to create captions from news articles and images, emphasizing the connection between textual context and visual elements. Recognizing the significance of human faces in news images and the face-name co-occurrence pattern in existing datasets, we propose a face-naming module for learning better name embeddings. Apart from names, which can be directly linked to an image area (faces), news image captions mostly contain context information that can only be found in the article. We design a retrieval strategy using CLIP to retrieve sentences that are semantically close to the image, mimicking human thought process of linking articles to images. Furthermore, to tackle the problem of the imbalanced proportion of article context and image context in captions, we introduce a simple yet effective method Contrasting with Language Model backbone (CoLaM) to the training pipeline. We conduct extensive experiments to demonstrate the efficacy of our framework. We out-perform the previous state-of-the-art (without external data) by 7.97/5.80 CIDEr scores on GoodNews/NYTimes800k. Our code is available at https://github.com/tingyu215/VACNIC.

pdf bib abs
Efficient Information Extraction in Few-Shot Relation Classification through Contrastive Representation Learning
Philipp Borchert | Jochen De Weerdt | Marie-Francine Moens
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

Differentiating relationships between entity pairs with limited labeled instances poses a significant challenge in few-shot relation classification. Representations of textual data extract rich information spanning the domain, entities, and relations. In this paper, we introduce a novel approach to enhance information extraction combining multiple sentence representations and contrastive learning. While representations in relation classification are commonly extracted using entity marker tokens, we argue that substantial information within the internal model representations remains untapped. To address this, we propose aligning multiple sentence representations, such as the CLS] token, the [MASK] token used in prompting, and entity marker tokens. Our method employs contrastive learning to extract complementary discriminative information from these individual representations. This is particularly relevant in low-resource settings where information is scarce. Leveraging multiple sentence representations is especially effective in distilling discriminative information for relation classification when additional information, like relation descriptions, are not available. We validate the adaptability of our approach, maintaining robust performance in scenarios that include relation descriptions, and showcasing its flexibility to adapt to different resource constraints.

pdf bib abs
Spatial and Temporal Language Understanding: Representation, Reasoning, and Grounding
Parisa Kordjamshidi | Qiang Ning | James Pustejovsky | Marie-Francine Moens
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 5: Tutorial Abstracts)

This tutorial provides an overview of the cutting edge research on spatial and temporal language understanding. We also cover some essential background material from various subdisciplines to this topic, which we believe will enrich the CL community’s appreciation of the complexity of spatiotemporal reasoning.

pdf bib abs
OrigamIM: A Dataset of Ambiguous Sentence Interpretations for Social Grounding and Implicit Language Understanding
Liesbeth Allein | Marie-Francine Moens
Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024

Sentences elicit different interpretations and reactions among readers, especially when there is ambiguity in their implicit layers. We present a first-of-its kind dataset of sentences from Reddit, where each sentence is annotated with multiple interpretations of its meanings, understandings of implicit moral judgments about mentioned people, and reader impressions of its author. Scrutiny of the dataset proves the evoked variability and polarity in reactions. It further shows that readers strongly disagree on both the presence of implied judgments and the social acceptability of the behaviors they evaluate. In all, the dataset offers a valuable resource for socially grounding language and modeling the intricacies of implicit language understanding from multiple reader perspectives.

pdf bib abs
KUL@SMM4H2024: Optimizing Text Classification with Quality-Assured Augmentation Strategies
Sumam Francis | Marie-Francine Moens
Proceedings of the 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks

This paper presents our models for the Social Media Mining for Health 2024 shared task, specifically Task 5, which involves classifying tweets reporting a child with childhood disorders (annotated as “1”) versus those merely mentioning a disorder (annotated as “0”). We utilized a classification model enhanced with diverse textual and language model-based augmentations. To ensure quality, we used semantic similarity, perplexity, and lexical diversity as evaluation metrics. Combining supervised contrastive learning and cross-entropy-based learning, our best model, incorporating R-drop and various LM generation-based augmentations, achieved an impressive F1 score of 0.9230 on the test set, surpassing the task mean and median scores.

pdf bib abs
Explicitly Representing Syntax Improves Sentence-to-Layout Prediction of Unexpected Situations
Wolf Nuyts | Ruben Cartuyvels | Marie-Francine Moens
Transactions of the Association for Computational Linguistics, Volume 12

Recognizing visual entities in a natural language sentence and arranging them in a 2D spatial layout require a compositional understanding of language and space. This task of layout prediction is valuable in text-to-image synthesis as it allows localized and controlled in-painting of the image. In this comparative study it is shown that we can predict layouts from language representations that implicitly or explicitly encode sentence syntax, if the sentences mention similar entity-relationships to the ones seen during training. To test compositional understanding, we collect a test set of grammatically correct sentences and layouts describing compositions of entities and relations that unlikely have been seen during training. Performance on this test set substantially drops, showing that current models rely on correlations in the training data and have difficulties in understanding the structure of the input sentences. We propose a novel structural loss function that better enforces the syntactic structure of the input sentence and show large performance gains in the task of 2D spatial layout prediction conditioned on text. The loss has the potential to be used in other generation tasks where a tree-like structure underlies the conditioning modality. Code, trained models, and the USCOCO evaluation set are available via Github.1

2023

pdf bib abs
CORE: A Few-Shot Company Relation Classification Dataset for Robust Domain Adaptation.
Philipp Borchert | Jochen De Weerdt | Kristof Coussement | Arno De Caigny | Marie-Francine Moens
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

We introduce CORE, a dataset for few-shot relation classification (RC) focused on company relations and business entities. CORE includes 4,708 instances of 12 relation types with corresponding textual evidence extracted from company Wikipedia pages. Company names and business entities pose a challenge for few-shot RC models due to the rich and diverse information associated with them. For example, a company name may represent the legal entity, products, people, or business divisions depending on the context. Therefore, deriving the relation type between entities is highly dependent on textual context. To evaluate the performance of state-of-the-art RC models on the CORE dataset, we conduct experiments in the few-shot domain adaptation setting. Our results reveal substantial performance gaps, confirming that models trained on different domains struggle to adapt to CORE. Interestingly, we find that models trained on CORE showcase improved out-of-domain performance, which highlights the importance of high-quality data for robust domain generalization. Specifically, the information richness embedded in business entities allows models to focus on contextual nuances, reducing their reliance on superficial clues such as relation-specific verbs. In addition to the dataset, we provide relevant code snippets to facilitate reproducibility and encourage further research in the field. The CORE dataset and code are publicly available at https://github.com/pnborchert/CORE.

pdf bib abs
Implicit Temporal Reasoning for Evidence-Based Fact-Checking
Liesbeth Allein | Marlon Saelens | Ruben Cartuyvels | Marie-Francine Moens
Findings of the Association for Computational Linguistics: EACL 2023

Leveraging contextual knowledge has become standard practice in automated claim verification, yet the impact of temporal reasoning has been largely overlooked. Our study demonstrates that time positively influences the claim verification process of evidence-based fact-checking. The temporal aspects and relations between claims and evidence are first established through grounding on shared timelines, which are constructed using publication dates and time expressions extracted from their text. Temporal information is then provided to RNN-based and Transformer-based classifiers before or after claim and evidence encoding. Our time-aware fact-checking models surpass base models by up to 9% Micro F1 (64.17%) and 15% Macro F1 (47.43%) on the MultiFC dataset. They also outperform prior methods that explicitly model temporal relations between evidence. Our findings show that the presence of temporal information and the manner in which timelines are constructed greatly influence how fact-checking models determine the relevance and supporting or refuting character of evidence documents.

pdf bib abs
A Memory Model for Question Answering from Streaming Data Supported by Rehearsal and Anticipation of Coreference Information
Vladimir Araujo | Alvaro Soto | Marie-Francine Moens
Findings of the Association for Computational Linguistics: ACL 2023

Existing question answering methods often assume that the input content (e.g., documents or videos) is always accessible to solve the task. Alternatively, memory networks were introduced to mimic the human process of incremental comprehension and compression of the information in a fixed-capacity memory. However, these models only learn how to maintain memory by backpropagating errors in the answers through the entire network. Instead, it has been suggested that humans have effective mechanisms to boost their memorization capacities, such as rehearsal and anticipation. Drawing inspiration from these, we propose a memory model that performs rehearsal and anticipation while processing inputs to memorize important information for solving question answering tasks from streaming data. The proposed mechanisms are applied self-supervised during training through masked modeling tasks focused on coreference information. We validate our model on a short-sequence (bAbI) dataset as well as large-sequence textual (NarrativeQA) and video (ActivityNet-QA) question answering datasets, where it achieves substantial improvements over previous memory network approaches. Furthermore, our ablation study confirms the proposed mechanisms’ importance for memory models.

pdf bib abs
Probing neural language models for understanding of words of estimative probability
Damien Sileo | Marie-francine Moens
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

Words of Estimative Probability (WEP) are phrases used to express the plausibility of a statement. Examples include terms like \textit{probably, maybe, likely, doubt, unlikely}, and \textit{impossible}. Surveys have shown that human evaluators tend to agree when assigning numerical probability levels to these WEPs. For instance, the term \textit{highly likely} equates to a median probability of $0.90{\pm}0.08$ according to a survey by \citet{fagen-ulmschneider}.In this study, our focus is to gauge the competency of neural language processing models in accurately capturing the consensual probability level associated with each WEP. Our first approach is utilizing the UNLI dataset \cite{chen-etal-2020-uncertain}, which links premises and hypotheses with their perceived joint probability $p$. From this, we craft prompts in the form: "[\textsc{Premise}]. [\textsc{Wep}], [\textsc{Hypothesis}].” This allows us to evaluate whether language models can predict if the consensual probability level of a WEP aligns closely with $p$.In our second approach, we develop a dataset based on WEP-focused probabilistic reasoning to assess if language models can logically process WEP compositions. For example, given the prompt "[\textsc{EventA}] \textit{is likely}. [\textsc{EventB}] \textit{is impossible}.”, a well-functioning language model should not conclude that [\textsc{EventA$\&$B}] is likely. Through our study, we observe that both tasks present challenges to out-of-the-box English language models. However, we also demonstrate that fine-tuning these models can lead to significant and transferable improvements.

2022

pdf bib abs
How Relevant is Selective Memory Population in Lifelong Language Learning?
Vladimir Araujo | Helena Balabin | Julio Hurtado | Alvaro Soto | Marie-Francine Moens
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Lifelong language learning seeks to have models continuously learn multiple tasks in a sequential order without suffering from catastrophic forgetting. State-of-the-art approaches rely on sparse experience replay as the primary approach to prevent forgetting. Experience replay usually adopts sampling methods for the memory population; however, the effect of the chosen sampling strategy on model performance has not yet been studied. In this paper, we investigate how relevant the selective memory population is in the lifelong learning process of text classification and question-answering tasks. We found that methods that randomly store a uniform number of samples from the entire data stream lead to high performances, especially for low memory size, which is consistent with computer vision studies.

pdf bib abs
Finding Structural Knowledge in Multimodal-BERT
Victor Milewski | Miryam de Lhoneux | Marie-Francine Moens
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this work, we investigate the knowledge learned in the embeddings of multimodal-BERT models. More specifically, we probe their capabilities of storing the grammatical structure of linguistic data and the structure learned over objects in visual data. To reach that goal, we first make the inherent structure of language and visuals explicit by a dependency parse of the sentences that describe the image and by the dependencies between the object regions in the image, respectively. We call this explicit visual structure the scene tree, that is based on the dependency tree of the language description. Extensive probing experiments show that the multimodal-BERT models do not encode these scene trees.

pdf bib abs
Anti-Overestimation Dialogue Policy Learning for Task-Completion Dialogue System
Chang Tian | Wenpeng Yin | Marie-Francine Moens
Findings of the Association for Computational Linguistics: NAACL 2022

A dialogue policy module is an essential part of task-completion dialogue systems. Recently, increasing interest has focused on reinforcement learning (RL)-based dialogue policy. Its favorable performance and wise action decisions rely on an accurate estimation of action values. The overestimation problem is a widely known issue of RL since its estimate of the maximum action value is larger than the ground truth, which results in an unstable learning process and suboptimal policy. This problem is detrimental to RL-based dialogue policy learning. To mitigate this problem, this paper proposes a dynamic partial average estimator (DPAV) of the ground truth maximum action value. DPAV calculates the partial average between the predicted maximum action value and minimum action value, where the weights are dynamically adaptive and problem-dependent. We incorporate DPAV into a deep Q-network as the dialogue policy and show that our method can achieve better or comparable results compared to top baselines on three dialogue datasets of different domains with a lower computational load. In addition, we also theoretically prove the convergence and derive the upper and lower bounds of the bias compared with those of other methods.

pdf bib abs
Analysis and Prediction of NLP Models via Task Embeddings
Damien Sileo | Marie-Francine Moens
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Task embeddings are low-dimensional representations that are trained to capture task properties. In this paper, we propose MetaEval, a collection of 101 NLP tasks. We fit a single transformer to all MetaEval tasks jointly while conditioning it on learned embeddings. The resulting task embeddings enable a novel analysis of the space of tasks. We then show that task aspects can be mapped to task embeddings for new tasks without using any annotated examples. Predicted embeddings can modulate the encoder for zero-shot inference and outperform a zero-shot baseline on GLUE tasks. The provided multitask setup can function as a benchmark for future transfer learning research.

Due to the success of pre-trained language models, versions of languages other than English have been released in recent years. This fact implies the need for resources to evaluate these models. In the case of Spanish, there are few ways to systematically assess the models’ quality. In this paper, we narrow the gap by building two evaluation benchmarks. Inspired by previous work (Conneau and Kiela, 2018; Chen et al., 2019), we introduce Spanish SentEval and Spanish DiscoEval, aiming to assess the capabilities of stand-alone and discourse-aware sentence representations, respectively. Our benchmarks include considerable pre-existing and newly constructed datasets that address different tasks from various domains. In addition, we evaluate and analyze the most recent pre-trained Spanish language models to exhibit their capabilities and limitations. As an example, we discover that for the case of discourse evaluation tasks, mBERT, a language model trained on multiple languages, usually provides a richer latent representation than models trained only with documents in Spanish. We hope our contribution will motivate a fairer, more comparable, and less cumbersome way to evaluate future Spanish language models.

pdf bib abs
KUL@SMM4H’22: Template Augmented Adaptive Pre-training for Tweet Classification
Sumam Francis | Marie-Francine Moens
Proceedings of the Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This paper describes models developed for the Social Media Mining for Health (SMM4H) 2022 shared tasks. Our team participated in the first subtask that classifies tweets with Adverse Drug Effect (ADE) mentions. Our best-performing model comprises of a template augmented task adaptive pre-training and further fine-tuning on target task data. Augmentation with random prompt templates increases the amount of task-specific data to generalize the LM to the target task domain. We explore 2 pre-training strategies: Masked language modeling (MLM) and Simple contrastive pre-training (SimSCE) and the impact of adding template augmentations with these pre-training strategies. Our system achieves an F1 score of 0.433 on the test set without using supplementary resources and medical dictionaries.

2021

pdf bib abs
Imposing Relation Structure in Language-Model Embeddings Using Contrastive Learning
Christos Theodoropoulos | James Henderson | Andrei Catalin Coman | Marie-Francine Moens
Proceedings of the 25th Conference on Computational Natural Language Learning

Though language model text embeddings have revolutionized NLP research, their ability to capture high-level semantic information, such as relations between entities in text, is limited. In this paper, we propose a novel contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure. Given a sentence (unstructured text) and its graph, we use contrastive learning to impose relation-related structure on the token level representations of the sentence obtained with a CharacterBERT (El Boukkouri et al., 2020) model. The resulting relation-aware sentence embeddings achieve state-of-the-art results on the relation extraction task using only a simple KNN classifier, thereby demonstrating the success of the proposed method. Additional visualization by a tSNE analysis shows the effectiveness of the learned representation space compared to baselines. Furthermore, we show that we can learn a different space for named entity recognition, again using a contrastive learning objective, and demonstrate how to successfully combine both representation spaces in an entity-relation task.

pdf bib abs
Modeling Coreference Relations in Visual Dialog
Mingxiao Li | Marie-Francine Moens
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Visual dialog is a vision-language task where an agent needs to answer a series of questions grounded in an image based on the understanding of the dialog history and the image. The occurrences of coreference relations in the dialog makes it a more challenging task than visual question-answering. Most previous works have focused on learning better multi-modal representations or on exploring different ways of fusing visual and language features, while the coreferences in the dialog are mainly ignored. In this paper, based on linguistic knowledge and discourse features of human dialog we propose two soft constraints that can improve the model’s ability of resolving coreferences in dialog in an unsupervised way. Experimental results on the VisDial v1.0 dataset shows that our model, which integrates two novel and linguistically inspired soft constraints in a deep transformer neural architecture, obtains new state-of-the-art performance in terms of recall at 1 and other evaluation metrics compared to current existing models and this without pretraining on other vision language datasets. Our qualitative results also demonstrate the effectiveness of the method that we propose.

pdf bib
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Marie-Francine Moens | Xuanjing Huang | Lucia Specia | Scott Wen-tau Yih
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

pdf bib abs
Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations
Vladimir Araujo | Andrés Villa | Marcelo Mendoza | Marie-Francine Moens | Alvaro Soto
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Current language models are usually trained using a self-supervised scheme, where the main focus is learning representations at the word or sentence level. However, there has been limited progress in generating useful discourse-level representations. In this work, we propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations. As a result, our proposed approach is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network. By experimenting with benchmarks designed to evaluate discourse-related knowledge using pre-trained sentence representations, we demonstrate that our approach improves performance in 6 out of 11 tasks by excelling in discourse relationship detection.

pdf bib
Findings of the Association for Computational Linguistics: EMNLP 2021
Marie-Francine Moens | Xuanjing Huang | Lucia Specia | Scott Wen-tau Yih
Findings of the Association for Computational Linguistics: EMNLP 2021

pdf bib
Proceedings of the Third Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)
Marius Mosbach | Michael A. Hedderich | Sandro Pezzelle | Aditya Mogadala | Dietrich Klakow | Marie-Francine Moens | Zeynep Akata
Proceedings of the Third Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

pdf bib abs
LIIR at SemEval-2021 task 6: Detection of Persuasion Techniques In Texts and Images using CLIP features
Erfan Ghadery | Damien Sileo | Marie-Francine Moens
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

We describe our approach for SemEval-2021 task 6 on detection of persuasion techniques in multimodal content (memes). Our system combines pretrained multimodal models (CLIP) and chained classifiers. Also, we propose to enrich the data by a data augmentation technique. Our submission achieves a rank of 8/16 in terms of F1-micro and 9/16 with F1-macro on the test set.

pdf bib abs
Just Ask! Evaluating Machine Translation by Asking and Answering Questions
Mateusz Krubiński | Erfan Ghadery | Marie-Francine Moens | Pavel Pecina
Proceedings of the Sixth Conference on Machine Translation

In this paper, we show that automatically-generated questions and answers can be used to evaluate the quality of Machine Translation (MT) systems. Building on recent work on the evaluation of abstractive text summarization, we propose a new metric for system-level MT evaluation, compare it with other state-of-the-art solutions, and show its robustness by conducting experiments for various MT directions.

pdf bib abs
MTEQA at WMT21 Metrics Shared Task
Mateusz Krubiński | Erfan Ghadery | Marie-Francine Moens | Pavel Pecina
Proceedings of the Sixth Conference on Machine Translation

In this paper, we describe our submission to the WMT 2021 Metrics Shared Task. We use the automatically-generated questions and answers to evaluate the quality of Machine Translation (MT) systems. Our submission builds upon the recently proposed MTEQA framework. Experiments on WMT20 evaluation datasets show that at the system-level the MTEQA metric achieves performance comparable with other state-of-the-art solutions, while considering only a certain amount of information from the whole translation.

2020

pdf bib abs
Are Scene Graphs Good Enough to Improve Image Captioning?
Victor Milewski | Marie-Francine Moens | Iacer Calixto
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Many top-performing image captioning models rely solely on object features computed with an object detection model to generate image descriptions. However, recent studies propose to directly use scene graphs to introduce information about object relations into captioning, hoping to better describe interactions between objects. In this work, we thoroughly investigate the use of scene graphs in image captioning. We empirically study whether using additional scene graph encoders can lead to better image descriptions and propose a conditional graph attention network (C-GAT), where the image captioning decoder state is used to condition the graph updates. Finally, we determine to what extent noise in the predicted scene graphs influence caption quality. Overall, we find no significant difference between models that use scene graph features and models that only use object detection features across different captioning metrics, which suggests that existing scene graph generation models are still too noisy to be useful in image captioning. Moreover, although the quality of predicted scene graphs is very low in general, when using high quality scene graphs we obtain gains of up to 3.3 CIDEr compared to a strong Bottom-Up Top-Down baseline.

In this paper, we publicly release an annotated corpus of 42 decisions of the European Court of Human Rights (ECHR). The corpus is annotated in terms of three types of clauses useful in argument mining: premise, conclusion, and non-argument parts of the text. Furthermore, relationships among the premises and conclusions are mapped. We present baselines for three tasks that lead from unstructured texts to structured arguments. The tasks are argument clause recognition, clause relation prediction, and premise/conclusion recognition. Despite a straightforward application of the bidirectional encoders from Transformers (BERT), we obtained very promising results F1 0.765 on argument recognition, 0.511 on relation prediction, and 0.859/0.628 on premise/conclusion recognition). The results suggest the usefulness of pre-trained language models based on deep neural network architectures in argument mining. Because of the simplicity of the baselines, there is ample space for improvement in future work based on the released corpus.

pdf bib abs
Autoregressive Reasoning over Chains of Facts with Transformers
Ruben Cartuyvels | Graham Spinks | Marie-Francine Moens
Proceedings of the 28th International Conference on Computational Linguistics

This paper proposes an iterative inference algorithm for multi-hop explanation regeneration, that retrieves relevant factual evidence in the form of text snippets, given a natural language question and its answer. Combining multiple sources of evidence or facts for multi-hop reasoning becomes increasingly hard when the number of sources needed to make an inference grows. Our algorithm copes with this by decomposing the selection of facts from a corpus autoregressively, conditioning the next iteration on previously selected facts. This allows us to use a pairwise learning-to-rank loss. We validate our method on datasets of the TextGraphs 2019 and 2020 Shared Tasks for explanation regeneration. Existing work on this task either evaluates facts in isolation or artificially limits the possible chains of facts, thus limiting multi-hop inference. We demonstrate that our algorithm, when used with a pre-trained transformer model, outperforms the previous state-of-the-art in terms of precision, training time and inference efficiency.

pdf bib abs
Representation, Learning and Reasoning on Spatial Language for Downstream NLP Tasks
Parisa Kordjamshidi | James Pustejovsky | Marie-Francine Moens
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Understating spatial semantics expressed in natural language can become highly complex in real-world applications. This includes applications of language grounding, navigation, visual question answering, and more generic human-machine interaction and dialogue systems. In many of such downstream tasks, explicit representation of spatial concepts and relationships can improve the capabilities of machine learning models in reasoning and deep language understanding. In this tutorial, we overview the cutting-edge research results and existing challenges related to spatial language understanding including semantic annotations, existing corpora, symbolic and sub-symbolic representations, qualitative spatial reasoning, spatial common sense, deep and structured learning models. We discuss the recent results on the above-mentioned applications –that need spatial language learning and reasoning – and highlight the research gaps and future directions.

pdf bib abs
Decoding Language Spatial Relations to 2D Spatial Arrangements
Gorjan Radevski | Guillem Collell | Marie-Francine Moens | Tinne Tuytelaars
Findings of the Association for Computational Linguistics: EMNLP 2020

We address the problem of multimodal spatial understanding by decoding a set of language-expressed spatial relations to a set of 2D spatial arrangements in a multi-object and multi-relationship setting. We frame the task as arranging a scene of clip-arts given a textual description. We propose a simple and effective model architecture Spatial-Reasoning Bert (SR-Bert), trained to decode text to 2D spatial arrangements in a non-autoregressive manner. SR-Bert can decode both explicit and implicit language to 2D spatial arrangements, generalizes to out-of-sample data to a reasonable extent and can generate complete abstract scenes if paired with a clip-arts predictor. Finally, we qualitatively evaluate our method with a user study, validating that our generated spatial arrangements align with human expectation.

pdf bib
Proceedings of the Second Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)
Aditya Mogadala | Sandro Pezzelle | Dietrich Klakow | Marie-Francine Moens | Zeynep Akata
Proceedings of the Second Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

pdf bib abs
LIIR at SemEval-2020 Task 12: A Cross-Lingual Augmentation Approach for Multilingual Offensive Language Identification
Erfan Ghadery | Marie-Francine Moens
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents our system entitled ‘LIIR’ for SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2). We have participated in sub-task A for English, Danish, Greek, Arabic, and Turkish languages. We adapt and fine-tune the BERT and Multilingual Bert models made available by Google AI for English and non-English languages respectively. For the English language, we use a combination of two fine-tuned BERT models. For other languages we propose a cross-lingual augmentation approach in order to enrich training data and we use Multilingual BERT to obtain sentence representations.

pdf bib
Proceedings of the Third International Workshop on Spatial Language Understanding
Parisa Kordjamshidi | Archna Bhatia | Malihe Alikhani | Jason Baldridge | Mohit Bansal | Marie-Francine Moens
Proceedings of the Third International Workshop on Spatial Language Understanding

2019

pdf bib abs
Talk2Car: Taking Control of Your Self-Driving Car
Thierry Deruyttere | Simon Vandenhende | Dusan Grujicic | Luc Van Gool | Marie-Francine Moens
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

A long-term goal of artificial intelligence is to have an agent execute commands communicated through natural language. In many cases the commands are grounded in a visual environment shared by the human who gives the command and the agent. Execution of the command then requires mapping the command into the physical visual space, after which the appropriate action can be taken. In this paper we consider the former. Or more specifically, we consider the problem in an autonomous driving setting, where a passenger requests an action that can be associated with an object found in a street scene. Our work presents the Talk2Car dataset, which is the first object referral dataset that contains commands written in natural language for self-driving cars. We provide a detailed comparison with related datasets such as ReferIt, RefCOCO, RefCOCO+, RefCOCOg, Cityscape-Ref and CLEVR-Ref. Additionally, we include a performance analysis using strong state-of-the-art models. The results show that the proposed object referral task is a challenging one for which the models show promising results but still require additional research in natural language processing, computer vision and the intersection of these fields. The dataset can be found on our website: http://macchina-ai.eu/

pdf bib
Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)
Aditya Mogadala | Dietrich Klakow | Sandro Pezzelle | Marie-Francine Moens
Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

pdf bib abs
Learning Unsupervised Multilingual Word Embeddings with Incremental Multilingual Hubs
Geert Heyman | Bregt Verreet | Ivan Vulić | Marie-Francine Moens
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Recent research has discovered that a shared bilingual word embedding space can be induced by projecting monolingual word embedding spaces from two languages using a self-learning paradigm without any bilingual supervision. However, it has also been shown that for distant language pairs such fully unsupervised self-learning methods are unstable and often get stuck in poor local optima due to reduced isomorphism between starting monolingual spaces. In this work, we propose a new robust framework for learning unsupervised multilingual word embeddings that mitigates the instability issues. We learn a shared multilingual embedding space for a variable number of languages by incrementally adding new languages one by one to the current multilingual space. Through the gradual language addition the method can leverage the interdependencies between the new language and all other languages in the current multilingual space. We find that it is beneficial to project more distant languages later in the iterative process. Our fully unsupervised multilingual embedding spaces yield results that are on par with the state-of-the-art methods in the bilingual lexicon induction (BLI) task, and simultaneously obtain state-of-the-art scores on two downstream tasks: multilingual document classification and multilingual dependency parsing, outperforming even supervised baselines. This finding also accentuates the need to establish evaluation protocols for cross-lingual word embeddings beyond the omnipresent intrinsic BLI task in future work.

2018

We present the highlights of the now finished 4-year SCATE project. It was completed in February 2018 and funded by the Flemish Government IWT-SBO, project No. 130041.1

pdf bib abs
Word-Level Loss Extensions for Neural Temporal Relation Classification
Artuur Leeuwenberg | Marie-Francine Moens
Proceedings of the 27th International Conference on Computational Linguistics

Unsupervised pre-trained word embeddings are used effectively for many tasks in natural language processing to leverage unlabeled textual data. Often these embeddings are either used as initializations or as fixed word representations for task-specific classification models. In this work, we extend our classification model’s task loss with an unsupervised auxiliary loss on the word-embedding level of the model. This is to ensure that the learned word representations contain both task-specific features, learned from the supervised loss component, and more general features learned from the unsupervised loss component. We evaluate our approach on the task of temporal relation extraction, in particular, narrative containment relation extraction from clinical records, and show that continued training of the embeddings on the unsupervised objective together with the task objective gives better task-specific embeddings, and results in an improvement over the state of the art on the THYME dataset, using only a general-domain part-of-speech tagger as linguistic resource.

pdf bib abs
A Flexible and Easy-to-use Semantic Role Labeling Framework for Different Languages
Quynh Ngoc Thi Do | Artuur Leeuwenberg | Geert Heyman | Marie-Francine Moens
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

This paper presents a flexible and open source framework for deep semantic role labeling. We aim at facilitating easy exploration of model structures for multiple languages with different characteristics. It provides flexibility in its model construction in terms of word representation, sequence representation, output modeling, and inference styles and comes with clear output visualization. The framework is available under the Apache 2.0 license.

pdf bib abs
Temporal Information Extraction by Predicting Relative Time-lines
Artuur Leeuwenberg | Marie-Francine Moens
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

The current leading paradigm for temporal information extraction from text consists of three phases: (1) recognition of events and temporal expressions, (2) recognition of temporal relations among them, and (3) time-line construction from the temporal relations. In contrast to the first two phases, the last phase, time-line construction, received little attention and is the focus of this work. In this paper, we propose a new method to construct a linear time-line from a set of (extracted) temporal relations. But more importantly, we propose a novel paradigm in which we directly predict start and end-points for events from the text, constituting a time-line without going through the intermediate step of prediction of temporal relations as in earlier work. Within this paradigm, we propose two models that predict in linear complexity, and a new training loss using TimeML-style annotations, yielding promising results.

pdf bib abs
Generating Continuous Representations of Medical Texts
Graham Spinks | Marie-Francine Moens
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

We present an architecture that generates medical texts while learning an informative, continuous representation with discriminative features. During training the input to the system is a dataset of captions for medical X-Rays. The acquired continuous representations are of particular interest for use in many machine learning techniques where the discrete and high-dimensional nature of textual input is an obstacle. We use an Adversarially Regularized Autoencoder to create realistic text in both an unconditional and conditional setting. We show that this technique is applicable to medical texts which often contain syntactic and domain-specific shorthands. A quantitative evaluation shows that we achieve a lower model perplexity than a traditional LSTM generator.

pdf bib abs
Do Neural Network Cross-Modal Mappings Really Bridge Modalities?
Guillem Collell | Marie-Francine Moens
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Feed-forward networks are widely used in cross-modal applications to bridge modalities by mapping distributed vectors of one modality to the other, or to a shared space. The predicted vectors are then used to perform e.g., retrieval or labeling. Thus, the success of the whole system relies on the ability of the mapping to make the neighborhood structure (i.e., the pairwise similarities) of the predicted vectors akin to that of the target vectors. However, whether this is achieved has not been investigated yet. Here, we propose a new similarity measure and two ad hoc experiments to shed light on this issue. In three cross-modal benchmarks we learn a large number of language-to-vision and vision-to-language neural network mappings (up to five layers) using a rich diversity of image and text features and loss functions. Our results reveal that, surprisingly, the neighborhood structure of the predicted vectors consistently resembles more that of the input vectors than that of the target vectors. In a second experiment, we further show that untrained nets do not significantly disrupt the neighborhood (i.e., semantic) structure of the input vectors.

pdf bib abs
Learning Representations Specialized in Spatial Knowledge: Leveraging Language and Vision
Guillem Collell | Marie-Francine Moens
Transactions of the Association for Computational Linguistics, Volume 6

Spatial understanding is crucial in many real-world problems, yet little progress has been made towards building representations that capture spatial knowledge. Here, we move one step forward in this direction and learn such representations by leveraging a task consisting in predicting continuous 2D spatial arrangements of objects given object-relationship-object instances (e.g., “cat under chair”) and a simple neural network model that learns the task from annotated images. We show that the model succeeds in this task and, furthermore, that it is capable of predicting correct spatial arrangements for unseen objects if either CNN features or word embeddings of the objects are provided. The differences between visual and linguistic features are discussed. Next, to evaluate the spatial representations learned in the previous task, we introduce a task and a dataset consisting in a set of crowdsourced human ratings of spatial similarity for object pairs. We find that both CNN (convolutional neural network) features and word embeddings predict human judgments of similarity well and that these vectors can be further specialized in spatial knowledge if we update them when training the model that predicts spatial arrangements of objects. Overall, this paper paves the way towards building distributed spatial representations, contributing to the understanding of spatial expressions in language.

pdf bib
Proceedings of the First International Workshop on Spatial Language Understanding
Parisa Kordjamshidi | Archna Bhatia | James Pustejovsky | Marie-Francine Moens
Proceedings of the First International Workshop on Spatial Language Understanding

pdf bib abs
Evaluating Textual Representations through Image Generation
Graham Spinks | Marie-Francine Moens
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

We present a methodology for determining the quality of textual representations through the ability to generate images from them. Continuous representations of textual input are ubiquitous in modern Natural Language Processing techniques either at the core of machine learning algorithms or as the by-product at any given layer of a neural network. While current techniques to evaluate such representations focus on their performance on particular tasks, they don’t provide a clear understanding of the level of informational detail that is stored within them, especially their ability to represent spatial information. The central premise of this paper is that visual inspection or analysis is the most convenient method to quickly and accurately determine information content. Through the use of text-to-image neural networks, we propose a new technique to compare the quality of textual representations by visualizing their information content. The method is illustrated on a medical dataset where the correct representation of spatial information and shorthands are of particular importance. For four different well-known textual representations, we show with a quantitative analysis that some representations are consistently able to deliver higher quality visualizations of the information content. Additionally, we show that the quantitative analysis technique correlates with the judgment of a human expert evaluator in terms of alignment.

2017

pdf bib abs
Bilingual Lexicon Induction by Learning to Combine Word-Level and Character-Level Representations
Geert Heyman | Ivan Vulić | Marie-Francine Moens
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We study the problem of bilingual lexicon induction (BLI) in a setting where some translation resources are available, but unknown translations are sought for certain, possibly domain-specific terminology. We frame BLI as a classification problem for which we design a neural network based classification architecture composed of recurrent long short-term memory and deep feed forward networks. The results show that word- and character-level representations each improve state-of-the-art results for BLI, and the best results are obtained by exploiting the synergy between these word- and character-level representations in the classification model.

pdf bib abs
Structured Learning for Temporal Relation Extraction from Clinical Records
Artuur Leeuwenberg | Marie-Francine Moens
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We propose a scalable structured learning model that jointly predicts temporal relations between events and temporal expressions (TLINKS), and the relation between these events and the document creation time (DCTR). We employ a structured perceptron, together with integer linear programming constraints for document-level inference during training and prediction to exploit relational properties of temporality, together with global learning of the relations at the document level. Moreover, this study gives insights in the results of integrating constraints for temporal relation extraction when using structured learning and prediction. Our best system outperforms the state-of-the art on both the CONTAINS TLINK task, and the DCTR task.

pdf bib abs
Improving Implicit Semantic Role Labeling by Predicting Semantic Frame Arguments
Quynh Ngoc Thi Do | Steven Bethard | Marie-Francine Moens
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Implicit semantic role labeling (iSRL) is the task of predicting the semantic roles of a predicate that do not appear as explicit arguments, but rather regard common sense knowledge or are mentioned earlier in the discourse. We introduce an approach to iSRL based on a predictive recurrent neural semantic frame model (PRNSFM) that uses a large unannotated corpus to learn the probability of a sequence of semantic arguments given a predicate. We leverage the sequence probabilities predicted by the PRNSFM to estimate selectional preferences for predicates and their arguments. On the NomBank iSRL test set, our approach improves state-of-the-art performance on implicit semantic role labeling with less reliance than prior work on manually constructed language resources.

pdf bib abs
KULeuven-LIIR at SemEval-2017 Task 12: Cross-Domain Temporal Information Extraction from Clinical Records
Artuur Leeuwenberg | Marie-Francine Moens
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper, we describe the system of the KULeuven-LIIR submission for Clinical TempEval 2017. We participated in all six subtasks, using a combination of Support Vector Machines (SVM) for event and temporal expression detection, and a structured perceptron for extracting temporal relations. Moreover, we present and analyze the results from our submissions, and verify the effectiveness of several system components. Our system performed above average for all subtasks in both phases.

pdf bib abs
Learning to Recognize Animals by Watching Documentaries: Using Subtitles as Weak Supervision
Aparna Nurani Venkitasubramanian | Tinne Tuytelaars | Marie-Francine Moens
Proceedings of the Sixth Workshop on Vision and Language

We investigate animal recognition models learned from wildlife video documentaries by using the weak supervision of the textual subtitles. This is a particularly challenging setting, since i) the animals occur in their natural habitat and are often largely occluded and ii) subtitles are to a large degree complementary to the visual content, providing a very weak supervisory signal. This is in contrast to most work on integrated vision and language in the literature, where textual descriptions are tightly linked to the image content, and often generated in a curated fashion for the task at hand. In particular, we investigate different image representations and models, including a support vector machine on top of activations of a pretrained convolutional neural network, as well as a Naive Bayes framework on a ‘bag-of-activations’ image representation, where each element of the bag is considered separately. This representation allows key components in the image to be isolated, in spite of largely varying backgrounds and image clutter, without an object detection or image segmentation step. The methods are evaluated based on how well they transfer to unseen camera-trap images captured across diverse topographical regions under different environmental conditions and illumination settings, involving a large domain shift.

2016

pdf bib abs
Facing the most difficult case of Semantic Role Labeling: A collaboration of word embeddings and co-training
Quynh Ngoc Thi Do | Steven Bethard | Marie-Francine Moens
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We present a successful collaboration of word embeddings and co-training to tackle in the most difficult test case of semantic role labeling: predicting out-of-domain and unseen semantic frames. Despite the fact that co-training is a successful traditional semi-supervised method, its application in SRL is very limited especially when a huge amount of labeled data is available. In this work, co-training is used together with word embeddings to improve the performance of a system trained on a large training dataset. We also introduce a semantic role labeling system with a simple learning architecture and effective inference that is easily adaptable to semi-supervised settings with new training data and/or new features. On the out-of-domain testing set of the standard benchmark CoNLL 2009 data our simple approach achieves high performance and improves state-of-the-art results.

pdf bib abs
Is an Image Worth More than a Thousand Words? On the Fine-Grain Semantic Differences between Visual and Linguistic Representations
Guillem Collell | Marie-Francine Moens
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Human concept representations are often grounded with visual information, yet some aspects of meaning cannot be visually represented or are better described with language. Thus, vision and language provide complementary information that, properly combined, can potentially yield more complete concept representations. Recently, state-of-the-art distributional semantic models and convolutional neural networks have achieved great success in representing linguistic and visual knowledge respectively. In this paper, we compare both, visual and linguistic representations in their ability to capture different types of fine-grain semantic knowledge—or attributes—of concepts. Humans often describe objects using attributes, that is, properties such as shape, color or functionality, which often transcend the linguistic and visual modalities. In our setting, we evaluate how well attributes can be predicted by using the unimodal representations as inputs. We are interested in first, finding out whether attributes are generally better captured by either the vision or by the language modality; and second, if none of them is clearly superior (as we hypothesize), what type of attributes or semantic knowledge are better encoded from each modality. Ultimately, our study sheds light on the potential of combining visual and textual representations.

pdf bib abs
Semi-automatically Alignment of Predicates between Speech and OntoNotes data
Niraj Shrestha | Marie-Francine Moens
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Speech data currently receives a growing attention and is an important source of information. We still lack suitable corpora of transcribed speech annotated with semantic roles that can be used for semantic role labeling (SRL), which is not the case for written data. Semantic role labeling in speech data is a challenging and complex task due to the lack of sentence boundaries and the many transcription errors such as insertion, deletion and misspellings of words. In written data, SRL evaluation is performed at the sentence level, but in speech data sentence boundaries identification is still a bottleneck which makes evaluation more complex. In this work, we semi-automatically align the predicates found in transcribed speech obtained with an automatic speech recognizer (ASR) with the predicates found in the corresponding written documents of the OntoNotes corpus and manually align the semantic roles of these predicates thus obtaining annotated semantic frames in the speech data. This data can serve as gold standard alignments for future research in semantic role labeling of speech data.

pdf bib
Multi-Modal Representations for Improved Bilingual Lexicon Learning
Ivan Vulić | Douwe Kiela | Stephen Clark | Marie-Francine Moens
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
KULeuven-LIIR at SemEval 2016 Task 12: Detecting Narrative Containment in Clinical Records
Artuur Leeuwenberg | Marie-Francine Moens
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib abs
A Dataset for Multimodal Question Answering in the Cultural Heritage Domain
Shurong Sheng | Luc Van Gool | Marie-Francine Moens
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)

Multimodal question answering in the cultural heritage domain allows visitors to ask questions in a more natural way and thus provides better user experiences with cultural objects while visiting a museum, landmark or any other historical site. In this paper, we introduce the construction of a golden standard dataset that will aid research of multimodal question answering in the cultural heritage domain. The dataset, which will be soon released to the public, contains multimodal content including images of typical artworks from the fascinating old-Egyptian Amarna period, related image-containing documents of the artworks and over 800 multimodal queries integrating visual and textual questions. The multimodal questions and related documents are all in English. The multimodal questions are linked to relevant paragraphs in the related documents that contain the answer to the multimodal query.

pdf bib
Visualizing the Content of a Children’s Story in a Virtual World: Lessons Learned
Quynh Ngoc Thi Do | Steven Bethard | Marie-Francine Moens
Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods

2015

pdf bib
Adapting Coreference Resolution for Narrative Processing
Quynh Ngoc Thi Do | Steven Bethard | Marie-Francine Moens
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Bilingual Word Embeddings from Non-Parallel Document-Aligned Data Applied to Bilingual Lexicon Induction
Ivan Vulić | Marie-Francine Moens
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Information Extraction from Biomedical Texts: Learning Models with Limited Supervision
Marie-Francine Moens
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis

2014

pdf bib
Probabilistic Models of Cross-Lingual Semantic Similarity in Context Based on Latent Cross-Lingual Concepts Induced from Comparable Data
Ivan Vulić | Marie-Francine Moens
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib abs
HiEve: A Corpus for Extracting Event Hierarchies from News Stories
Goran Glavaš | Jan Šnajder | Marie-Francine Moens | Parisa Kordjamshidi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In news stories, event mentions denote real-world events of different spatial and temporal granularity. Narratives in news stories typically describe some real-world event of coarse spatial and temporal granularity along with its subevents. In this work, we present HiEve, a corpus for recognizing relations of spatiotemporal containment between events. In HiEve, the narratives are represented as hierarchies of events based on relations of spatiotemporal containment (i.e., superevent―subevent relations). We describe the process of manual annotation of HiEve. Furthermore, we build a supervised classifier for recognizing spatiotemporal containment between events to serve as a baseline for future research. Preliminary experimental results are encouraging, with classifier performance reaching 58% F1-score, only 11% less than the inter annotator agreement.

pdf bib
Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL)
Oleksandr Kolomiyets | Marie-Francine Moens | Martha Palmer | James Pustejovsky | Steven Bethard
Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL)

pdf bib
Key Event Detection in Video using ASR and Visual Data
Niraj Shrestha | Aparna N. Venkitasubramanian | Marie-Francine Moens
Proceedings of the Third Workshop on Vision and Language

2013

pdf bib
A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)
Ivan Vulić | Marie-Francine Moens
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Cross-Lingual Semantic Similarity of Words as the Similarity of Their Semantic Word Responses
Ivan Vulić | Marie-Francine Moens
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
KUL: Data-driven Approach to Temporal Parsing of Newswire Articles
Oleksandr Kolomiyets | Marie-Francine Moens
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf bib
SemEval-2013 Task 3: Spatial Role Labeling
Oleksandr Kolomiyets | Parisa Kordjamshidi | Marie-Francine Moens | Steven Bethard
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf bib
Detecting Relations in the Gene Regulation Network
Thomas Provoost | Marie-Francine Moens
Proceedings of the BioNLP Shared Task 2013 Workshop

2012

pdf bib
Sub-corpora Sampling with an Application to Bilingual Lexicon Extraction
Ivan Vulić | Marie-Francine Moens
Proceedings of COLING 2012

pdf bib
Coreference Clustering using Column Generation
Jan De Belder | Marie-Francine Moens
Proceedings of COLING 2012: Posters

pdf bib
Skip N-grams and Ranking Functions for Predicting Script Events
Bram Jans | Steven Bethard | Ivan Vulić | Marie Francine Moens
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge
Ivan Vulić | Marie-Francine Moens
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib abs
Annotating Story Timelines as Temporal Dependency Structures
Steven Bethard | Oleksandr Kolomiyets | Marie-Francine Moens
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present an approach to annotating timelines in stories where events are linked together by temporal relations into a temporal dependency tree. This approach avoids the disconnected timeline problems of prior work, and results in timelines that are more suitable for temporal reasoning. We show that annotating timelines as temporal dependency trees is possible with high levels of inter-annotator agreement - Krippendorff's Alpha of 0.822 on selecting event pairs, and of 0.700 on selecting temporal relation labels - even with the moderately sized relation set of BEFORE, AFTER, INCLUDES, IS-INCLUDED, IDENTITY and OVERLAP. We also compare several annotation schemes for identifying story events, and show that higher inter-annotator agreement can be reached by focusing on only the events that are essential to forming the timeline, skipping words in negated contexts, modal contexts and quoted speech.

pdf bib
Extracting Narrative Timelines as Temporal Dependency Structures
Oleksandr Kolomiyets | Steven Bethard | Marie-Francine Moens
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
SemEval-2012 Task 3: Spatial Role Labeling
Parisa Kordjamshidi | Steven Bethard | Marie-Francine Moens
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
KU Leuven at HOO-2012: A Hybrid Approach to Detection and Correction of Determiner and Preposition Errors in Non-native English Text
Li Quan | Oleksandr Kolomiyets | Marie-Francine Moens
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

2011

pdf bib
Model-Portability Experiments for Textual Temporal Analysis
Oleksandr Kolomiyets | Steven Bethard | Marie-Francine Moens
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Identifying Word Translations from Comparable Corpora Using Latent Topic Models
Ivan Vulić | Wim De Smet | Marie-Francine Moens
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib abs
Spatial Role Labeling: Task Definition and Annotation Scheme
Parisa Kordjamshidi | Martijn Van Otterlo | Marie-Francine Moens
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

One of the essential functions of natural language is to talk about spatial relationships between objects. Linguistic constructs can express highly complex, relational structures of objects, spatial relations between them, and patterns of motion through spaces relative to some reference point. Learning how to map this information onto a formal representation from a text is a challenging problem. At present no well-defined framework for automatic spatial information extraction exists that can handle all of these issues. In this paper we introduce the task of spatial role labeling and propose an annotation scheme that is language-independent and facilitates the application of machine learning techniques. Our framework consists of a set of spatial roles based on the theory of holistic spatial semantics with the intent of covering all aspects of spatial concepts, including both static and dynamic spatial relations. We illustrate our annotation scheme with many examples throughout the paper, and in addition we highlight how to connect to spatial calculi such as region connection calculus and also how our approach fits into related work.

pdf bib
KUL: Recognition and Normalization of Temporal Expressions
Oleksandr Kolomiyets | Marie-Francine Moens
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

pdf bib
Semi-supervised Semantic Role Labeling Using the Latent Words Language Model
Koen Deschacht | Marie-Francine Moens
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Meeting TempEval-2: Shallow Approach for Temporal Tagger
Oleksandr Kolomiyets | Marie-Francine Moens
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

pdf bib
Proceedings of the 2009 Workshop on Knowledge and Reasoning for Answering Questions (KRAQ 2009)
Patrick Saint-Dizier | Marie-Francine Moens
Proceedings of the 2009 Workshop on Knowledge and Reasoning for Answering Questions (KRAQ 2009)

2008

pdf bib abs
Language Resources for Studying Argument
Chris Reed | Raquel Mochales Palau | Glenn Rowe | Marie-Francine Moens
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes the development of a written corpus of argumentative reasoning. Arguments in the corpus have been analysed using state of the art techniques from argumentation theory and have been marked up using an open, reusable markup language. A number of the key challenges enountered during the process are explored, and preliminary observations about features such as inter-coder reliability and corpus statistics are discussed. In addition, several examples are offered of how this kind of language resource can be used in linguistic, computational and philosophical research, and in particular, how the corpus has been used to initiate a programme investigating the automatic detection of argumentative structure.

pdf bib
Coling 2008: Proceedings of the workshop on Knowledge and Reasoning for Answering Questions
Marie-Francine Moens | Patrick Saint-Dizier
Coling 2008: Proceedings of the workshop on Knowledge and Reasoning for Answering Questions