Samuel Mensah

2026

Complex claim verification requires decomposing sentences into verifiable subclaims, yet existing methods struggle to align decomposition quality with verification performance. We propose a reinforcement learning (RL) approach that jointly optimizes decomposition quality and verifier alignment using Group Relative Policy Optimization (GRPO). Our method integrates: (i) structured sequential reasoning; (ii) supervised finetuning on teacher-distilled exemplars; and (iii) a multi-objective reward balancing format compliance, verifier alignment, and decomposition quality. Across six evaluation settings, our trained 8B decomposer improves downstream verification performance to 71.75% macro-F1, outperforming prompt-based approaches (+1.99, +6.24) and existing RL methods (+5.84). Human evaluation confirms the high quality of the generated subclaims. Our framework enables smaller language models to achieve state-of-the-art claim verification by jointly optimising for verification accuracy and decomposition.

pdf bib abs

ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images
Mathieu Sibue | Andrés Muñoz Garza | Samuel Mensah | Pranav Shetty | Zhiqiang Ma | Xiaomo Liu | Manuela Veloso
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Enterprise documents, such as forms and reports, embed critical information for downstream applications like data archiving, automated workflows, and analytics. Although generalist Vision Language Models (VLMs) perform well on established document understanding benchmarks, their ability to conduct holistic, fine-grained structured extraction across diverse document types and flexible schemas is not well studied. Existing Key Entity Extraction (KEE), Relation Extraction (RE), and Visual Question Answering (VQA) datasets are limited by narrow entity ontologies, simple queries, or homogeneous document types, often overlooking the need for adaptable and structured extraction. To address these gaps, we introduce ExStrucTiny, a new benchmark dataset for structured Information Extraction (IE) from document images, unifying aspects of KEE, RE, and VQA. Built through a novel pipeline combining manual and synthetic human-validated samples, ExStrucTiny covers more varied document types and extraction scenarios. We analyze open and closed VLMs on this benchmark, highlighting challenges such as schema adaptation, query under-specification, and answer localization. We hope our work provides a bedrock for improving generalist models for structured IE in documents.

2025

pdf bib abs

A Variational Approach for Mitigating Entity Bias in Relation Extraction
Samuel Mensah | Elena Kochkina | Jabez Magomere | Joy Prakash Sain | Simerjot Kaur | Charese Smiley
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Mitigating entity bias is a critical challenge in Relation Extraction (RE), where models often rely excessively on entities, resulting in poor generalization. This paper presents a novel approach to address this issue by adapting a Variational Information Bottleneck (VIB) framework. Our method compresses entity-specific information while preserving task-relevant features. It achieves state-of-the-art performance on both general and financial domain RE datasets, excelling in in-domain settings (original test sets) and out-of-domain (modified test sets with type-constrained entity replacements). Our approach offers a robust, interpretable, and theoretically grounded methodology.

pdf bib abs

FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking
Jabez Magomere | Elena Kochkina | Samuel Mensah | Simerjot Kaur | Charese Smiley
Findings of the Association for Computational Linguistics: NAACL 2025

We introduce FinNLI, a benchmark dataset for Financial Natural Language Inference (FinNLI) across diverse financial texts like SEC Filings, Annual Reports, and Earnings Call transcripts. Our dataset framework ensures diverse premise-hypothesis pairs while minimizing spurious correlations. FinNLI comprises 21,304 pairs, including a high-quality test set of 3,304 instances annotated by finance experts. Evaluations show that domain shift significantly degrades general-domain NLI performance. The highest Macro F1 scores for pre-trained (PLMs) and large language models (LLMs) baselines are 74.57% and 78.62%, respectively, highlighting the dataset’s difficulty. Surprisingly, instruction-tuned financial LLMs perform poorly, suggesting limited generalizability. FinNLI exposes weaknesses in current LLMs for financial reasoning, indicating room for improvement.

pdf bib abs

Understanding and effectively responding to email communication remains a critical yet complex challenge for current AI techniques, especially in corporate environments. These tasks are further complicated by the need for domain-specific knowledge, accurate entity recognition, and high precision to prevent costly errors. While recent advances in AI, specifically Large Language Models (LLMs), have made strides in natural language understanding, they often lack business-specific expertise required in such settings. In this work, we present Advanced Messaging Platform (AMP), a production-grade AI pipeline that automates email response generation at scale in real-world enterprise settings. AMP has been in production for more than a year, processing thousands of emails daily while maintaining high accuracy and adaptability to evolving business needs.

2023

pdf bib abs

Trading Syntax Trees for Wordpieces: Target-oriented Opinion Words Extraction with Wordpieces and Aspect Enhancement
Samuel Mensah | Kai Sun | Nikolaos Aletras
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

State-of-the-art target-oriented opinion word extraction (TOWE) models typically use BERT-based text encoders that operate on the word level, along with graph convolutional networks (GCNs) that incorporate syntactic information extracted from syntax trees. These methods achieve limited gains with GCNs and have difficulty using BERT wordpieces. Meanwhile, BERT wordpieces are known to be effective at representing rare words or words with insufficient context information. To address this issue, this work trades syntax trees for BERT wordpieces by entirely removing the GCN component from the methods’ architectures. To enhance TOWE performance, we tackle the issue of aspect representation loss during encoding. Instead of solely utilizing a sentence as the input, we use a sentence-aspect pair. Our relatively simple approach achieves state-of-the-art results on benchmark datasets and should serve as a strong baseline for further research.

2022

pdf bib abs

Contrastive Learning with Expectation-Maximization for Weakly Supervised Phrase Grounding
Keqin Chen | Richong Zhang | Samuel Mensah | Yongyi Mao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Weakly supervised phrase grounding aims to learn an alignment between phrases in a caption and objects in a corresponding image using only caption-image annotations, i.e., without phrase-object annotations. Previous methods typically use a caption-image contrastive loss to indirectly supervise the alignment between phrases and objects, which hinders the maximum use of the intrinsic structure of the multimodal data and leads to unsatisfactory performance. In this work, we directly use the phrase-object contrastive loss in the condition that no positive annotation is available in the first place. Specifically, we propose a novel contrastive learning framework based on the expectation-maximization algorithm that adaptively refines the target prediction. Experiments on two widely used benchmarks, Flickr30K Entities and RefCOCO+, demonstrate the effectiveness of our framework. We obtain 63.05% top-1 accuracy on Flickr30K Entities and 59.51%/43.46% on RefCOCO+ TestA/TestB, outperforming the previous methods by a large margin, even surpassing a previous SoTA that uses a pre-trained vision-language model. Furthermore, we deliver a theoretical analysis of the effectiveness of our method from the perspective of the maximum likelihood estimate with latent variables.

pdf bib abs

A Transformational Biencoder with In-Domain Negative Sampling for Zero-Shot Entity Linking
Kai Sun | Richong Zhang | Samuel Mensah | Yongyi Mao | Xudong Liu
Findings of the Association for Computational Linguistics: ACL 2022

Recent interest in entity linking has focused in the zero-shot scenario, where at test time the entity mention to be labelled is never seen during training, or may belong to a different domain from the source domain. Current work leverage pre-trained BERT with the implicit assumption that it bridges the gap between the source and target domain distributions. However, fine-tuned BERT has a considerable underperformance at zero-shot when applied in a different domain. We solve this problem by proposing a Transformational Biencoder that incorporates a transformation into BERT to perform a zero-shot transfer from the source domain during training. As like previous work, we rely on negative entities to encourage our model to discriminate the golden entities during training. To generate these negative entities, we propose a simple but effective strategy that takes the domain of the golden entity into perspective. Our experimental results on the benchmark dataset Zeshel show effectiveness of our approach and achieve new state-of-the-art.

pdf bib abs

Knowledge graphs typically contain a large number of entities but often cover only a fraction of all relations between them (i.e., incompleteness). Zero-shot link prediction (ZSLP) is a popular way to tackle the problem by automatically identifying unobserved relations between entities. Most recent approaches use textual features of relations (e.g., surface name or textual descriptions) as auxiliary information to improve the encoded representation. These methods lack robustness as they are bound to support only tokens from a fixed vocabulary and unable to model out-of-vocabulary (OOV) words. Subword units such as character n-grams have the capability of generating more expressive representations for OOV words. Hence, in this paper, we propose a Hierarchical N-gram framework for Zero-Shot Link Prediction (HNZSLP) that leverages character n-gram information for ZSLP. Our approach works by first constructing a hierarchical n-gram graph from the surface name of relations. Subsequently, a new Transformer-based network models the hierarchical n-gram graph to learn a relation embedding for ZSLP. Experimental results show that our proposed HNZSLP method achieves state-of-the-art performance on two standard ZSLP datasets.

pdf bib abs

DropMix: A Textual Data Augmentation Combining Dropout with Mixup
Fanshuang Kong | Richong Zhang | Xiaohui Guo | Samuel Mensah | Yongyi Mao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Overfitting is a notorious problem when there is insufficient data to train deep neural networks in machine learning tasks. Data augmentation regularization methods such as Dropout, Mixup, and their enhanced variants are effective and prevalent, and achieve promising performance to overcome overfitting. However, in text learning, most of the existing regularization approaches merely adopt ideas from computer vision without considering the importance of dimensionality in natural language processing. In this paper, we argue that the property is essential to overcome overfitting in text learning. Accordingly, we present a saliency map informed textual data augmentation and regularization framework, which combines Dropout and Mixup, namely DropMix, to mitigate the overfitting problem in text learning. In addition, we design a procedure that drops and patches fine grained shapes of the saliency map under the DropMix framework to enhance regularization. Empirical studies confirm the effectiveness of the proposed approach on 12 text classification tasks.

pdf bib abs

Explicit Role Interaction Network for Event Argument Extraction
Nan Ding | Chunming Hu | Kai Sun | Samuel Mensah | Richong Zhang
Findings of the Association for Computational Linguistics: EMNLP 2022

Event argument extraction is a challenging subtask of event extraction, aiming to identify and assign roles to arguments under a certain event. Existing methods extract arguments of each role independently, ignoring the relationship between different roles. Such an approach hinders the model from learning explicit interactions between different roles to improve the performance of individual argument extraction. As a solution, we design a neural model that we refer to as the Explicit Role Interaction Network (ERIN) which allows for dynamically capturing the correlations between different argument roles within an event. Extensive experiments on the benchmark dataset ACE2005 demonstrate the superiority of our proposed model to existing approaches.

2021

pdf bib abs

An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction
Samuel Mensah | Kai Sun | Nikolaos Aletras
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Target-oriented opinion words extraction (TOWE) (Fan et al., 2019b) is a new subtask of target-oriented sentiment analysis that aims to extract opinion words for a given aspect in text. Current state-of-the-art methods leverage position embeddings to capture the relative position of a word to the target. However, the performance of these methods depends on the ability to incorporate this information into word representations. In this paper, we explore a variety of text encoders based on pretrained word embeddings or language models that leverage part-of-speech and position embeddings, aiming to examine the actual contribution of each component in TOWE. We also adapt a graph convolutional network (GCN) to enhance word representations by incorporating syntactic information. Our experimental results demonstrate that BiLSTM-based models can effectively encode position information into word representations while using a GCN only achieves marginal gains. Interestingly, our simple methods outperform several state-of-the-art complex neural structures.

2020

pdf bib abs

Recurrent Interaction Network for Jointly Extracting Entities and Classifying Relations
Kai Sun | Richong Zhang | Samuel Mensah | Yongyi Mao | Xudong Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The idea of using multi-task learning approaches to address the joint extraction of entity and relation is motivated by the relatedness between the entity recognition task and the relation classification task. Existing methods using multi-task learning techniques to address the problem learn interactions among the two tasks through a shared network, where the shared information is passed into the task-specific networks for prediction. However, such an approach hinders the model from learning explicit interactions between the two tasks to improve the performance on the individual tasks. As a solution, we design a multi-task learning model which we refer to as recurrent interaction network which allows the learning of interactions dynamically, to effectively model task-specific features for classification. Empirical studies on two real-world datasets confirm the superiority of the proposed model.

2019

pdf bib abs

Aspect-Level Sentiment Analysis Via Convolution over Dependency Tree
Kai Sun | Richong Zhang | Samuel Mensah | Yongyi Mao | Xudong Liu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose a method based on neural networks to identify the sentiment polarity of opinion words expressed on a specific aspect of a sentence. Although a large majority of works typically focus on leveraging the expressive power of neural networks in handling this task, we explore the possibility of integrating dependency trees with neural networks for representation learning. To this end, we present a convolution over a dependency tree (CDT) model which exploits a Bi-directional Long Short Term Memory (Bi-LSTM) to learn representations for features of a sentence, and further enhance the embeddings with a graph convolutional network (GCN) which operates directly on the dependency tree of the sentence. Our approach propagates both contextual and dependency information from opinion words to aspect words, offering discriminative properties for supervision. Experimental results ranks our approach as the new state-of-the-art in aspect-based sentiment classification.

Co-authors

Venues

Fix author