Pavlos Vougiouklis

2025

pdf bib abs
Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context Permutation
Wenyu Huang | Pavlos Vougiouklis | Mirella Lapata | Jeff Z. Pan
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multi-hop Question Answering (MHQA) adds layers of complexity to question answering, making it more challenging. When Language Models (LMs) are prompted with multiple search results, they are tasked not only with retrieving relevant information but also employing multi-hop reasoning across the information sources. Although LMs perform well on traditional question-answering tasks, the causal mask can hinder their capacity to reason across complex contexts. In this paper, we explore how LMs respond to multi-hop questions by permuting search results (retrieved documents) under various configurations. Our study reveals interesting findings as follows: 1) Encoder-decoder models, such as the ones in the Flan-T5 family, generally outperform causal decoder-only LMs in MHQA tasks, despite being significantly smaller in size; 2) altering the order of gold documents reveals distinct trends in both Flan T5 models and fine-tuned decoder-only models, with optimal performance observed when the document order aligns with the reasoning chain order; 3) enhancing causal decoder-only models with bi-directional attention by modifying the causal mask can effectively boost their end performance. In addition to the above, we conduct a thorough investigation of the distribution of LM attention weights in the context of MHQA. Our experiments reveal that attention weights tend to peak at higher values when the resulting answer is correct. We leverage this finding to heuristically improve LMs’ performance on this task. Our code is publicly available at https://github.com/hwy9855/MultiHopQA-Reasoning.

pdf bib abs
Don’t Forget the Base Retriever! A Low-Resource Graph-based Retriever for Multi-hop Question Answering
Andre Melo | Enting Chen | Pavlos Vougiouklis | Chenxin Diao | Shriram Piramanayagam | Ruofei Lai | Jeff Z. Pan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Traditional Retrieval-augmented Generation systems struggle with complex multi-hop questions, which often require reasoning over multiple passages. While GraphRAG approaches address these challenges, most of them rely on expensive LLM calls. In this paper, we propose GR\small{IEVER}, a lightweight, low-resource, multi-step graph-based retriever for multi-hop QA. Unlike prior work, GR\small{IEVER} does not rely on LLMs and can perform multi-step retrieval in a few hundred milliseconds. It efficiently indexes passages alongside an associated knowledge graph and employs a hybrid retriever combined with aggressive filtering to reduce retrieval latency. Experiments on multi-hop QA datasets demonstrate that GR\small{IEVER} outperforms conventional retrievers and shows strong potential as a base retriever within multi-step agentic frameworks.

Retrieval-augmented Generation (RAG) relies on effective retrieval capabilities, yet traditional sparse and dense retrievers inherently struggle with multi-hop retrieval scenarios. In this paper, we introduce G\small{E}\normalsize{AR}, a system that advances RAG performance through two key innovations: (i) an efficient graph expansion mechanism that augments any conventional base retriever, such as BM25, and (ii) an agent framework that incorporates the resulting graph-based retrieval into a multi-step retrieval framework. Our evaluation demonstrates G\small{E}\normalsize{AR}‘s superior retrieval capabilities across three multi-hop question answering datasets. Notably, our system achieves state-of-the-art results with improvements exceeding 10% on the challenging MuSiQue dataset, while consuming fewer tokens and requiring fewer iterations than existing multi-step retrieval systems. The project page is available at https://gear-rag.github.io.

2024

pdf bib abs
A Usage-centric Take on Intent Understanding in E-Commerce
Wendi Zhou | Tianyi Li | Pavlos Vougiouklis | Mark Steedman | Jeff Z. Pan
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Identifying and understanding user intents is a pivotal task for E-Commerce. Despite its essential role in product recommendation and business user profiling analysis, intent understanding has not been consistently defined or accurately benchmarked. In this paper, we focus on predicative user intents as “how a customer uses a product”, and pose intent understanding as a natural language reasoning task, independent of product ontologies. We identify two weaknesses of FolkScope, the SOTA E-Commerce Intent Knowledge Graph: category-rigidity and property-ambiguity. They limit its ability to strongly align user intents with products having the most desirable property, and to recommend useful products across diverse categories. Following these observations, we introduce a Product Recovery Benchmark featuring a novel evaluation framework and an example dataset. We further validate the above FolkScope weaknesses on this benchmark. Our code and dataset are available at https://github.com/stayones/Usgae-Centric-Intent-Understanding.

pdf bib
Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning
Zhili Shen | Pavlos Vougiouklis | Chenxin Diao | Kaustubh Vyas | Yuanyi Ji | Jeff Z. Pan
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

pdf bib abs
Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA
Wenyu Huang | Guancheng Zhou | Hongru Wang | Pavlos Vougiouklis | Mirella Lapata | Jeff Z. Pan
Findings of the Association for Computational Linguistics: EMNLP 2024

Retrieval-Augmented Generation (RAG) is widely used to inject external non-parametric knowledge into large language models (LLMs). Recent works suggest that Knowledge Graphs (KGs) contain valuable external knowledge for LLMs. Retrieving information from KGs differs from extracting it from document sets. Most existing approaches seek to directly retrieve relevant subgraphs, thereby eliminating the need for extensive SPARQL annotations, traditionally required by semantic parsing methods. In this paper, we model the subgraph retrieval task as a conditional generation task handled by small language models. Specifically, we define a subgraph identifier as a sequence of relations, each represented as a special token stored in the language models. Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models relying on 7B parameters, demonstrating that small language models are capable of performing the subgraph retrieval task. Furthermore, our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks. Our model and data will be made available online: https://github.com/hwy9855/GSR.

2023

pdf bib
FastRAT: Fast and Efficient Cross-lingual Text-to-SQL Semantic Parsing
Pavlos Vougiouklis | Nikos Papasarantopoulos | Danna Zheng | David Tuckey | Chenxin Diao | Zhili Shen | Jeff Pan
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Retrieval Augmented Generation with Rich Answer Encoding
Wenyu Huang | Mirella Lapata | Pavlos Vougiouklis | Nikos Papasarantopoulos | Jeff Pan
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

2018

pdf bib
T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples
Hady Elsahar | Pavlos Vougiouklis | Arslen Remaci | Christophe Gravier | Jonathon Hare | Frederique Laforest | Elena Simperl
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib abs
Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata
Lucie-Aimée Kaffee | Hady Elsahar | Pavlos Vougiouklis | Christophe Gravier | Frédérique Laforest | Jonathon Hare | Elena Simperl
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

While Wikipedia exists in 287 languages, its content is unevenly distributed among them. In this work, we investigate the generation of open domain Wikipedia summaries in underserved languages using structured data from Wikidata. To this end, we propose a neural network architecture equipped with copy actions that learns to generate single-sentence and comprehensible textual summaries from Wikidata triples. We demonstrate the effectiveness of the proposed approach by evaluating it against a set of baselines on two languages of different natures: Arabic, a morphological rich language with a larger vocabulary than English, and Esperanto, a constructed language known for its easy acquisition.

2016

pdf bib abs
A Neural Network Approach for Knowledge-Driven Response Generation
Pavlos Vougiouklis | Jonathon Hare | Elena Simperl
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We present a novel response generation system. The system assumes the hypothesis that participants in a conversation base their response not only on previous dialog utterances but also on their background knowledge. Our model is based on a Recurrent Neural Network (RNN) that is trained over concatenated sequences of comments, a Convolution Neural Network that is trained over Wikipedia sentences and a formulation that couples the two trained embeddings in a multimodal space. We create a dataset of aligned Wikipedia sentences and sequences of Reddit utterances, which we we use to train our model. Given a sequence of past utterances and a set of sentences that represent the background knowledge, our end-to-end learnable model is able to generate context-sensitive and knowledge-driven responses by leveraging the alignment of two different data sources. Our approach achieves up to 55% improvement in perplexity compared to purely sequential models based on RNNs that are trained only on sequences of utterances.

pdf bib
Aligning Texts and Knowledge Bases with Semantic Sentence Simplification
Yassine Mrabet | Pavlos Vougiouklis | Halil Kilicoglu | Claire Gardent | Dina Demner-Fushman | Jonathon Hare | Elena Simperl
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016)