Pavlos Vougiouklis


2024

A Usage-centric Take on Intent Understanding in E-Commerce
Wendi Zhou | Tianyi Li | Pavlos Vougiouklis | Mark Steedman | Jeff Z. Pan
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Identifying and understanding user intents is a pivotal task for E-Commerce. Despite its essential role in product recommendation and business user profiling, intent understanding has not been consistently defined or accurately benchmarked. In this paper, we focus on predicative user intents, i.e. “how a customer uses a product”, and pose intent understanding as a natural language reasoning task, independent of product ontologies. We identify two weaknesses of FolkScope, the SOTA E-Commerce Intent Knowledge Graph: category-rigidity and property-ambiguity. These weaknesses limit its ability to align user intents with the products exhibiting the most desirable properties, and to recommend useful products across diverse categories. Following these observations, we introduce a Product Recovery Benchmark featuring a novel evaluation framework and an example dataset, and we validate the above FolkScope weaknesses on this benchmark. Our code and dataset are available at https://github.com/stayones/Usgae-Centric-Intent-Understanding.
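To make the product-recovery idea concrete, the following is a minimal, hypothetical sketch of such an evaluation loop: given a stated usage intent, candidate products are ranked by an intent-product relevance scorer, and recall of the target product among the top-k is measured. The function names and the scorer interface are illustrative assumptions, not the benchmark's actual implementation.

```python
# Hypothetical sketch of a product-recovery style evaluation
# (illustrative only; names and the scoring interface are assumptions,
# not the benchmark's actual code).
from typing import Callable, Sequence

def recall_at_k(
    examples: Sequence[tuple[str, str, list[str]]],  # (intent, target_product, candidates)
    score: Callable[[str, str], float],              # intent-product relevance scorer
    k: int = 10,
) -> float:
    """Fraction of examples whose target product is ranked in the top-k."""
    hits = 0
    for intent, target, candidates in examples:
        ranked = sorted(candidates, key=lambda p: score(intent, p), reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(examples)
```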

Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning
Zhili Shen | Pavlos Vougiouklis | Chenxin Diao | Kaustubh Vyas | Yuanyi Ji | Jeff Z. Pan
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA
Wenyu Huang | Guancheng Zhou | Hongru Wang | Pavlos Vougiouklis | Mirella Lapata | Jeff Z. Pan
Findings of the Association for Computational Linguistics: EMNLP 2024

Retrieval-Augmented Generation (RAG) is widely used to inject external non-parametric knowledge into large language models (LLMs). Recent works suggest that Knowledge Graphs (KGs) contain valuable external knowledge for LLMs, but retrieving information from KGs differs from extracting it from document sets. Most existing approaches seek to directly retrieve relevant subgraphs, thereby eliminating the need for the extensive SPARQL annotations traditionally required by semantic parsing methods. In this paper, we model subgraph retrieval as a conditional generation task handled by small language models. Specifically, we define a subgraph identifier as a sequence of relations, each represented as a special token added to the language model's vocabulary. Our base generative subgraph retrieval model, with only 220M parameters, achieves retrieval performance competitive with state-of-the-art models relying on 7B parameters, demonstrating that small language models are capable of performing the subgraph retrieval task. Furthermore, our largest 3B model, when paired with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks. Our model and data will be made available online: https://github.com/hwy9855/GSR.
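As a rough illustration of the retrieval-as-generation recipe described above, the sketch below registers one special token per KG relation in a T5-style seq2seq model (t5-base is roughly the 220M-parameter scale mentioned) and decodes a relation sequence as the subgraph identifier. This is a minimal sketch under assumed details; the paper's actual training and decoding setup may differ.

```python
# Minimal sketch: subgraph retrieval as conditional generation, where
# each KG relation is a special token and the model generates a relation
# path (the "subgraph identifier") given the question.
# Illustrative only; relation names here are assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

relations = ["<rel:people.person.place_of_birth>",
             "<rel:location.location.containedby>"]

tokenizer = AutoTokenizer.from_pretrained("t5-base")   # ~220M parameters
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# Register one special token per KG relation and grow the embedding matrix.
tokenizer.add_special_tokens({"additional_special_tokens": relations})
model.resize_token_embeddings(len(tokenizer))

# After fine-tuning, decoding yields a sequence of relation tokens that
# identifies the subgraph to retrieve for the question.
inputs = tokenizer("where was Barack Obama born?", return_tensors="pt")
ids = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(ids[0], skip_special_tokens=False))
```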

2023

FastRAT: Fast and Efficient Cross-lingual Text-to-SQL Semantic Parsing
Pavlos Vougiouklis | Nikos Papasarantopoulos | Danna Zheng | David Tuckey | Chenxin Diao | Zhili Shen | Jeff Pan
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Retrieval Augmented Generation with Rich Answer Encoding
Wenyu Huang | Mirella Lapata | Pavlos Vougiouklis | Nikos Papasarantopoulos | Jeff Pan
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

2018

Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata
Lucie-Aimée Kaffee | Hady Elsahar | Pavlos Vougiouklis | Christophe Gravier | Frédérique Laforest | Jonathon Hare | Elena Simperl
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

While Wikipedia exists in 287 languages, its content is unevenly distributed among them. In this work, we investigate the generation of open-domain Wikipedia summaries in underserved languages using structured data from Wikidata. To this end, we propose a neural network architecture equipped with copy actions that learns to generate single-sentence, comprehensible textual summaries from Wikidata triples. We demonstrate the effectiveness of the proposed approach by evaluating it against a set of baselines on two languages of very different natures: Arabic, a morphologically rich language with a larger vocabulary than English, and Esperanto, a constructed language known for its ease of acquisition.
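The copy-action component can be pictured as a pointer-generator-style mixture: at each decoding step, the model interpolates between generating a word from the vocabulary and copying a token from the linearized Wikidata triples, which is how rare entity names survive into the summary. The snippet below is an illustrative sketch of that mixture in PyTorch; shapes and names are assumptions rather than the paper's code.

```python
# Sketch of the copy-action idea: mix a vocabulary distribution with a
# distribution over source (triple) tokens, so rare entity names can be
# copied verbatim. Shapes/names are illustrative assumptions.
import torch

def copy_mixture(p_vocab, attn, src_token_ids, p_gen):
    """
    p_vocab:       (batch, vocab_size) generator distribution
    attn:          (batch, src_len)    attention over linearized triples
    src_token_ids: (batch, src_len)    vocabulary ids of source tokens
    p_gen:         (batch, 1)          probability of generating vs copying
    """
    out = p_gen * p_vocab
    # Scatter the copy probability mass onto the vocabulary ids of the
    # source tokens (duplicated source tokens accumulate mass).
    out = out.scatter_add(1, src_token_ids, (1.0 - p_gen) * attn)
    return out  # (batch, vocab_size) final next-token distribution
```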

T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples
Hady Elsahar | Pavlos Vougiouklis | Arslen Remaci | Christophe Gravier | Jonathon Hare | Frederique Laforest | Elena Simperl
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Aligning Texts and Knowledge Bases with Semantic Sentence Simplification
Yassine Mrabet | Pavlos Vougiouklis | Halil Kilicoglu | Claire Gardent | Dina Demner-Fushman | Jonathon Hare | Elena Simperl
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016)

A Neural Network Approach for Knowledge-Driven Response Generation
Pavlos Vougiouklis | Jonathon Hare | Elena Simperl
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We present a novel response generation system. The system is built on the hypothesis that participants in a conversation base their responses not only on previous dialogue utterances but also on their background knowledge. Our model combines a Recurrent Neural Network (RNN) that is trained over concatenated sequences of comments, a Convolutional Neural Network (CNN) that is trained over Wikipedia sentences, and a formulation that couples the two trained embeddings in a multimodal space. We create a dataset of aligned Wikipedia sentences and sequences of Reddit utterances, which we use to train our model. Given a sequence of past utterances and a set of sentences that represent the background knowledge, our end-to-end learnable model generates context-sensitive and knowledge-driven responses by leveraging the alignment of the two data sources. Our approach achieves up to a 55% improvement in perplexity compared to purely sequential RNN models trained only on sequences of utterances.
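To illustrate the coupling of the two encoders, here is a minimal sketch under assumed dimensions: a GRU encodes the concatenated utterance sequence, a 1-D CNN with max-pooling encodes a background-knowledge sentence, and both vectors are projected into one joint space that would condition the response decoder. The fusion choice (summation followed by tanh) and all sizes are assumptions, not the paper's exact architecture.

```python
# Sketch of coupling an RNN dialogue encoder with a CNN knowledge encoder
# in a shared space. Dimensions and the fusion operator are assumptions.
import torch
import torch.nn as nn

class KnowledgeDrivenEncoder(nn.Module):
    def __init__(self, vocab_size, emb=128, hidden=256, joint=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True)             # dialogue context
        self.cnn = nn.Conv1d(emb, hidden, kernel_size=3, padding=1)  # knowledge sentence
        self.to_joint_rnn = nn.Linear(hidden, joint)
        self.to_joint_cnn = nn.Linear(hidden, joint)

    def forward(self, utterances, knowledge):
        # utterances: (batch, conv_len) token ids; knowledge: (batch, sent_len) token ids
        _, h = self.rnn(self.embed(utterances))              # h: (1, batch, hidden)
        k = self.cnn(self.embed(knowledge).transpose(1, 2))  # (batch, hidden, sent_len)
        k = k.max(dim=2).values                              # max-pool over time
        # Couple the two views in one shared space; this vector would
        # initialize / condition the response decoder.
        return torch.tanh(self.to_joint_rnn(h[-1]) + self.to_joint_cnn(k))
```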