Quan Guo

2025

pdf bib abs
Transform Retrieval for Textual Entailment in RAG
Xin Liang | Quan Guo
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

In this paper, we introduce Transform Retrieval, a novel approach aimed at improving Textual Entailment Retrieval within the framework of Retrieval-Augmented Generation (RAG). While RAG has shown promise in enhancing Large Language Models by retrieving relevant documents to extract specific knowledge or mitigate hallucination, current retrieval methods often prioritize relevance without ensuring the retrieved documents semantically support answering the queries. Transform Retrieval addresses this gap by transforming query embeddings to better align with semantic entailment without re-encoding the document corpus. We achieve this by using a transform model and employing a contrastive learning strategy to optimize the alignment between transformed query embeddings and document embeddings for better entailment.We evaluated the framework using BERT as frozen pre-trained encoder and compared it with a fully fine-tuned skyline model. Experimental results show that Transform Retrieval with simple MLP consistently approaches the skyline across multiple datasets, demonstrating the method’s effectiveness. The high performance on HotpotQA highlights its strength in many-to-many retrieval scenarios.

2024

pdf bib abs
NavHint: Vision and Language Navigation Agent with a Hint Generator
Yue Zhang | Quan Guo | Parisa Kordjamshidi
Findings of the Association for Computational Linguistics: EACL 2024

The existing work on vision and language navigation mainly relies on navigation-related losses to establish the connection between vision and language modalities, neglecting aspects of helping the navigation agent build a deep understanding of the visual environment.In our work, we provide indirect supervision to the navigation agent through a hint generator that provides detailed visual descriptions.The hint generator assists the navigation agent in developing a global understanding of the visual environment. It directs the agent’s attention toward related navigation details, including the relevant sub-instruction, potential challenges in recognition and ambiguities in grounding, and the targeted viewpoint description. To train the hint generator, we construct a synthetic dataset based on landmarks in the instructions and visible and distinctive objects in the visual environment.We evaluate our method on the R2R and R4R datasets and achieve state-of-the-art on several metrics. The experimental results demonstrate that generating hints not only enhances the navigation performance but also helps improve the agent’s interpretability.

2021

pdf bib abs
DomiKnowS: A Library for Integration of Symbolic Domain Knowledge in Deep Learning
Hossein Rajaby Faghihi | Quan Guo | Andrzej Uszok | Aliakbar Nafar | Parisa Kordjamshidi
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We demonstrate a library for the integration of domain knowledge in deep learning architectures. Using this library, the structure of the data is expressed symbolically via graph declarations and the logical constraints over outputs or latent variables can be seamlessly added to the deep models. The domain knowledge can be defined explicitly, which improves the explainability of the models in addition to their performance and generalizability in the low-data regime. Several approaches for such integration of symbolic and sub-symbolic models have been introduced; however, there is no library to facilitate the programming for such integration in a generic way while various underlying algorithms can be used. Our library aims to simplify programming for such integration in both training and inference phases while separating the knowledge representation from learning algorithms. We showcase various NLP benchmark tasks and beyond. The framework is publicly available at Github(https://github.com/HLR/DomiKnowS).

pdf bib abs
Towards Navigation by Reasoning over Spatial Configurations
Yue Zhang | Quan Guo | Parisa Kordjamshidi
Proceedings of Second International Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics

We deal with the navigation problem where the agent follows natural language instructions while observing the environment. Focusing on language understanding, we show the importance of spatial semantics in grounding navigation instructions into visual perceptions. We propose a neural agent that uses the elements of spatial configurations and investigate their influence on the navigation agent’s reasoning ability. Moreover, we model the sequential execution order and align visual objects with spatial configurations in the instruction. Our neural agent improves strong baselines on the seen environments and shows competitive performance on the unseen environments. Additionally, the experimental results demonstrate that explicit modeling of spatial semantic elements in the instructions can improve the grounding and spatial reasoning of the model.

2020

pdf bib abs
Cross-Modality Relevance for Reasoning on Language and Vision
Chen Zheng | Quan Guo | Parisa Kordjamshidi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This work deals with the challenge of learning and reasoning over language and vision data for the related downstream tasks such as visual question answering (VQA) and natural language for visual reasoning (NLVR). We design a novel cross-modality relevance module that is used in an end-to-end framework to learn the relevance representation between components of various input modalities under the supervision of a target task, which is more generalizable to unobserved data compared to merely reshaping the original representation space. In addition to modeling the relevance between the textual entities and visual entities, we model the higher-order relevance between entity relations in the text and object relations in the image. Our proposed approach shows competitive performance on two different language and vision tasks using public benchmarks and improves the state-of-the-art published results. The learned alignments of input spaces and their relevance representations by NLVR task boost the training efficiency of VQA task.

Co-authors

Andrzej Uszok 1

Chen Zheng 1

Venues

Fix author