Omar Attia
2024
AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings
Revanth Gangi Reddy
|
Omar Attia
|
Yunyao Li
|
Heng Ji
|
Saloni Potdar
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Ranking is a fundamental problem in search, however, existing ranking algorithms usually restrict the granularity of ranking to full passages or require a specific dense index for each desired level of granularity. Such lack of flexibility in granularity negatively affects many applications that can benefit from more granular ranking, such as sentence-level ranking for open-domain QA, or proposition-level ranking for attribution. In this work, we introduce the idea of any-granularity ranking which leverages multi-vector embeddings to rank at varying levels of granularity while maintaining encoding at a single (coarser) level of granularity. We propose a multi-granular contrastive loss for training multi-vector approaches and validate its utility with both sentences and propositions as ranking units. Finally, we demonstrate the application of proposition-level ranking to post-hoc citation addition in retrieval-augmented generation, surpassing the performance of prompt-driven citation generation.
Entity Disambiguation via Fusion Entity Decoding
Junxiong Wang
|
Ali Mousavi
|
Omar Attia
|
Ronak Pradeep
|
Saloni Potdar
|
Alexander Rush
|
Umar Farooq Minhas
|
Yunyao Li
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Entity disambiguation (ED), which links the mentions of ambiguous entities to their referent entities in a knowledge base, serves as a core component in entity linking (EL). Existing generative approaches demonstrate improved accuracy compared to classification approaches under the standardized ZELDA benchmark. Nevertheless, generative approaches suffer from the need for large-scale pre-training and inefficient generation. Most importantly, entity descriptions, which could contain crucial information to distinguish similar entities from each other, are often overlooked.We propose an encoder-decoder model to disambiguate entities with more detailed entity descriptions. Given text and candidate entities, the encoder learns interactions between the text and each candidate entity, producing representations for each entity candidate. The decoder then fuses the representations of entity candidates together and selects the correct entity.Our experiments, conducted on various entity disambiguation benchmarks, demonstrate the strong and robust performance of this model, particularly +1.5% in the ZELDA benchmark compared with GENRE. Furthermore, we integrate this approach into the retrieval/reader framework and observe +1.5% improvements in end-to-end entity linking in the GERBIL benchmark compared with EntQA.
Search
Co-authors
- Yunyao Li 2
- Saloni Potdar 2
- Revanth Gangi Reddy 1
- Heng Ji 1
- Junxiong Wang 1
- show all...