Neural Retriever and Go Beyond: A Thesis Proposal

Man Luo


Abstract
Information Retrieval (IR) aims to find the documents (e.g. snippets, passages, and articles) relevant to a given query at large scale. IR plays an important role in many tasks, such as open-domain question answering and dialogue systems, where external knowledge is needed. In the past, search algorithms based on term matching were widely used. Recently, neural-based algorithms (termed neural retrievers), which can mitigate the limitations of traditional methods, have gained more attention. Despite their success, neural retrievers still face many challenges, e.g. suffering from a small amount of training data and failing to answer simple entity-centric questions. Furthermore, most existing neural retrievers are developed for pure-text queries, which prevents them from handling multi-modality queries (i.e. queries composed of a textual description and images). This proposal has two goals. First, we introduce methods to address the abovementioned issues of neural retrievers from three angles: new model architectures, IR-oriented pretraining tasks, and the generation of large-scale training data. Second, we identify future research directions and propose potential corresponding solutions.
Anthology ID:
2022.naacl-srw.8
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
Month:
July
Year:
2022
Address:
Hybrid: Seattle, Washington + Online
Editors:
Daphne Ippolito, Liunian Harold Li, Maria Leonor Pacheco, Danqi Chen, Nianwen Xue
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
59–67
URL:
https://aclanthology.org/2022.naacl-srw.8
DOI:
10.18653/v1/2022.naacl-srw.8
Cite (ACL):
Man Luo. 2022. Neural Retriever and Go Beyond: A Thesis Proposal. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, pages 59–67, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics.
Cite (Informal):
Neural Retriever and Go Beyond: A Thesis Proposal (Luo, NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-srw.8.pdf
Video:
https://aclanthology.org/2022.naacl-srw.8.mp4
Data:
OK-VQA