Xander Uyttendaele
2022
Design Challenges for a Multi-Perspective Search Engine
Sihao Chen
|
Siyi Liu
|
Xander Uyttendaele
|
Yi Zhang
|
William Bruno
|
Dan Roth
Findings of the Association for Computational Linguistics: NAACL 2022
Many users turn to document retrieval systems (e.g. search engines) to seek answers to controversial or open-ended questions. However, classical document retrieval systems fall short at delivering users a set of direct and diverse responses in such cases, which requires identifying responses within web documents in the context of the query, and aggregating the responses based on their different perspectives. The goal of this work is to survey and study the user information needs for building a multi-perspective search engine of such. We examine the challenges of synthesizing such language understanding objectives with document retrieval, and study a new perspective-oriented document retrieval paradigm. We discuss and assess the inherent natural language understanding challenges one needs to address in order to achieve the goal. Following the design challenges and principles, we propose and evaluate a practical prototype pipeline system. We use the prototype system to conduct a user survey in order to assess the utility of our paradigm, as well as understanding the user information needs when issuing controversial and open-ended queries to a search engine.
2021
MultiOpEd: A Corpus of Multi-Perspective News Editorials
Siyi Liu
|
Sihao Chen
|
Xander Uyttendaele
|
Dan Roth
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
We propose MultiOpEd, an open-domain news editorial corpus that supports various tasks pertaining to the argumentation structure in news editorials, focusing on automatic perspective discovery. News editorial is a genre of persuasive text, where the argumentation structure is usually implicit. However, the arguments presented in an editorial typically center around a concise, focused thesis, which we refer to as their perspective. MultiOpEd aims at supporting the study of multiple tasks relevant to automatic perspective discovery, where a system is expected to produce a single-sentence thesis statement summarizing the arguments presented. We argue that identifying and abstracting such natural language perspectives from editorials is a crucial step toward studying the implicit argumentation structure in news editorials. We first discuss the challenges and define a few conceptual tasks towards our goal. To demonstrate the utility of MultiOpEd and the induced tasks, we study the problem of perspective summarization in a multi-task learning setting, as a case study. We show that, with the induced tasks as auxiliary tasks, we can improve the quality of the perspective summary generated. We hope that MultiOpEd will be a useful resource for future studies on argumentation in the news editorial domain.