Wenzhao Qiu


2024

pdf bib
Shoes-ACOSI: A Dataset for Aspect-Based Sentiment Analysis with Implicit Opinion Extraction
Joseph J Peper | Wenzhao Qiu | Ryan Bruggeman | Yi Han | Estefania Ciliotta Chehade | Lu Wang
Findings of the Association for Computational Linguistics: EMNLP 2024

We explore *implicit opinion extraction* as a new component of aspect-based sentiment analysis (ABSA) systems. Prior work in ABSA has investigated opinion extraction as an important subtask, however, these works only label concise, *explicitly*-stated opinion spans. In this work, we present **Shoes-ACOSI**, a new and challenging ABSA dataset in the e-commerce domain with implicit opinion span annotations, the first of its kind. Shoes-ACOSI builds upon the existing Aspect-Category-Opinion-Sentiment (ACOS) quadruple extraction task, extending the task to quintuple extraction—now localizing and differentiating both implicit and explicit opinion. In addition to the new annotation schema, our dataset contains paragraph-length inputs which, importantly, present complex challenges through increased input length, increased number of sentiment expressions, and more mixed-sentiment-polarity examples when compared with existing benchmarks. We quantify the difficulty of our new dataset by evaluating with state-of-the-art fully-supervised and prompted-LLM baselines. We find our dataset presents significant challenges for both supervised models and LLMs, particularly from the new implicit opinion extraction component of the ACOSI task, highlighting the need for continued research into implicit opinion understanding.

pdf bib
PELMS: Pre-training for Effective Low-Shot Multi-Document Summarization
Joseph Peper | Wenzhao Qiu | Lu Wang
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

We investigate pre-training techniques for abstractive multi-document summarization (MDS), which is much less studied than summarizing single documents. Though recent work has demonstrated the effectiveness of highlighting information salience for pre-training strategy design, they struggle to generate abstractive and reflective summaries, which are critical properties for MDS. To this end, we present **PELMS**, a pre-trained model that uses pre-training objectives based on semantic coherence heuristics and faithfulness constraints together with unlabeled multi-document inputs, to promote the generation of concise, fluent, and faithful summaries. To support the training of PELMS, we compile **MultiPT**, a multi-document pre-training corpus containing over 93 million documents to form more than 3million unlabeled topic-centric document clusters, covering diverse genres such as product reviews, news, and general knowledge. We perform extensive evaluation of PELMS in low-shot settings on a wide range of MDS datasets. Our approach consistently outperforms competitive comparisons with respect to overall informativeness, abstractiveness, coherence, and faithfulness, and with minimal fine-tuning can match performance of language models at a much larger scale (e.g., GPT-4).