Yuan Zhuang


2023

PLAtE: A Large-scale Dataset for List Page Web Extraction
Aidan San | Yuan Zhuang | Jan Bakus | Colin Lockard | David Ciemiewicz | Sandeep Atluri | Kevin Small | Yangfeng Ji | Heba Elfardy
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

Recently, neural models have been leveraged to significantly improve the performance of information extraction from semi-structured websites. However, a barrier to continued progress is the small number of datasets large enough to train these models. In this work, we introduce the PLAtE (Pages of Lists Attribute Extraction) benchmark dataset as a challenging new web extraction task. PLAtE focuses on shopping data, specifically extraction from product review pages that list multiple items, and encompasses two tasks: (1) finding product list segmentation boundaries and (2) extracting attributes for each product. PLAtE is composed of 52,898 items and 156,014 attributes collected from 6,694 pages, making it the first large-scale list page web extraction dataset. We use a multi-stage approach to collect and annotate the dataset, and we adapt three state-of-the-art web extraction models to the two tasks, comparing their strengths and weaknesses both quantitatively and qualitatively.
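As a rough illustration of how the first PLAtE task feeds the second, the sketch below turns per-node boundary predictions into product items, which an attribute extractor could then process one by one. The (text, starts_new_item) node format, the `segment_items` helper, and the sample product names are hypothetical. They are not taken from the dataset or from the paper's models.

```python
from typing import List, Tuple


def segment_items(nodes: List[Tuple[str, bool]]) -> List[List[str]]:
    """Group page text nodes into product items.

    `nodes` holds hypothetical (text, starts_new_item) pairs in document
    order, where the boolean is a predicted segmentation boundary.
    Nodes before the first boundary (e.g. a page intro) are dropped.
    """
    items: List[List[str]] = []
    for text, is_boundary in nodes:
        if is_boundary:
            items.append([text])      # open a new item at each boundary
        elif items:
            items[-1].append(text)    # attach the node to the current item
    return items


page = [("Best headphones of 2023", False),
        ("Acme X1", True), ("$99", False),
        ("Bolt Pro", True), ("$149", False), ("Great bass", False)]
print(segment_items(page))
# [['Acme X1', '$99'], ['Bolt Pro', '$149', 'Great bass']]
```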

Eliciting Affective Events from Language Models by Multiple View Co-prompting
Yuan Zhuang | Ellen Riloff
Findings of the Association for Computational Linguistics: ACL 2023

Prior research on affective event classification showed that exploiting weakly labeled data for training can improve model performance. In this work, we propose a simpler and more effective approach for generating training data by automatically acquiring and labeling affective events with Multiple View Co-prompting, which leverages two language model prompts that provide independent views of an event. The approach starts with a modest amount of gold data and prompts pre-trained language models to generate new events. Next, information about the probable affective polarity of each event is collected from two complementary language model prompts and jointly used to assign polarity labels. Experimental results on two datasets show that the newly acquired events improve a state-of-the-art affective event classifier. We also present analyses which show that using multiple views produces polarity labels of higher quality than either view on its own.
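The abstract says that the two prompt views are "jointly used" to assign polarity labels but does not give the combination rule. One simple rule, assumed here purely for illustration, keeps an event only when both views agree on its most likely label and their averaged probability clears a threshold. The `combine_views` helper, the probability-dictionary inputs, and the 0.5 threshold are hypothetical.

```python
from typing import Dict, Optional

LABELS = ("positive", "negative", "neutral")


def combine_views(view_a: Dict[str, float], view_b: Dict[str, float],
                  threshold: float = 0.5) -> Optional[str]:
    """Assign a polarity label only when two independent views agree.

    view_a, view_b: hypothetical probability distributions over LABELS,
    e.g. derived from two different language model prompts.
    Returns the agreed label if its averaged probability reaches
    `threshold`, otherwise None (the event stays unlabeled).
    """
    best_a = max(LABELS, key=lambda lab: view_a.get(lab, 0.0))
    best_b = max(LABELS, key=lambda lab: view_b.get(lab, 0.0))
    if best_a != best_b:
        return None  # the views disagree, so skip this event
    score = (view_a.get(best_a, 0.0) + view_b.get(best_b, 0.0)) / 2
    return best_a if score >= threshold else None
```

Requiring agreement trades recall for precision, which fits the abstract's claim that the two views together yield better labels than either view alone.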

2022

Exploiting Unary Relations with Stacked Learning for Relation Extraction
Yuan Zhuang | Ellen Riloff | Kiri L. Wagstaff | Raymond Francis | Matthew P. Golombek | Leslie K. Tamppari
Proceedings of the Third Workshop on Scholarly Document Processing

Relation extraction models typically cast the problem of determining whether there is a relation between a pair of entities as a single decision. However, these models can struggle with long or complex language constructions in which two entities are not directly linked, as is often the case in scientific publications. We propose a novel approach that decomposes a binary relation into two unary relations that capture each argument’s role in the relation separately. We create a stacked learning model that incorporates information from unary and binary relation extractors to determine whether a relation holds between two entities. We present experimental results showing that this approach outperforms several competitive relation extractors on a new corpus of planetary science publications as well as a benchmark dataset in the biology domain.
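Stacked learning trains a meta-classifier on the outputs of base models. The sketch below assumes one binary extractor score and two unary scores (one per argument) per entity pair, and fits a small logistic-regression meta-learner by gradient descent. The feature layout, helper names, and toy scores are illustrative assumptions, not the paper's model.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def stack_features(binary: float, unary_arg1: float, unary_arg2: float) -> List[float]:
    # Bias term followed by the scores of the three hypothetical base extractors.
    return [1.0, binary, unary_arg1, unary_arg2]


def train_meta(X: Sequence[Sequence[float]], y: Sequence[int],
               lr: float = 0.5, epochs: int = 2000) -> List[float]:
    # Batch gradient descent on the logistic loss: the meta-learner learns
    # how much to trust each base extractor's score.
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x))) >= 0.5)


if __name__ == "__main__":
    # Toy meta-training data. In real stacking these scores would be
    # out-of-fold predictions from the base extractors.
    X = [stack_features(0.2, 0.9, 0.9),  # binary model misses, unary models agree
         stack_features(0.9, 0.8, 0.9),
         stack_features(0.3, 0.1, 0.9),
         stack_features(0.1, 0.2, 0.1),
         stack_features(0.6, 0.1, 0.2)]
    y = [1, 1, 0, 0, 0]
    w = train_meta(X, y)
    print([predict(w, x) for x in X])
```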

2020

Affective Event Classification with Discourse-enhanced Self-training
Yuan Zhuang | Tianyu Jiang | Ellen Riloff
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Prior research has recognized the need to associate affective polarities with events and has produced several techniques and lexical resources for identifying affective events. Our research introduces new classification models to assign affective polarity to event phrases. First, we present a BERT-based model for affective event classification and show that the classifier achieves substantially better performance than a large affective event knowledge base. Second, we present a discourse-enhanced self-training method that iteratively improves the classifier with unlabeled data. The key idea is to exploit event phrases that occur with a coreferent sentiment expression. The discourse-enhanced self-training algorithm iteratively labels new event phrases based on both the classifier’s predictions and the polarities of the event’s coreferent sentiment expressions. Our results show that discourse-enhanced self-training further improves both recall and precision for affective event classification.
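The self-training loop described above can be sketched with a deliberately tiny stand-in classifier: word weights counted from labeled phrases, used in place of the paper's BERT model. The sketch also assumes a specific labeling rule, namely that a new phrase is added only when the classifier's prediction matches the polarity of its coreferent sentiment expression. The rule, the helpers, and the toy phrases are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Example = Tuple[str, str]  # (event phrase, "pos" or "neg")


def train(labeled: List[Example]) -> Dict[str, int]:
    # Toy classifier: a word's weight is (#pos phrases containing it)
    # minus (#neg phrases containing it).
    weights: Counter = Counter()
    for phrase, label in labeled:
        delta = 1 if label == "pos" else -1
        for word in set(phrase.split()):
            weights[word] += delta
    return dict(weights)


def predict(weights: Dict[str, int], phrase: str) -> Optional[str]:
    score = sum(weights.get(w, 0) for w in phrase.split())
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None  # no evidence either way


def self_train(labeled: List[Example], unlabeled: List[Example],
               max_iters: int = 5) -> List[Example]:
    # `unlabeled` pairs each phrase with the polarity of its coreferent
    # sentiment expression (assumed to be given).
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_iters):
        weights = train(labeled)  # retrain on the current labeled set
        added, remaining = [], []
        for phrase, sentiment in pool:
            pred = predict(weights, phrase)
            if pred is not None and pred == sentiment:
                added.append((phrase, pred))  # both signals agree
            else:
                remaining.append((phrase, sentiment))
        if not added:
            break  # no new labels this round, so stop iterating
        labeled.extend(added)
        pool = remaining
    return labeled
```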

Exploring the Role of Context to Distinguish Rhetorical and Information-Seeking Questions
Yuan Zhuang | Ellen Riloff
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Social media posts often contain questions, but many of the questions are rhetorical and do not seek information. Our work studies the problem of distinguishing rhetorical and information-seeking questions on Twitter. Most work has focused on features of the question itself, but we hypothesize that the prior context plays a role too. This paper introduces a new dataset containing questions in tweets paired with their prior tweets to provide context. We create classification models to assess the difficulty of distinguishing rhetorical and information-seeking questions, and experiment with different properties of the prior context. Our results show that the prior tweet and topic features can improve performance on this task.
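One common way to let a classifier use the prior tweet is to build separate, prefixed feature spaces for the question and its context. The sketch below does this with bag-of-words counts. The `featurize` helper and the "q:"/"ctx:" prefixes are illustrative assumptions and do not reproduce the paper's feature set.

```python
from collections import Counter
from typing import Optional


def featurize(question: str, prior_tweet: Optional[str]) -> Counter:
    # Question words and context words get different prefixes, so a linear
    # model can weight the same word differently depending on where it occurs.
    feats = Counter(f"q:{w}" for w in question.lower().split())
    if prior_tweet:
        feats.update(f"ctx:{w}" for w in prior_tweet.lower().split())
    return feats
```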