Yuxuan Lu


2024

pdf bib
StorySparkQA: Expert-Annotated QA Pairs with Real-World Knowledge for Children’s Story-Based Learning
Jiaju Chen | Yuxuan Lu | Shao Zhang | Bingsheng Yao | Yuanzhe Dong | Ying Xu | Yunyao Li | Qianwen Wang | Dakuo Wang | Yuling Sun
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Interactive story reading is common in early childhood education, where teachers expect to teach both language skills and real-world knowledge beyond the story. While many story reading systems have been developed for this activity, they often fail to infuse real-world knowledge into the conversation. This limitation can be attributed to the existing question-answering (QA) datasets used for children’s education, upon which the systems are built, failing to capture the nuances of how education experts think when conducting interactive story reading activities. To bridge this gap, we design an annotation framework, empowered by existing knowledge graph to capture experts’ annotations and thinking process, and leverage this framework to construct StorySparkQA dataset, which comprises 5, 868 expert-annotated QA pairs with real-world knowledge. We conduct automated and human expert evaluations across various QA pair generation settings to demonstrate that our StorySparkQA can effectively support models in generating QA pairs that target real-world knowledge beyond story content. StorySparkQA is available at https://huggingface.co/datasets/NEU-HAI/StorySparkQA.

pdf bib
More Samples or More Prompts? Exploring Effective Few-Shot In-Context Learning for LLMs with In-Context Sampling
Bingsheng Yao | Guiming Chen | Ruishi Zou | Yuxuan Lu | Jiachen Li | Shao Zhang | Yisi Sang | Sijia Liu | James Hendler | Dakuo Wang
Findings of the Association for Computational Linguistics: NAACL 2024

While most existing works on LLM prompting techniques focus only on how to select a better set of data samples inside one single prompt input (In-Context Learning or ICL), why can not we design and leverage multiple prompts together to further improve the LLM’s performance? In this work, we propose In-Context Sampling (ICS), a low-resource LLM prompting technique to produce confident predictions by optimizing the construction of multiple ICL prompt inputs. Extensive experiments with three open-source LLMs (FlanT5-XL, Mistral-7B, and Mixtral-8x7B) on four NLI datasets (e-SNLI, Multi-NLI, ANLI, and Contract-NLI) and one QA dataset (CommonsenseQA) illustrate that ICS can consistently enhance LLMs’ performance. An in-depth evaluation with three data similarity-based ICS strategies suggests that these strategies can further elevate LLM’s performance, which sheds light on a new yet promising future research direction.

2023

pdf bib
Beyond Labels: Empowering Human Annotators with Natural Language Explanations through a Novel Active-Learning Architecture
Bingsheng Yao | Ishan Jindal | Lucian Popa | Yannis Katsis | Sayan Ghosh | Lihong He | Yuxuan Lu | Shashank Srivastava | Yunyao Li | James Hendler | Dakuo Wang
Findings of the Association for Computational Linguistics: EMNLP 2023

Real-world domain experts (e.g., doctors) rarely annotate only a decision label in their day-to-day workflow without providing explanations. Yet, existing low-resource learning techniques, such as Active Learning (AL), that aim to support human annotators mostly focus on the label while neglecting the natural language explanation of a data point. This work proposes a novel AL architecture to support experts’ real-world need for label and explanation annotations in low-resource scenarios. Our AL architecture leverages an explanation-generation model to produce explanations guided by human explanations, a prediction model that utilizes generated explanations toward prediction faithfully, and a novel data diversity-based AL sampling strategy that benefits from the explanation annotations. Automated and human evaluations demonstrate the effectiveness of incorporating explanations into AL sampling and the improved human annotation efficiency and trustworthiness with our AL architecture. Additional ablation studies illustrate the potential of our AL architecture for transfer learning, generalizability, and integration with large language models (LLMs). While LLMs exhibit exceptional explanation-generation capabilities for relatively simple tasks, their effectiveness in complex real-world tasks warrants further in-depth study.