2024
pdf
bib
abs
BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering
Haoyu Wang
|
Ruirui Li
|
Haoming Jiang
|
Jinjin Tian
|
Zhengyang Wang
|
Chen Luo
|
Xianfeng Tang
|
Monica Xiao Cheng
|
Tuo Zhao
|
Jing Gao
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Retrieval-augmented Large Language Models (LLMs) offer substantial benefits in enhancing performance across knowledge-intensive scenarios. However, these methods often struggle with complex inputs and encounter difficulties due to noisy knowledge retrieval, notably hindering model effectiveness. To address this issue, we introduce BlendFilter, a novel approach that elevates retrieval-augmented LLMs by integrating query generation blending with knowledge filtering. BlendFilter proposes the blending process through its query generation method, which integrates both external and internal knowledge augmentation with the original query, ensuring comprehensive information gathering. Additionally, our distinctive knowledge filtering module capitalizes on the intrinsic capabilities of the LLM, effectively eliminating extraneous data. We conduct extensive experiments on three open-domain question answering benchmarks, and the findings clearly indicate that our innovative BlendFilter surpasses state-of-the-art baselines significantly.
pdf
bib
abs
MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding
Baixuan Xu
|
Weiqi Wang
|
Haochen Shi
|
Wenxuan Ding
|
Huihao Jing
|
Tianqing Fang
|
Jiaxin Bai
|
Xin Liu
|
Changlong Yu
|
Zheng Li
|
Chen Luo
|
Qingyu Yin
|
Bing Yin
|
Long Chen
|
Yangqiu Song
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Improving user experience and providing personalized search results in E-commerce platforms heavily rely on understanding purchase intention. However, existing methods for acquiring large-scale intentions bank on distilling large language models with human annotation for verification. Such an approach tends to generate product-centric intentions, overlook valuable visual information from product images, and incurs high costs for scalability. To address these issues, we introduce MIND, a multimodal framework that allows Large Vision-Language Models (LVLMs) to infer purchase intentions from multimodal product metadata and prioritize human-centric ones. Using Amazon Review data, we apply MIND and create a multimodal intention knowledge base, which contains 1,264,441 intentions derived from 126,142 co-buy shopping records across 107,215 products. Extensive human evaluations demonstrate the high plausibility and typicality of our obtained intentions and validate the effectiveness of our distillation framework and filtering mechanism. Further experiments reveal the positive downstream benefits that MIND brings to intention comprehension tasks and highlight the importance of multimodal generation and role-aware filtering. Additionally, MIND shows robustness to different prompts and superior generation quality compared to previous methods.
pdf
bib
abs
Large Language Models Are Poor Clinical Decision-Makers: A Comprehensive Benchmark
Fenglin Liu
|
Zheng Li
|
Hongjian Zhou
|
Qingyu Yin
|
Jingfeng Yang
|
Xianfeng Tang
|
Chen Luo
|
Ming Zeng
|
Haoming Jiang
|
Yifan Gao
|
Priyanka Nigam
|
Sreyashi Nag
|
Bing Yin
|
Yining Hua
|
Xuan Zhou
|
Omid Rohanian
|
Anshul Thakur
|
Lei Clifton
|
David A. Clifton
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt the close-ended question-answering (QA) task with answer options for evaluation. However, many clinical decisions involve answering open-ended questions without pre-set options. To better understand LLMs in the clinic, we construct a benchmark ClinicBench. We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks. Furthermore, we construct six novel datasets and clinical tasks that are complex but common in real-world practice, e.g., open-ended decision-making, long document processing, and emerging drug analysis. We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings. Finally, we invite medical experts to evaluate the clinical usefulness of LLMs
pdf
bib
abs
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce
Wenxuan Ding
|
Weiqi Wang
|
Sze Heng Douglas Kwok
|
Minghao Liu
|
Tianqing Fang
|
Jiaxin Bai
|
Xin Liu
|
Changlong Yu
|
Zheng Li
|
Chen Luo
|
Qingyu Yin
|
Bing Yin
|
Junxian He
|
Yangqiu Song
Findings of the Association for Computational Linguistics: EMNLP 2024
Enhancing Language Models’ (LMs) ability to understand purchase intentions in E-commerce scenarios is crucial for their effective assistance in various downstream tasks. However, previous approaches that distill intentions from LMs often fail to generate meaningful and human-centric intentions applicable in real-world E-commerce contexts. This raises concerns about the true comprehension and utilization of purchase intentions by LMs. In this paper, we present IntentionQA, a double-task multiple-choice question answering benchmark to evaluate LMs’ comprehension of purchase intentions in E-commerce. Specifically, LMs are tasked to infer intentions based on purchased products and utilize them to predict additional purchases. IntentionQA consists of 4,360 carefully curated problems across three difficulty levels, constructed using an automated pipeline to ensure scalability on large E-commerce platforms. Human evaluations demonstrate the high quality and low false-negative rate of our benchmark. Extensive experiments across 19 language models show that they still struggle with certain scenarios, such as understanding products and intentions accurately, jointly reasoning with products and intentions, and more, in which they fall far behind human performances.
pdf
bib
abs
IterAlign: Iterative Constitutional Alignment of Large Language Models
Xiusi Chen
|
Hongzhi Wen
|
Sreyashi Nag
|
Chen Luo
|
Qingyu Yin
|
Ruirui Li
|
Zheng Li
|
Wei Wang
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
With the rapid development of large language models (LLMs), aligning LLMs with human values and societal norms to ensure their reliability and safety has become crucial. Reinforcement learning with human feedback (RLHF) and Constitutional AI (CAI) have been proposed for LLM alignment. However, these methods require either heavy human annotations or explicitly pre-defined constitutions, which are labor-intensive and resource-consuming. To overcome these drawbacks, we study constitution-based LLM alignment and propose a data-driven constitution discovery and self-alignment framework called IterAlign. IterAlign leverages red teaming to unveil the weaknesses of an LLM and automatically discovers new constitutions using a stronger LLM. These constitutions are then used to guide self-correction of the base LLM. Such a constitution discovery pipeline can be run iteratively and automatically to discover new constitutions that specifically target the alignment gaps in the current LLM. Empirical results on several safety benchmark datasets and multiple base LLMs show that IterAlign successfully improves truthfulness, helpfulness, harmlessness and honesty, improving the LLM alignment by up to 13.5% in harmlessness.
2022
pdf
bib
abs
CERES: Pretraining of Graph-Conditioned Transformer for Semi-Structured Session Data
Rui Feng
|
Chen Luo
|
Qingyu Yin
|
Bing Yin
|
Tuo Zhao
|
Chao Zhang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
User sessions empower many search and recommendation tasks on a daily basis. Such session data are semi-structured, which encode heterogeneous relations between queries and products, and each item is described by the unstructured text. Despite recent advances in self-supervised learning for text or graphs, there lack of self-supervised learning models that can effectively capture both intra-item semantics and inter-item interactions for semi-structured sessions. To fill this gap, we propose CERES, a graph-based transformer model for semi-structured session data. CERES learns representations that capture both inter- and intra-item semantics with (1) a graph-conditioned masked language pretraining task that jointly learns from item text and item-item relations; and (2) a graph-conditioned transformer architecture that propagates inter-item contexts to item-level representations. We pretrained CERES using ~468 million Amazon sessions and find that CERES outperforms strong pretraining baselines by up to 9% in three session search and entity linking tasks.