Bei Yu


2024

pdf bib
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
Haoyuan Wu | Haisheng Zheng | Zhuolun He | Bei Yu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) have demonstrated considerable proficiency in general natural language processing (NLP) tasks. Instruction tuning, a successful paradigm, enhances the ability of LLMs to follow natural language instructions and exhibit robust generalization across general tasks. However, these models often encounter performance limitations across multiple tasks due to constrained model capacity. Expanding this capacity during the instruction tuning phase poses significant challenges. To address this issue, we introduce parameter-efficient sparsity crafting (PESC), which crafts dense models into sparse models using the mixture-of-experts (MoE) architecture. PESC integrates adapters into the MoE layers of sparse models, differentiating experts without altering the individual weights within these layers. This method significantly reduces computational costs and GPU memory requirements, facilitating model capacity expansion through a minimal parameter increase when guaranteeing the quality of approximation in function space compared to original sparse upcycling. Our empirical evaluation demonstrates the effectiveness of the PESC method. Using PESC during instruction tuning, our best sparse model outperforms other sparse and dense models and exhibits superior general capabilities compared to GPT-3.5.Our code is available at https://github.com/wuhy68/Parameter-Efficient-MoE.

2023

pdf bib
ATFormer: A Learned Performance Model with Transfer Learning Across Devices for Deep Learning Tensor Programs
Yang Bai | Wenqian Zhao | Shuo Yin | Zixiao Wang | Bei Yu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

The training and inference efficiency of ever-larger deep neural networks highly rely on the performance of tensor operators on specific hardware platforms. Therefore, a compilation-based optimization flow with automatic tensor generation and parameter tuning is necessary for efficient model deployment. While compilation-based methods with performance models can provide dynamic and suitable code optimization, they suffer from a large design space exploration with rough measurement accuracy and poor transferability among different hardware platforms. This paper presents ATFormer, a simple yet efficient design with attention-inspired modules to accurately predict the performance of optimized operators by capturing global and long-range dependencies within a complete scheduling space. Compared with state-of-the-arts, ATFormer can predict the optimal implementation of tensor operators to reduce inference time with minimal effort on modern DNN benchmarks. Furthermore, ATFormer with pre-trained parameters can quickly adapt to different workloads and hardware via transfer learning.

pdf bib
Can ChatGPT Understand Causal Language in Science Claims?
Yuheun Kim | Lu Guo | Bei Yu | Yingya Li
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

This study evaluated ChatGPT’s ability to understand causal language in science papers and news by testing its accuracy in a task of labeling the strength of a claim as causal, conditional causal, correlational, or no relationship. The results show that ChatGPT is still behind the existing fine-tuned BERT models by a large margin. ChatGPT also had difficulty understanding conditional causal claims mitigated by hedges. However, its weakness may be utilized to improve the clarity of human annotation guideline. Chain-of-Thoughts were faithful and helpful for improving prompt performance, but finding the optimal prompt is difficult with inconsistent results and the lack of effective method to establish cause-effect between prompts and outcomes, suggesting caution when generalizing prompt engineering results across tasks or models.

2022

pdf bib
“Devils Are in the Details”: Annotating Specificity of Clinical Advice from Medical Literature
Yingya Li | Bei Yu
Proceedings of the Second Workshop on Understanding Implicit and Underspecified Language

Prior studies have raised concerns over specificity issues in clinical advice. Lacking specificity — explicitly discussed detailed information — may affect the quality and implementation of clinical advice in medical practice. In this study, we developed and validated a fine-grained annotation schema to describe different aspects of specificity in clinical advice extracted from medical research literature. We also presented our initial annotation effort and discussed future directions towards an NLP-based specificity analysis tool for summarizing and verifying the details in clinical advice.

2021

pdf bib
Self Promotion in US Congressional Tweets
Jun Wang | Kelly Cui | Bei Yu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Prior studies have found that women self-promote less than men due to gender stereotypes. In this study we built a BERT-based NLP model to predict whether a Congressional tweet shows self-promotion or not and then used this model to examine whether a gender gap in self-promotion exists among Congressional tweets. After analyzing 2 million Congressional tweets from July 2017 to March 2021, controlling for a number of factors that include political party, chamber, age, number of terms in Congress, number of daily tweets, and number of followers, we found that women in Congress actually perform more self-promotion on Twitter, indicating a reversal of traditional gender norms where women self-promote less than men.

pdf bib
Detecting Health Advice in Medical Research Literature
Yingya Li | Jun Wang | Bei Yu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Health and medical researchers often give clinical and policy recommendations to inform health practice and public health policy. However, no current health information system supports the direct retrieval of health advice. This study fills the gap by developing and validating an NLP-based prediction model for identifying health advice in research publications. We annotated a corpus of 6,000 sentences extracted from structured abstracts in PubMed publications as ‘“strong advice”, “weak advice”, or “no advice”, and developed a BERT-based model that can predict, with a macro-averaged F1-score of 0.93, whether a sentence gives strong advice, weak advice, or not. The prediction model generalized well to sentences in both unstructured abstracts and discussion sections, where health advice normally appears. We also conducted a case study that applied this prediction model to retrieve specific health advice on COVID-19 treatments from LitCovid, a large COVID research literature portal, demonstrating the usefulness of retrieving health advice sentences as an advanced research literature navigation function for health researchers and the general public.

2020

pdf bib
Measuring Correlation-to-Causation Exaggeration in Press Releases
Bei Yu | Jun Wang | Lu Guo | Yingya Li
Proceedings of the 28th International Conference on Computational Linguistics

Press releases have an increasingly strong influence on media coverage of health research; however, they have been found to contain seriously exaggerated claims that can misinform the public and undermine public trust in science. In this study we propose an NLP approach to identify exaggerated causal claims made in health press releases that report on observational studies, which are designed to establish correlational findings, but are often exaggerated as causal. We developed a new corpus and trained models that can identify causal claims in the main statements in a press release. By comparing the claims made in a press release with the corresponding claims in the original research paper, we found that 22% of press releases made exaggerated causal claims from correlational findings in observational studies. Furthermore, universities exaggerated more often than journal publishers by a ratio of 1.5 to 1. Encouragingly, the exaggeration rate has slightly decreased over the past 10 years, despite the increase of the total number of press releases. More research is needed to understand the cause of the decreasing pattern.

2019

pdf bib
Detecting Causal Language Use in Science Findings
Bei Yu | Yingya Li | Jun Wang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Causal interpretation of correlational findings from observational studies has been a major type of misinformation in science communication. Prior studies on identifying inappropriate use of causal language relied on manual content analysis, which is not scalable for examining a large volume of science publications. In this study, we first annotated a corpus of over 3,000 PubMed research conclusion sentences, then developed a BERT-based prediction model that classifies conclusion sentences into “no relationship”, “correlational”, “conditional causal”, and “direct causal” categories, achieving an accuracy of 0.90 and a macro-F1 of 0.88. We then applied the prediction model to measure the causal language use in the research conclusions of about 38,000 observational studies in PubMed. The prediction result shows that 21.7% studies used direct causal language exclusively in their conclusions, and 32.4% used some direct causal language. We also found that the ratio of causal language use differs among authors from different countries, challenging the notion of a shared consensus on causal language use in the global science community. Our prediction model could also be used to help identify the inappropriate use of causal language in science publications.

2018

pdf bib
An Evaluation of Information Extraction Tools for Identifying Health Claims in News Headlines
Shi Yuan | Bei Yu
Proceedings of the Workshop Events and Stories in the News 2018

This study evaluates the performance of four information extraction tools (extractors) on identifying health claims in health news headlines. A health claim is defined as a triplet: IV (what is being manipulated), DV (what is being measured) and their relation. Tools that can identify health claims provide the foundation for evaluating the accuracy of these claims against authoritative resources. The evaluation result shows that 26% headlines do not in-clude health claims, and all extractors face difficulty separating them from the rest. For those with health claims, OPENIE-5.0 performed the best with F-measure at 0.6 level for ex-tracting “IV-relation-DV”. However, the characteristic linguistic structures in health news headlines, such as incomplete sentences and non-verb relations, pose particular challenge to existing tools.

2017

pdf bib
An NLP Analysis of Exaggerated Claims in Science News
Yingya Li | Jieke Zhang | Bei Yu
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

The discrepancy between science and media has been affecting the effectiveness of science communication. Original findings from science publications may be distorted with altered claim strength when reported to the public, causing misinformation spread. This study conducts an NLP analysis of exaggerated claims in science news, and then constructed prediction models for identifying claim strength levels in science reporting. The results demonstrate different writing styles journal articles and news/press releases use for reporting scientific findings. Preliminary prediction models reached promising result with room for further improvement.

2014

pdf bib
Classifying Negative Findings in Biomedical Publications
Bei Yu | Daniele Fanelli
Proceedings of BioNLP 2014

2012

pdf bib
Function Words for Chinese Authorship Attribution
Bei Yu
Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature