Wei Jin


2024

Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models
Ran Xu | Hejie Cui | Yue Yu | Xuan Kan | Wenqi Shi | Yuchen Zhuang | May Dongmei Wang | Wei Jin | Joyce Ho | Carl Yang
Findings of the Association for Computational Linguistics: ACL 2024

Clinical natural language processing faces challenges such as complex medical terminology and clinical contexts. Recently, large language models (LLMs) have shown promise in this domain. Yet their direct deployment can lead to privacy issues and is constrained by resources. To address this challenge, we study synthetic clinical text generation with LLMs for clinical NLP tasks. We propose an innovative, resource-efficient approach, ClinGen, which infuses knowledge into the generation process. Our model involves clinical knowledge extraction and context-informed LLM prompting. Both clinical topics and writing styles are drawn from external domain-specific knowledge graphs and LLMs to guide data generation. Our extensive empirical study across 8 clinical NLP tasks and 18 datasets reveals that ClinGen consistently improves performance across tasks by 7.7%-8.7% on average, effectively aligning with the distribution of real datasets and enriching the diversity of generated training instances.

2021

The Authors Matter: Understanding and Mitigating Implicit Bias in Deep Text Classification
Haochen Liu | Wei Jin | Hamid Karimi | Zitao Liu | Jiliang Tang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2019

Incorporating Emoji Descriptions Improves Tweet Classification
Abhishek Singh | Eduardo Blanco | Wei Jin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Tweets are short messages that often include specialized language such as hashtags and emojis. In this paper, we present a simple strategy to process emojis: replace them with their natural language descriptions and use pretrained word embeddings as is normally done with standard words. We show that this strategy is more effective than using pretrained emoji embeddings for tweet classification. Specifically, we obtain new state-of-the-art results in irony detection and sentiment analysis even though our neural network is simpler than previous proposals.
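The emoji-to-description preprocessing step can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say where the descriptions come from, so this sketch assumes the official Unicode character names from Python's standard `unicodedata` module, and the function name `describe_emojis` is hypothetical. After this step, the resulting words would be embedded with ordinary pretrained word embeddings.

```python
import unicodedata


def describe_emojis(text: str) -> str:
    """Hypothetical helper: replace each symbol character (Unicode category
    'So', which covers most emojis) with its lowercased Unicode name, so the
    description can be tokenized and embedded like ordinary words."""
    out = []
    for ch in text:
        if ch in ("\ufe0f", "\u200d"):
            # Drop variation selector-16 and the zero-width joiner; they only
            # affect how an emoji is rendered, not what it depicts.
            continue
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            # Pad with spaces so the description is separated from neighbors.
            out.append(f" {name.lower()} " if name else ch)
        else:
            out.append(ch)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join("".join(out).split())


print(describe_emojis("Great, another Monday 😂"))
# → Great, another Monday face with tears of joy
```

Using Unicode names keeps the sketch dependency-free, but some names (for example "heavy black heart" for ❤) are less natural than curated emoji descriptions, so a real pipeline might substitute a different description source.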

2010

HCAMiner: Mining Concept Associations for Knowledge Discovery through Concept Chain Queries
Wei Jin | Xin Wu
Coling 2010: Demonstrations