Junying Chen


2024

pdf bib
Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
Junying Chen | Chi Gui | Ruyi Ouyang | Anningzhe Gao | Shunian Chen | Guiming Hardy Chen | Xidong Wang | Zhenyang Cai | Ke Ji | Xiang Wan | Benyou Wang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

The rapid development of multimodal large language models (MLLMs), such as GPT-4V, has led to significant advancements. However, these models still face challenges in medical multimodal capabilities due to limitations in the quantity and quality of medical vision-text data, stemming from data privacy concerns and high annotation costs. While pioneering approaches utilize PubMed’s large-scale, de-identified medical image-text pairs to address these limitations, they often fall short due to inherent data noise. To tackle this, we refined medical image-text pairs from PubMed and employed MLLMs (GPT-4V) in an ‘unblinded’ capacity to denoise and reformat the data, resulting in the creation of the **PubMedVision** dataset with 1.3 million medical VQA samples. Our validation demonstrates that: (1) PubMedVision can significantly enhance the medical multimodal capabilities of MLLMs, showing significant improvement in benchmarks including the MMMU Health & Medicine track; (2) manual checks by medical experts and empirical results validate the superior data quality of our dataset compared to other data construction methods. Using PubMedVision, we train a 34B medical MLLM **HuatuoGPT-Vision**, which shows superior performance in medical multimodal scenarios among open-source MLLMs. Our code and data are available at https://github.com/FreedomIntelligence/HuatuoGPT-Vision.

pdf bib
CMB: A Comprehensive Medical Benchmark in Chinese
Xidong Wang | Guiming Chen | Song Dingjie | Zhang Zhiyi | Zhihong Chen | Qingying Xiao | Junying Chen | Feng Jiang | Jianquan Li | Xiang Wan | Benyou Wang | Haizhou Li
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large Language Models (LLMs) provide a possibility to make a great breakthrough in medicine. The establishment of a standardized medical benchmark becomes a fundamental cornerstone to measure progression. However, medical environments in different regions have their local characteristics, e.g., the ubiquity and significance of traditional Chinese medicine within China. Therefore, merely translating English-based medical evaluation may result in contextual incongruities to a local region. To solve the issue, we propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese, designed and rooted entirely within the native Chinese linguistic and cultural framework. While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety. Using this benchmark, we have evaluated several prominent large-scale LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain. We hope this benchmark provide first-hand experience in existing LLMs for medicine and also facilitate the widespread adoption and enhancement of medical LLMs within China. Our data and code are publicly available at https://github.com/FreedomIntelligence/CMB.

pdf bib
AceGPT, Localizing Large Language Models in Arabic
Huang Huang | Fei Yu | Jianqing Zhu | Xuening Sun | Hao Cheng | Song Dingjie | Zhihong Chen | Mosen Alharthi | Bang An | Juncai He | Ziche Liu | Junying Chen | Jianquan Li | Benyou Wang | Lian Zhang | Ruoyu Sun | Xiang Wan | Haizhou Li | Jinchao Xu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

This paper is devoted to the development of a localized Large Language Model (LLM) specifically for Arabic, a language imbued with unique cultural characteristics inadequately addressed by current mainstream models. Significant concerns emerge when addressing cultural sensitivity and local values. To address this, the paper proposes a comprehensive solution that includes further pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic, alongside Reinforcement Learning with AI Feedback (RLAIF) employing a reward model attuned to local culture and values. The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities. Comprehensive evaluations reveal that the resulting model, dubbed ‘AceGPT’, sets the state-of-the-art standard for open Arabic LLMs across various benchmarks. Codes, data, and models are in https://github.com/FreedomIntelligence/AceGPT.

2023

pdf bib
HuatuoGPT, Towards Taming Language Model to Be a Doctor
Hongbo Zhang | Junying Chen | Feng Jiang | Fei Yu | Zhihong Chen | Guiming Chen | Jianquan Li | Xiangbo Wu | Zhang Zhiyi | Qingying Xiao | Xiang Wan | Benyou Wang | Haizhou Li
Findings of the Association for Computational Linguistics: EMNLP 2023

In this paper, we present HuatuoGPT, a Large Language Model (LLM) for medical consultation. The core recipe of HuatuoGPT is to leverage both distilled data from **ChatGPT** and real-world data from **doctors** in the supervised fine-tuning stage. This is not only because purely using **ChatGPT**-distilled data might cause ‘model collapse’, but also because real-world data from **doctors** would be complementary to **ChatGPT**-distilled data. The responses from ChatGPT are usually detailed, well-presented, fluent, and instruction-followed, but it cannot perform like a doctor in many aspects, e.g. for interactive diagnosis. Therefore, the extra doctors’ data could tame a distilled language model to perform like doctors. To synergize the strengths of both data sources, we introduce RLMF (Reinforcement Learning from Mixed Feedback) where a reward model is trained to align the language model with the merits that both sources (ChatGPT and doctors) bring. Experimental results (in GPT-4 evaluation, human evaluation, and medical benchmark datasets) demonstrate that HuatuoGPT achieves state-of-the-art results in performing medical consultation among open-source LLMs. It is worth noting that by using additional real-world data and RLMF, the distilled language model (i.e., HuatuoGPT) outperforms its teacher model (i.e., ChatGPT) in most cases.

pdf bib
Enhancing Open-Domain Table Question Answering via Syntax- and Structure-aware Dense Retrieval
Nengzheng Jin | Dongfang Li | Junying Chen | Joanna Siebert | Qingcai Chen
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

2021

pdf bib
Leveraging Capsule Routing to Associate Knowledge with Medical Literature Hierarchically
Xin Liu | Qingcai Chen | Junying Chen | Wenxiu Zhou | Tingyu Liu | Xinlan Yang | Weihua Peng
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Integrating knowledge into text is a promising way to enrich text representation, especially in the medical field. However, undifferentiated knowledge not only confuses the text representation but also imports unexpected noises. In this paper, to alleviate this problem, we propose leveraging capsule routing to associate knowledge with medical literature hierarchically (called HiCapsRKL). Firstly, HiCapsRKL extracts two empirically designed text fragments from medical literature and encodes them into fragment representations respectively. Secondly, the capsule routing algorithm is applied to two fragment representations. Through the capsule computing and dynamic routing, each representation is processed into a new representation (denoted as caps-representation), and we integrate the caps-representations as information gain to associate knowledge with medical literature hierarchically. Finally, HiCapsRKL are validated on relevance prediction and medical literature retrieval test sets. The experimental results and analyses show that HiCapsRKLcan more accurately associate knowledge with medical literature than mainstream methods. In summary, HiCapsRKL can efficiently help selecting the most relevant knowledge to the medical literature, which may be an alternative attempt to improve knowledge-based text representation. Source code is released on GitHub.

2020

pdf bib
MedWriter: Knowledge-Aware Medical Text Generation
Youcheng Pan | Qingcai Chen | Weihua Peng | Xiaolong Wang | Baotian Hu | Xin Liu | Junying Chen | Wenxiu Zhou
Proceedings of the 28th International Conference on Computational Linguistics

To exploit the domain knowledge to guarantee the correctness of generated text has been a hot topic in recent years, especially for high professional domains such as medical. However, most of recent works only consider the information of unstructured text rather than structured information of the knowledge graph. In this paper, we focus on the medical topic-to-text generation task and adapt a knowledge-aware text generation model to the medical domain, named MedWriter, which not only introduces the specific knowledge from the external MKG but also is capable of learning graph-level representation. We conduct experiments on a medical literature dataset collected from medical journals, each of which has a set of topic words, an abstract of medical literature and a corresponding knowledge graph from CMeKG. Experimental results demonstrate incorporating knowledge graph into generation model can improve the quality of the generated text and has robust superiority over the competitor methods.

pdf bib
Aligned Dual Channel Graph Convolutional Network for Visual Question Answering
Qingbao Huang | Jielong Wei | Yi Cai | Changmeng Zheng | Junying Chen | Ho-fung Leung | Qing Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Visual question answering aims to answer the natural language question about a given image. Existing graph-based methods only focus on the relations between objects in an image and neglect the importance of the syntactic dependency relations between words in a question. To simultaneously capture the relations between objects in an image and the syntactic dependency relations between words in a question, we propose a novel dual channel graph convolutional network (DC-GCN) for better combining visual and textual advantages. The DC-GCN model consists of three parts: an I-GCN module to capture the relations between objects in an image, a Q-GCN module to capture the syntactic dependency relations between words in a question, and an attention alignment module to align image representations and question representations. Experimental results show that our model achieves comparable performance with the state-of-the-art approaches.