2024
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
Zhiyang Xu | Chao Feng | Rulin Shao | Trevor Ashby | Ying Shen | Di Jin | Yu Cheng | Qifan Wang | Lifu Huang
Findings of the Association for Computational Linguistics: ACL 2024
Despite vision-language models’ (VLMs) remarkable capabilities as versatile visual assistants, two substantial challenges persist within existing VLM frameworks: (1) a lack of task diversity in pretraining and visual instruction tuning, and (2) annotation errors and bias in GPT-4 synthesized instruction tuning data. Both challenges lead to issues such as poor generalizability, hallucination, and catastrophic forgetting. To address these challenges, we construct Vision-Flan, the most diverse publicly available visual instruction tuning dataset to date, comprising 187 diverse tasks and 1,664,261 instances sourced from academic datasets, with each task accompanied by an expert-written instruction. In addition, we propose a two-stage instruction tuning framework, in which VLMs are first finetuned on Vision-Flan and then further tuned on GPT-4 synthesized data. We find that this two-stage tuning framework significantly outperforms the traditional single-stage visual instruction tuning framework and achieves state-of-the-art performance across a wide range of multi-modal evaluation benchmarks. Finally, we conduct in-depth analyses to understand visual instruction tuning, and our findings reveal that: (1) GPT-4 synthesized data does not substantially enhance VLMs’ capabilities but rather modulates the model’s responses to human-preferred formats; (2) a minimal quantity (e.g., 1,000) of GPT-4 synthesized data can effectively align VLM responses with human preferences; (3) visual instruction tuning mainly helps large language models (LLMs) understand visual features.
2023
YNU-HPCC at SemEval-2023 Task7: Multi-evidence Natural Language Inference for Clinical Trial Data Based a BioBERT Model
Chao Feng | Jin Wang | Xuejie Zhang
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper describes the YNU-HPCC team's system for subtask 1 of SemEval-2023 Task 7: Multi-evidence Natural Language Inference for Clinical Trial Data (NLI4CT). The task requires judging the textual entailment relationship between a given clinical trial report (CTR) and a statement written by an expert annotator. The system is based on a fine-tuned BioBERT model (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), combined with supervised contrastive learning and back translation. Supervised contrastive learning is used to improve classification, and back translation is used to augment the training data. Our system achieved relatively good results on the competition’s official leaderboard. The code for this paper is available at https://github.com/facanhe/SemEval-2023-Task7.
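The abstract names supervised contrastive learning without giving its formulation. As a rough illustration only, the sketch below implements one common form of the supervised contrastive loss in plain Python: embeddings that share a label are pulled together and all others are pushed apart. The function names, the temperature value, and the toy vectors are assumptions made for this example and are not taken from the paper or its repository.

```python
import math


def normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average supervised contrastive loss over anchors that have a positive.

    Hypothetical helper for illustration: for each anchor i, the positives are
    the other examples with the same label, and every other example in the
    batch appears in the softmax denominator.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other example.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two tight clusters in 2D.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy batch the loss is lower when the labels agree with the clusters than when they cut across them, which is the behaviour the objective rewards during fine-tuning.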
2021
Exploiting Network Structures to Improve Semantic Representation for the Financial Domain
Chao Feng | Shijie Wei
Proceedings of the Third Workshop on Financial Technology and Natural Language Processing
2020
基于层次注意力机制和门机制的属性级别情感分析(Aspect-level Sentiment Analysis Based on Hierarchical Attention and Gate Networks)
Chao Feng (冯超) | Haihui Li (黎海辉) | Hongya Zhao (赵洪雅) | Yun Xue (薛云) | Jingyao Tang (唐靖尧)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
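In recent years, aspect-level sentiment analysis, a fine-grained form of sentiment analysis, has attracted growing attention from both industry and academia. Its goal is to identify the sentiment polarity associated with each of the multiple aspect terms in a sentence. Most existing work on aspect-level sentiment analysis focuses on the design of attention mechanisms, which highlight the contributions of different words in the context and in the aspect term while relating the context and the aspect term to each other. This paper proposes handling aspect-level sentiment analysis with a hierarchical attention mechanism and a gate mechanism. After the hidden states of the aspect term are obtained, an attention mechanism produces a new representation of the aspect term; this new representation is then used, together with an attention mechanism, to obtain a new representation of the context. The hierarchical attention design makes the representations of the context and the aspect term more accurate. In addition, a gate mechanism selects the information in the context that is useful with respect to the aspect term, enriching the context representation. Experimental results on the SemEval 2014 Task4 and Twitter datasets demonstrate the effectiveness of the proposed model.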
2019
Reinforced Product Metadata Selection for Helpfulness Assessment of Customer Reviews
Miao Fan | Chao Feng | Mingming Sun | Ping Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
To automatically assess the helpfulness of an online customer review, conventional approaches generally acquire various linguistic and neural embedding features solely from the textual content of the review itself as evidence. We find, however, that a helpful review is largely concerned with the metadata (such as the name, the brand, and the category) of its target product. This leaves us with the challenge of choosing the correct key-value product metadata to help appraise the helpfulness of free-text reviews more precisely. To address this problem, we propose a novel framework composed of two mutually beneficial modules. Given a product, a selector (agent) learns from both the keys in the product metadata and one of its reviews to take an action that selects the correct value, and a subsequent predictor (network) makes the free-text review attend to this value to obtain better neural representations for helpfulness assessment. The predictor is directly optimized by SGD with the loss of helpfulness prediction, and the selector is updated via policy gradient, rewarded with the performance of the predictor. We use two real-world datasets, from Amazon.com and Yelp.com, to compare the performance of our framework with other mainstream methods under two application scenarios: helpfulness identification and regression of customer reviews. Extensive results demonstrate that our framework can achieve state-of-the-art performance with substantial improvements.
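The selector described in this abstract is trained by policy gradient, with the predictor's performance as the reward. The sketch below is a heavily simplified illustration of that idea, not the paper's model: it uses a context-free softmax policy over a handful of metadata keys and a plain REINFORCE update. The names `train_selector` and `toy_reward`, the learning rate, and the stand-in reward (which in the paper would come from the predictor) are all assumptions made for this example.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_selector(num_keys, reward_fn, steps=300, lr=0.5, seed=0):
    """REINFORCE for a softmax policy over metadata keys (hypothetical helper).

    reward_fn stands in for the predictor's performance after attending to
    the value of the selected key.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_keys
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action (which key to select) from the current policy.
        action = rng.choices(range(num_keys), weights=probs)[0]
        reward = reward_fn(action)
        # Gradient of log pi(action) w.r.t. logit k is 1[k == action] - probs[k].
        for k in range(num_keys):
            indicator = 1.0 if k == action else 0.0
            logits[k] += lr * reward * (indicator - probs[k])
    return softmax(logits)


def toy_reward(action):
    """Assumed reward: only key index 2 helps the (imaginary) predictor."""
    return 1.0 if action == 2 else 0.0


selection_probs = train_selector(num_keys=3, reward_fn=toy_reward)
```

With this toy reward, the policy shifts its probability mass toward the only key that earns a reward. A fuller version would condition the policy on the metadata keys and the review text, as the abstract describes, and would likely subtract a baseline from the reward to reduce variance.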