Bang An


2024

AceGPT, Localizing Large Language Models in Arabic
Huang Huang | Fei Yu | Jianqing Zhu | Xuening Sun | Hao Cheng | Song Dingjie | Zhihong Chen | Mosen Alharthi | Bang An | Juncai He | Ziche Liu | Junying Chen | Jianquan Li | Benyou Wang | Lian Zhang | Ruoyu Sun | Xiang Wan | Haizhou Li | Jinchao Xu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

This paper is devoted to the development of a localized Large Language Model (LLM) specifically for Arabic, a language imbued with unique cultural characteristics inadequately addressed by current mainstream models. Significant concerns emerge when addressing cultural sensitivity and local values. To address this, the paper proposes a comprehensive solution that includes further pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions and GPT-4 responses in Arabic, alongside Reinforcement Learning with AI Feedback (RLAIF) employing a reward model attuned to local culture and values. The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities. Comprehensive evaluations reveal that the resulting model, dubbed ‘AceGPT’, sets the state-of-the-art standard for open Arabic LLMs across various benchmarks. Code, data, and models are available at https://github.com/FreedomIntelligence/AceGPT.

2020

Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference
Bang An | Jie Lyu | Zhenyi Wang | Chunyuan Li | Changwei Hu | Fei Tan | Ruiyi Zhang | Yifan Hu | Changyou Chen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The neural attention mechanism plays an important role in many natural language processing applications. In particular, multi-head attention extends single-head attention by allowing a model to jointly attend to information from different perspectives. However, without explicit constraints, multi-head attention may suffer from attention collapse, an issue that makes different heads extract similar attentive features, thus limiting the model’s representation power. In this paper, for the first time, we provide a novel understanding of multi-head attention from a Bayesian perspective. Based on recently developed particle-optimization sampling techniques, we propose a non-parametric approach that explicitly improves the repulsiveness in multi-head attention and consequently strengthens the model’s expressiveness. Remarkably, our Bayesian interpretation provides theoretical insight into two not-well-understood questions: why and how one uses multi-head attention. Extensive experiments on various attention models and applications demonstrate that the proposed repulsive attention can improve the learned feature diversity, leading to more informative representations with consistent performance improvement on multiple tasks.
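
The attention collapse described in this abstract can be made concrete with a small diagnostic. The sketch below is not the paper's method. It only assumes that each head's output can be summarized as a feature vector, and it reports the mean pairwise cosine similarity between heads. The function names `cosine` and `mean_pairwise_similarity` and the toy vectors are hypothetical, chosen for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(heads):
    """Average cosine similarity over all unordered pairs of head vectors.

    Values near 1.0 mean the heads extract nearly identical features,
    which is the symptom the abstract calls attention collapse.
    """
    pairs = list(combinations(heads, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy per-head feature vectors (illustrative assumption, not real model output).
collapsed_heads = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
diverse_heads = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(mean_pairwise_similarity(collapsed_heads))
print(mean_pairwise_similarity(diverse_heads))
```

A lower score indicates more diverse heads. A repulsive term of the kind the abstract describes would push this quantity down during training, though the paper derives its own update from particle-optimization sampling rather than from this metric.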

Towards Faithful Neural Table-to-Text Generation with Content-Matching Constraints
Zhenyi Wang | Xiaoyang Wang | Bang An | Dong Yu | Changyou Chen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Text generation from a knowledge base aims to translate knowledge triples into natural language descriptions. Most existing methods ignore the faithfulness between a generated text description and the original table, leading to generated information that goes beyond the content of the table. In this paper, for the first time, we propose a novel Transformer-based generation framework to achieve faithful generation. The core techniques in our method to enforce faithfulness include a new table-text optimal-transport matching loss and a table-text embedding similarity loss based on the Transformer model. Furthermore, to evaluate faithfulness, we propose a new automatic metric specialized to the table-to-text generation problem. We also provide a detailed analysis of each component of our model in our experiments. Automatic and human evaluations show that our framework outperforms state-of-the-art methods by a large margin.
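
To illustrate the idea behind an optimal-transport matching loss, the sketch below computes an entropy-regularized transport cost between table items and text tokens with Sinkhorn iterations. It is a generic illustration, not the paper's loss. It assumes uniform weights on both sides and a precomputed cost matrix, and the function name `sinkhorn_cost` and the parameters `reg` and `iters` are hypothetical.

```python
import math


def sinkhorn_cost(cost, reg=0.1, iters=200):
    """Entropy-regularized optimal-transport cost between two uniform
    distributions, given cost[i][j] for moving table item i to text token j.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on table items (assumption)
    b = [1.0 / m] * m  # uniform mass on text tokens (assumption)
    # Gibbs kernel derived from the cost matrix.
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan is u[i] * kernel[i][j] * v[j]; return its total cost.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


# Each table item has an exact match in the text: low transport cost.
matched = [[0.0, 1.0], [1.0, 0.0]]
# No table item is close to any text token: high transport cost.
unmatched = [[1.0, 1.0], [1.0, 1.0]]

print(sinkhorn_cost(matched))
print(sinkhorn_cost(unmatched))
```

Used as a training signal, a cost of this kind penalizes generated text whose tokens cannot be cheaply matched to table content. In practice the cost matrix would come from distances between learned embeddings rather than from hand-written values.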