Bo Chen (陈波) - ACL Anthology

Bo Chen

Also published as: 波陈

2025

Vision Language Models (VLMs) have achieved remarkable success in a wide range of vision applications of increasing complexity and scales, yet choosing the right VLM model size involves a trade-off between response quality and cost. While smaller VLMs are cheaper to run, they typically produce responses only marginally better than random guessing on benchmarks such as MMMU. In this paper, we propose Cache of Thought (CoT), a master–apprentice framework for collaborative inference between large and small VLMs. CoT manages high-quality query results from large VLMs (master) in a cache, which are then selected via a novel multi-modal retrieval and in-context learning to aid the performance of small VLMs (apprentice). We extensively evaluate CoT on various widely-recognized and challenging general reasoning benchmarks, and show that CoT increases overall reasoning performance by up to 7.7% under the same budget, and specifically boosts the reasoning performance of apprentice VLMs by up to 36.6%. Our code is available at https://github.com/UIUC-MONET/Cache-of-Thoughts.

pdf bib abs

Characterizing the expressive power of the Transformer architecture is critical to understanding its capacity limits and scaling law. Recent works provide the circuit complexity bounds to Transformer-like architecture. On the other hand, position embedding has emerged as a crucial technique in modern large language models, offering superior performance in capturing positional information, which shows great performance for the long context scenario. In this work, we take a circuit complexity perspective and rigorously analyze Transformers augmented with widely adopted positional embeddings. We prove that, under standard complexity assumptions, such models remain incapable of efficiently solving canonical tasks such as arithmetic formula evaluation and Boolean formula value computation. Our results expose a fundamental expressivity limitation that persists despite the remarkable empirical success of positionally-enhanced Transformers. Beyond tightening known complexity bounds, our findings offer new theoretical insights for designing future architectures with provably stronger reasoning and compositional capabilities.

2024

pdf bib abs

融合多元特征表示的藏文命名实体识别方法赵小兵∗2(Research on Tibetan Named Entity Recognition Using Multi-Feature Fusion Representation)
Cairang Ejian (俄见才让) | Maoke Zhou (周毛克) | Bo Chen (陈波) | Xiaobing Zhao (赵小兵)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“本文针对基于音节嵌入方式的藏文命名实体识别(TNER)中词汇信息和音节部件信息忽略的问题,提出了基于交叉Transformer架构的MECT-TL模型,融合了藏文音节信息、词汇信息和音节部件信息的多元数据特征。MECT-TL通过平面网络结构将藏文音节与词汇信息结合,并整合音节部件信息,有效提升了藏文实体识别的准确性。实验结果显示,相较于主流的TNER基准模型BiLSTM-CRF,本文模型在F1值上提高了5.14个百分点,与基于Transformer架构的TENER模型相比提高了4.18个百分点。这表明,融合藏文词汇和音节部件信息的方法可以显著提高TNER任务的性能。”

pdf bib abs

基于生成式语言模型的立场检测探究(Research on Stance Detection with Generative Language Model)
Yuanshuo Zhang (张袁硕) | Aohua Li (李澳华) | Zhaoning Yin (尹召宁) | Panyi Wang (王潘怡) | Bo Chen (陈波) | Xiaobing Zhao (赵小兵)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“近年来,立场检测任务受到越来越多的关注,但相关标注数据在范围和规模上都有限,不能有效支撑基于神经网络的立场检测。为此,本文探索在零样本阯少样本场景下生成式语言模型在立场检测任务上的能力。首先,构建了一个全新的面向立场检测的数据集,包含5个主题,共2500个人工标注样例;然后,在此数据集上进行了一系列探索实验,实验结果表明:生成式语言模型在零样本设定下,采用结构化的提示学习表现良好;增加额外信息能够显著提升模型性能;在少样本设定下,提供相同目标的示例能够明显提升模型性能,而不同目标示例产生了负面作用;使用思维链可以显著提升模型性能;受提示学习的启发,微调预训练语言模型进一步论证提供额外信息对立场检测的增益显著。”

2022

pdf bib abs

Semantic-aware Contrastive Learning for More Accurate Semantic Parsing
Shan Wu | Chunlei Xin | Bo Chen | Xianpei Han | Le Sun
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Since the meaning representations are detailed and accurate annotations which express fine-grained sequence-level semtantics, it is usually hard to train discriminative semantic parsers via Maximum Likelihood Estimation (MLE) in an autoregressive fashion. In this paper, we propose a semantic-aware contrastive learning algorithm, which can learn to distinguish fine-grained meaning representations and take the overall sequence-level semantic into consideration. Specifically, a multi-level online sampling algorithm is proposed to sample confusing and diverse instances. Three semantic-aware similarity functions are designed to accurately measure the distance between meaning representations as a whole. And a ranked contrastive loss is proposed to pull the representations of the semantic-identical instances together and push negative instances away. Experiments on two standard datasets show that our approach achieves significant improvements over MLE baselines and gets state-of-the-art performances by simply applying semantic-aware contrastive learning on a vanilla Seq2Seq model.

2021

pdf bib abs

EnsLM: Ensemble Language Model for Data Diversity by Semantic Clustering
Zhibin Duan | Hao Zhang | Chaojie Wang | Zhengjue Wang | Bo Chen | Mingyuan Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Natural language processing (NLP) often faces the problem of data diversity such as different domains, themes, styles, and so on. Therefore, a single language model (LM) is insufficient to learn all knowledge from diverse samples. To solve this problem, we firstly propose an autoencoding topic model with a mixture prior (mATM) to perform clustering for the data, where the clusters defined in semantic space describes the data diversity. Having obtained the clustering assignment for each sample, we develop the ensemble LM (EnsLM) with the technique of weight modulation. Specifically, EnsLM contains a backbone that is adjusted by a few modulated weights to fit for different sample clusters. As a result, the backbone learns the shared knowledge among all clusters while modulated weights extract the cluster-specific features. EnsLM can be trained jointly with mATM with a flexible LM backbone. We evaluate the effectiveness of both mATM and EnsLM on various tasks.

pdf bib abs

From Paraphrasing to Semantic Parsing: Unsupervised Semantic Parsing via Synchronous Semantic Decoding
Shan Wu | Bo Chen | Chunlei Xin | Xianpei Han | Le Sun | Weipeng Zhang | Jiansong Chen | Fan Yang | Xunliang Cai
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Semantic parsing is challenging due to the structure gap and the semantic gap between utterances and logical forms. In this paper, we propose an unsupervised semantic parsing method - Synchronous Semantic Decoding (SSD), which can simultaneously resolve the semantic gap and the structure gap by jointly leveraging paraphrasing and grammar-constrained decoding. Specifically, we reformulate semantic parsing as a constrained paraphrasing problem: given an utterance, our model synchronously generates its canonical utterancel and meaning representation. During synchronously decoding: the utterance paraphrasing is constrained by the structure of the logical form, therefore the canonical utterance can be paraphrased controlledly; the semantic decoding is guided by the semantics of the canonical utterance, therefore its logical form can be generated unsupervisedly. Experimental results show that SSD is a promising approach and can achieve state-of-the-art unsupervised semantic parsing performance on multiple datasets.

2020

pdf bib abs

Abstractive document summarization is a comprehensive task including document understanding and summary generation, in which area Transformer-based models have achieved the state-of-the-art performance. Compared with Transformers, topic models are better at learning explicit document semantics, and hence could be integrated into Transformers to further boost their performance. To this end, we rearrange and explore the semantics learned by a topic model, and then propose a topic assistant (TA) including three modules. TA is compatible with various Transformer-based models and user-friendly since i) TA is a plug-and-play model that does not break any structure of the original Transformer network, making users easily fine-tune Transformer+TA based on a well pre-trained model; ii) TA only introduces a small number of extra parameters. Experimental results on three datasets demonstrate that TA is able to improve the performance of several Transformer-based models.

2019

pdf bib abs

Improving Distantly-supervised Entity Typing with Compact Latent Space Clustering
Bo Chen | Xiaotao Gu | Yufeng Hu | Siliang Tang | Guoping Hu | Yueting Zhuang | Xiang Ren
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Recently, distant supervision has gained great success on Fine-grained Entity Typing (FET). Despite its efficiency in reducing manual labeling efforts, it also brings the challenge of dealing with false entity type labels, as distant supervision assigns labels in a context-agnostic manner. Existing works alleviated this issue with partial-label loss, but usually suffer from confirmation bias, which means the classifier fit a pseudo data distribution given by itself. In this work, we propose to regularize distantly supervised models with Compact Latent Space Clustering (CLSC) to bypass this problem and effectively utilize noisy data yet. Our proposed method first dynamically constructs a similarity graph of different entity mentions; infer the labels of noisy instances via label propagation. Based on the inferred labels, mention embeddings are updated accordingly to encourage entity mentions with close semantics to form a compact cluster in the embedding space, thus leading to better classification performance. Extensive experiments on standard benchmarks show that our CLSC model consistently outperforms state-of-the-art distantly supervised entity typing systems by a significant margin.

pdf bib abs

In this paper, we propose an efficient Knowledge Constraint Fine-grained Entity Typing Annotation Tool, which further improves the entity typing process through entity linking together with some practical functions.

2018

pdf bib abs

Semi-Supervised Lexicon Learning for Wide-Coverage Semantic Parsing
Bo Chen | Bo An | Le Sun | Xianpei Han
Proceedings of the 27th International Conference on Computational Linguistics

Semantic parsers critically rely on accurate and high-coverage lexicons. However, traditional semantic parsers usually utilize annotated logical forms to learn the lexicon, which often suffer from the lexicon coverage problem. In this paper, we propose a graph-based semi-supervised learning framework that makes use of large text corpora and lexical resources. This framework first constructs a graph with a phrase similarity model learned by utilizing many text corpora and lexical resources. Next, graph propagation algorithm identifies the label distribution of unlabeled phrases from labeled ones. We evaluate our approach on two benchmarks: Webquestions and Free917. The results show that, in both datasets, our method achieves substantial improvement when comparing to the base system that does not utilize the learned lexicon, and gains competitive results when comparing to state-of-the-art systems.

pdf bib abs

Accurate Text-Enhanced Knowledge Graph Representation Learning
Bo An | Bo Chen | Xianpei Han | Le Sun
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Previous representation learning techniques for knowledge graph representation usually represent the same entity or relation in different triples with the same representation, without considering the ambiguity of relations and entities. To appropriately handle the semantic variety of entities/relations in distinct triples, we propose an accurate text-enhanced knowledge graph representation learning method, which can represent a relation/entity with different representations in different triples by exploiting additional textual information. Specifically, our method enhances representations by exploiting the entity descriptions and triple-specific relation mention. And a mutual attention mechanism between relation mention and entity description is proposed to learn more accurate textual representations for further improving knowledge graph representation. Experimental results show that our method achieves the state-of-the-art performance on both link prediction and triple classification tasks, and significantly outperforms previous text-enhanced knowledge representation models.

pdf bib abs

Sequence-to-Action: End-to-End Semantic Graph Generation for Semantic Parsing
Bo Chen | Le Sun | Xianpei Han
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper proposes a neural semantic parsing approach – Sequence-to-Action, which models semantic parsing as an end-to-end semantic graph generation process. Our method simultaneously leverages the advantages from two recent promising directions of semantic parsing. Firstly, our model uses a semantic graph to represent the meaning of a sentence, which has a tight-coupling with knowledge bases. Secondly, by leveraging the powerful representation learning and prediction ability of neural network models, we propose a RNN model which can effectively map sentences to action sequences for semantic graph generation. Experiments show that our method achieves state-of-the-art performance on Overnight dataset and gets competitive performance on Geo and Atis datasets.

2017

pdf bib abs

East Asian languages are thought to handle reference differently from languages such as English, particularly in terms of the marking of definiteness and number. We present the first Data-Text corpus for Referring Expressions in Mandarin, and we use this corpus to test some initial hypotheses inspired by the theoretical linguistics literature. Our findings suggest that function words deserve more attention in Referring Expressions Generation than they have so far received, and they have a bearing on the debate about whether different languages make different trade-offs between clarity and brevity.

Bo Chen

2025

2024

2022

2021

2020

2019

2018

2017

2016

2008

2006

Co-authors

Venues