Bowen Chen


2022

CogBERT: Cognition-Guided Pre-trained Language Models
Xiao Ding | Bowen Chen | Li Du | Bing Qin | Ting Liu
Proceedings of the 29th International Conference on Computational Linguistics

We study the problem of integrating cognitive language processing signals (e.g., eye-tracking or EEG data) into pre-trained language models like BERT. Existing methods typically fine-tune pre-trained models on cognitive data, ignoring the semantic gap between the texts and cognitive signals. To fill this gap, we propose CogBERT, a framework that induces fine-grained cognitive features from cognitive data and incorporates them into BERT by adaptively adjusting the weight of each cognitive feature for different NLP tasks. Extensive experiments show that: (1) Cognition-guided pre-trained models consistently perform better than basic pre-trained models on ten NLP tasks. (2) Different cognitive features contribute differently to different NLP tasks. Based on this observation, we give a fine-grained explanation of why cognitive data is helpful for NLP. (3) Different transformer layers of pre-trained models should encode different cognitive features, with word-level cognitive features at the bottom and semantic-level cognitive features at the top. (4) Attention visualization demonstrates that CogBERT aligns with human gaze patterns, which improves the model's natural language comprehension.
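
To make the adaptive-weighting idea concrete, here is a minimal PyTorch sketch of one way per-task gates could weight cognitive features before fusing them into BERT token states. This is an illustrative assumption, not the CogBERT implementation: the module name CognitiveGate, the residual fusion, and the five-feature input are all hypothetical.

```python
# Minimal sketch (not the authors' code) of the adaptive-weighting idea:
# a gate conditioned on each token's representation decides how much each
# fine-grained cognitive feature (e.g., gaze duration, fixation count)
# contributes to that token. All names and shapes are illustrative.
import torch
import torch.nn as nn


class CognitiveGate(nn.Module):
    def __init__(self, hidden_size: int, num_cog_features: int):
        super().__init__()
        # Project raw cognitive signals into the model's hidden space.
        self.cog_proj = nn.Linear(num_cog_features, hidden_size)
        # One weight per cognitive feature, conditioned on the token state,
        # so different tasks/tokens can emphasize different signals.
        self.gate = nn.Linear(hidden_size, num_cog_features)

    def forward(self, token_states: torch.Tensor, cog_feats: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq, hidden); cog_feats: (batch, seq, num_cog_features)
        weights = torch.sigmoid(self.gate(token_states))  # adaptive per-feature weights
        fused = self.cog_proj(weights * cog_feats)        # weighted cognitive embedding
        return token_states + fused                       # residual fusion into BERT states


# Toy usage: random tensors stand in for BERT outputs and eye-tracking features.
gate = CognitiveGate(hidden_size=768, num_cog_features=5)
states = torch.randn(2, 16, 768)
cog = torch.rand(2, 16, 5)
print(gate(states, cog).shape)  # torch.Size([2, 16, 768])
```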

Syntactic and Semantic Uniformity for Semantic Parsing and Task-Oriented Dialogue Systems
Bowen Chen | Yusuke Miyao
Findings of the Association for Computational Linguistics: EMNLP 2022

This paper proposes a data representation framework for semantic parsing and task-oriented dialogue systems, aiming at a uniform representation for syntactically and semantically diverse machine-readable formats. Current NLP systems rely heavily on adapting pre-trained language models to specific tasks, an approach that has proven effective for modeling natural language texts. However, little attention has been paid to the representation of machine-readable formats, such as database queries and dialogue states. We present a method for converting the original machine-readable formats of semantic parsing and task-oriented dialogue datasets into a syntactically and semantically uniform representation: we define a meta grammar for syntactically uniform representations and translate semantically equivalent functions into a uniform vocabulary. Empirical experiments on 13 datasets show that accuracy consistently improves over the original formats, revealing the advantage of the proposed representation. Additionally, we show that the proposed representation enables transfer learning across datasets.
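
As an illustration of the conversion step described above, the following sketch normalizes two dataset-specific logical forms into one shared syntax and vocabulary. The synonym table, the Node structure, and the bracketed serialization are assumptions made for illustration; the paper's actual meta grammar and vocabulary mapping differ.

```python
# Minimal sketch of the uniform-representation idea (illustrative only):
# each dataset-specific parse is read into a generic tree, semantically
# equivalent function names are mapped into a shared vocabulary, and every
# expression is serialized in one uniform bracketed syntax.
from dataclasses import dataclass, field
from typing import List

# Hypothetical synonym table: dataset-specific names -> uniform vocabulary.
UNIFORM_NAME = {
    "argmax": "max_by",           # lambda-calculus style semantic parsing
    "superlative_max": "max_by",  # table-QA style logical forms
    "inform": "state_update",     # task-oriented dialogue acts
}

@dataclass
class Node:
    name: str
    args: List["Node"] = field(default_factory=list)

def to_uniform(node: Node) -> str:
    """Serialize a parse tree in one uniform bracketed syntax."""
    name = UNIFORM_NAME.get(node.name, node.name)
    if not node.args:
        return name
    return f"({name} {' '.join(to_uniform(a) for a in node.args)})"

# Two dataset-specific trees with the same meaning...
tree_a = Node("argmax", [Node("rivers"), Node("length")])
tree_b = Node("superlative_max", [Node("rivers"), Node("length")])
# ...collapse to the same uniform string.
assert to_uniform(tree_a) == to_uniform(tree_b) == "(max_by rivers length)"
```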