Improving Natural Language Understanding by Reverse Mapping Bytepair Encoding

Chaodong Tong, Huailiang Peng, Qiong Dai, Lei Jiang, Jianghua Huang


Abstract
We propose a method called reverse mapping bytepair encoding, which maps named-entity information and other word-level linguistic features back to subwords during the encoding procedure of bytepair encoding (BPE). We apply this method to the Generative Pre-trained Transformer (OpenAI GPT) by adding a weighted linear layer after the embedding layer. We also propose a new model architecture, the multi-channel separate transformer, which is trained without parameter sharing. Evaluation on the Story Cloze Test, RTE, SciTail and SST-2 datasets demonstrates the effectiveness of our approach.
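The abstract describes the mapping only at a high level. The following is a minimal sketch, under our own assumptions, of the general idea: each word is split into subwords by applying BPE merges in rank order, and the word's feature label (here a named-entity tag) is copied to every subword that word produces, so the feature sequence stays aligned with the subword sequence. The function names `bpe_encode_word` and `reverse_map_features`, the toy merge table, the `</w>` end-of-word marker, and the policy of copying each label unchanged to all pieces are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical merge table: lower rank = merge applied earlier.
MergeRanks = Dict[Tuple[str, str], int]


def bpe_encode_word(word: str, merge_ranks: MergeRanks) -> List[str]:
    """Split one non-empty word into subwords by repeatedly merging the
    adjacent symbol pair with the lowest rank. A '</w>' end-of-word marker
    is attached to the last character (an assumption for this toy example)."""
    symbols = list(word[:-1]) + [word[-1] + "</w>"]
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # Pick the pair with the lowest rank; unknown pairs rank as infinity.
        best = min(pairs, key=lambda p: merge_ranks.get(p, float("inf")))
        if best not in merge_ranks:
            break  # no applicable merge remains
        merged: List[str] = []
        i = 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def reverse_map_features(
    words: Sequence[str], features: Sequence[str], merge_ranks: MergeRanks
) -> Tuple[List[str], List[str]]:
    """Encode words with BPE and copy each word-level feature to all of its
    subwords, returning two aligned lists of equal length."""
    if len(words) != len(features):
        raise ValueError("words and features must have the same length")
    subwords: List[str] = []
    sub_features: List[str] = []
    for word, feat in zip(words, features):
        pieces = bpe_encode_word(word, merge_ranks)
        subwords.extend(pieces)
        sub_features.extend([feat] * len(pieces))
    return subwords, sub_features


if __name__ == "__main__":
    ranks: MergeRanks = {("o", "n"): 0, ("on", "g</w>"): 1, ("i", "s</w>"): 2}
    subs, feats = reverse_map_features(
        ["Hong", "Kong", "is"], ["B-LOC", "I-LOC", "O"], ranks
    )
    print(list(zip(subs, feats)))
```

With the toy merges in the demo, "Hong" becomes `H` + `ong</w>` and both pieces inherit `B-LOC`. In a model such as GPT, the aligned feature labels would then be embedded and combined with the subword embeddings; the abstract states that this is done through a weighted linear layer after the embedding layer, but its exact form is not given here, so it is not sketched.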
Anthology ID:
K19-1016
Volume:
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Mohit Bansal, Aline Villavicencio
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Pages:
163–173
URL:
https://aclanthology.org/K19-1016
DOI:
10.18653/v1/K19-1016
Cite (ACL):
Chaodong Tong, Huailiang Peng, Qiong Dai, Lei Jiang, and Jianghua Huang. 2019. Improving Natural Language Understanding by Reverse Mapping Bytepair Encoding. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 163–173, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Improving Natural Language Understanding by Reverse Mapping Bytepair Encoding (Tong et al., CoNLL 2019)
PDF:
https://aclanthology.org/K19-1016.pdf