Xinying Song


An Efficient Conversational Smart Compose System
Yun Zhu | Xiayu Chen | Lei Shu | Bowen Tan | Xinying Song | Lijuan Liu | Maria Wang | Jindong Chen | Ning Ruan
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Online conversation is a ubiquitous way to share information and connect everyone, but repetitive idiomatic text typing takes users a lot of time. This paper demonstrates a simple yet effective cloud-based smart compose system to improve human-to-human conversation efficiency. Heuristics from different perspectives are designed to achieve the best trade-off between quality and latency. On the modeling side, the decoder-only model exploits the previous turns of conversational history in a computationally lightweight manner. In addition, a novel phrase tokenizer is proposed to further reduce latency without losing composing quality. Finally, a caching mechanism is applied to the serving framework. The demo video of the system is available at …, and we open-sourced our phrase tokenizer in …
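The abstract states that a caching mechanism is applied to the serving framework but does not describe it. The sketch below shows one common way such a cache could work, assuming a least-recently-used (LRU) policy keyed on the pair (conversation context, typed prefix); the class `SuggestionCache`, the method `get_or_compute`, and the model signature are all hypothetical and are not taken from the paper.

```python
from collections import OrderedDict
from typing import Callable

# Hypothetical model signature: (conversation context, typed prefix) -> suggested completion.
ModelFn = Callable[[str, str], str]


class SuggestionCache:
    """Hypothetical LRU cache for smart-compose suggestions (not the paper's design)."""

    def __init__(self, capacity: int = 1024) -> None:
        self.capacity = capacity
        self._store = OrderedDict()  # (context, prefix) -> suggestion, oldest first
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, context: str, prefix: str, model: ModelFn) -> str:
        key = (context, prefix)
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        suggestion = model(context, prefix)  # expensive model call only on a cache miss
        self._store[key] = suggestion
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return suggestion
```

For example, calling `get_or_compute("hi", "see you", model)` twice invokes `model` only once; the second call is served from the cache. Keying on both context and prefix is an assumption here: it avoids returning a suggestion computed for a different conversation, at the cost of a lower hit rate.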


Token Dropping for Efficient BERT Pretraining
Le Hou | Richard Yuanzhe Pang | Tianyi Zhou | Yuexin Wu | Xinying Song | Xiaodan Song | Denny Zhou
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Transformer-based models generally allocate the same amount of computation for each token in a given sequence. We develop a simple but effective "token dropping" method to accelerate the pretraining of transformer models, such as BERT, without degrading their performance on downstream tasks. In particular, we drop unimportant tokens starting from an intermediate layer in the model so that the model focuses on important tokens more efficiently when computational resources are limited. The dropped tokens are later picked up by the last layer of the model so that the model still produces full-length sequences. We leverage the already built-in masked language modeling (MLM) loss to identify unimportant tokens with practically no computational overhead. In our experiments, this simple approach reduces the pretraining cost of BERT by 25% while achieving similar overall fine-tuning performance on standard downstream tasks.
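The data flow described above (full sequence in early layers, only important tokens in the middle layers, full sequence again in the last layer) can be sketched with toy layers in pure Python. This is a minimal illustration under stated assumptions: each token has a precomputed importance score, assumed here to be an accumulated MLM loss; dropped tokens are assumed to re-enter the last layer with the representation they had when they were dropped; and `select_important` and `forward_with_token_dropping` are hypothetical names, not code from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A toy "layer" maps a sequence of token vectors to a sequence of the same length.
Layer = Callable[[List[Vector]], List[Vector]]


def select_important(token_scores: Sequence[float], keep: int) -> List[int]:
    """Positions of the `keep` highest-scoring tokens, returned in sequence order."""
    ranked = sorted(range(len(token_scores)), key=lambda i: token_scores[i], reverse=True)
    return sorted(ranked[:keep])


def forward_with_token_dropping(
    hidden: List[Vector],
    layers: List[Layer],
    drop_start: int,
    token_scores: Sequence[float],
    keep: int,
) -> List[Vector]:
    # Dropping starts at an intermediate layer and the final layer must see all tokens.
    assert 0 < drop_start < len(layers) - 1
    # 1) Early layers process the full sequence.
    for layer in layers[:drop_start]:
        hidden = layer(hidden)
    # 2) Keep only the highest-scoring (assumed: highest-MLM-loss) tokens.
    kept = select_important(token_scores, keep)
    subset = [hidden[i] for i in kept]
    for layer in layers[drop_start:-1]:
        subset = layer(subset)  # middle layers run on `keep` tokens instead of all of them
    # 3) Scatter processed tokens back; dropped tokens keep their drop_start representations.
    merged = list(hidden)
    for pos, vec in zip(kept, subset):
        merged[pos] = vec
    # 4) The last layer sees the full-length sequence again.
    return layers[-1](merged)
```

With four toy layers that each add 1.0, `drop_start=1`, scores `[0.1, 2.0, 0.5, 3.0]`, and `keep=2`, only positions 1 and 3 pass through the two middle layers, yet the output still has four tokens. The saving comes from step 2: the middle layers process `keep` tokens rather than the whole sequence.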


Fast WordPiece Tokenization
Xinying Song | Alex Salcianu | Yang Song | Dave Dopson | Denny Zhou
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Tokenization is a fundamental preprocessing step for almost all NLP tasks. In this paper, we propose efficient algorithms for the WordPiece tokenization used in BERT, from single-word tokenization to general text (e.g., sentence) tokenization. When tokenizing a single word, WordPiece uses a longest-match-first strategy, known as maximum matching. The best known algorithms so far are O(n^2) (where n is the input length) or O(nm) (where m is the maximum vocabulary token length). We propose a novel algorithm whose tokenization complexity is strictly O(n). Our method is inspired by the Aho-Corasick algorithm. We introduce additional linkages on top of the trie built from the vocabulary, allowing smart transitions when the trie matching cannot continue. For general text, we further propose an algorithm that combines pre-tokenization (splitting the text into words) and our linear-time WordPiece method into a single pass. Experimental results show that our method is 8.2x faster than HuggingFace Tokenizers and 5.1x faster than TensorFlow Text on average for general text tokenization.
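To make the longest-match-first (MaxMatch) behavior concrete, the sketch below implements it the straightforward way, together with a two-pass pipeline that pre-tokenizes text and then tokenizes each word. It is the baseline behavior whose output the linear-time algorithm must reproduce, not the paper's Aho-Corasick-inspired method with trie linkages. Assumptions: `wordpiece_word` and `wordpiece_text` are hypothetical names, `##` is used as the suffix indicator as in BERT, a word that cannot be fully matched maps to `[UNK]`, and a simple regular expression stands in for BERT's full pre-tokenizer (which also handles lowercasing, accents, and CJK characters).

```python
import re
from typing import List, Set


def wordpiece_word(word: str, vocab: Set[str], unk: str = "[UNK]", suffix: str = "##") -> List[str]:
    """Longest-match-first (MaxMatch) WordPiece for one word, done the straightforward way."""
    tokens: List[str] = []
    start = 0
    while start < len(word):
        match = None
        end = len(word)
        # Try the longest remaining substring first and shrink it until it is in the vocab.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = suffix + piece  # non-initial pieces carry the suffix indicator
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # BERT-style: the whole word becomes [UNK] if any position fails
        tokens.append(match)
        start = end
    return tokens


def wordpiece_text(text: str, vocab: Set[str]) -> List[str]:
    """Two-pass baseline: pre-tokenize into words and punctuation, then run MaxMatch per word."""
    words = re.findall(r"\w+|[^\w\s]", text)  # simplified stand-in for BERT's pre-tokenizer
    return [tok for word in words for tok in wordpiece_word(word, vocab)]
```

With the toy vocabulary `{"un", "##aff", "##able", "hello", ","}`, `wordpiece_word("unaffable", vocab)` yields `["un", "##aff", "##able"]`. Note how the inner loop restarts from the end of the word at every new start position; the repeated rescanning is the redundant work that the paper's extra trie linkages are designed to avoid.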


Better Binarization for the CKY Parsing
Xinying Song | Shilin Ding | Chin-Yew Lin
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing