Canine: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
Jonathan H. Clark | Dan Garrette | Iulia Turc | John Wieting
Transactions of the Association for Computational Linguistics, Volume 10
Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet nearly all commonly used models still require an explicit tokenization step. While recent tokenization approaches based on data-derived subword lexicons are less brittle than manually engineered tokenizers, these techniques are not equally suited to all languages, and the use of any fixed vocabulary may limit a model’s ability to adapt. In this paper, we present Canine, a neural encoder that operates directly on character sequences, without explicit tokenization or vocabulary, and a pre-training strategy that operates either directly on characters or optionally uses subwords as a soft inductive bias. To use its finer-grained input effectively and efficiently, Canine combines downsampling, which reduces the input sequence length, with a deep transformer stack, which encodes context. Canine outperforms a comparable mBERT model by 5.7 F1 on TyDi QA, a challenging multilingual benchmark, despite having fewer model parameters.
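As a rough illustration of the two ideas named in the abstract above (vocabulary-free character input and downsampling before the deep stack), here is a minimal sketch. It is not the Canine implementation: `toy_embed` is a hypothetical stand-in for a learned character embedding, and mean pooling over fixed windows is an assumption used in place of whatever downsampling operator the paper applies. The window size `rate=4` is likewise chosen only for illustration.

```python
def to_codepoints(text):
    """Map each character to its Unicode codepoint; no vocabulary lookup is involved."""
    return [ord(ch) for ch in text]


def toy_embed(codepoint, dim=4):
    """Hypothetical stand-in for a learned character embedding (assumption)."""
    return [((codepoint * (i + 1)) % 97) / 97.0 for i in range(dim)]


def downsample(vectors, rate=4):
    """Average each window of `rate` consecutive vectors into a single vector.

    Mean pooling is an assumption made for this sketch; the last window
    may be shorter than `rate` and is averaged over its actual length.
    """
    pooled = []
    for start in range(0, len(vectors), rate):
        window = vectors[start:start + rate]
        dim = len(window[0])
        pooled.append([sum(v[i] for v in window) / len(window) for i in range(dim)])
    return pooled


text = "naïve café"
ids = to_codepoints(text)                      # one integer per character
char_vectors = [toy_embed(cp) for cp in ids]   # character-level sequence
short = downsample(char_vectors, rate=4)       # shorter sequence for a deep encoder
print(len(ids), len(short))
```

The point of the sketch is only the shape of the computation: any Unicode string becomes a sequence of integers with no out-of-vocabulary case, and the sequence handed to the expensive part of the model is several times shorter than the character sequence.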
Learning Task Sampling Policy for Multitask Learning
Dhanasekar Sundararaman | Henry Tsai | Kuang-Huei Lee | Iulia Turc | Lawrence Carin
Findings of the Association for Computational Linguistics: EMNLP 2021
It has been shown that training multi-task models with auxiliary tasks can improve target task quality through cross-task transfer. However, the importance of each auxiliary task to the primary task is likely not known a priori. While the importance weights of auxiliary tasks can be tuned manually, this becomes practically infeasible as the number of tasks grows. To address this, we propose a search method that automatically assigns importance weights. We formulate it as a reinforcement learning problem and learn a task sampling schedule based on the evaluation accuracy of the multi-task model. Our empirical evaluation on XNLI and GLUE shows that our method outperforms uniform sampling and the corresponding single-task baseline.
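The abstract above frames task weighting as a reinforcement learning problem driven by evaluation accuracy. A minimal sketch of that general idea, under stated assumptions, is a softmax policy over tasks updated with a REINFORCE-style rule. This is not the authors' method: the reward definition (change in target-task accuracy after a training step), the moving-average baseline, the learning rate, and the `simulated_gain` function with its per-task gains are all hypothetical, introduced only so the example runs on its own.

```python
import math
import random


def softmax(logits):
    """Convert unnormalized task scores into sampling probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update_policy(logits, task, reward, baseline, lr=0.5):
    """One REINFORCE step on the logits of a softmax policy.

    The gradient of log pi(task) with respect to logit k is
    1[k == task] - probs[k], scaled here by the advantage.
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        logit + lr * advantage * ((1.0 if k == task else 0.0) - probs[k])
        for k, logit in enumerate(logits)
    ]


def simulated_gain(task, rng):
    """Hypothetical stand-in for: train on a batch from `task`, then
    measure the change in target-task evaluation accuracy (assumption)."""
    true_gain = [0.01, 0.05, 0.02][task]
    return true_gain + rng.gauss(0.0, 0.005)


def learn_sampling_policy(num_tasks=3, steps=500, seed=0):
    """Learn task sampling probabilities from simulated rewards."""
    rng = random.Random(seed)
    logits = [0.0] * num_tasks
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        task = rng.choices(range(num_tasks), weights=probs)[0]
        reward = simulated_gain(task, rng)
        logits = update_policy(logits, task, reward, baseline)
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    return softmax(logits)


print(learn_sampling_policy())
```

In a real setup, `simulated_gain` would be replaced by an actual training step followed by evaluation on held-out data, which is far more expensive; the sketch only shows how sampled tasks and observed rewards feed back into the sampling distribution.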
High Performance Natural Language Processing
Gabriel Ilharco | Cesar Ilharco | Iulia Turc | Tim Dettmers | Felipe Ferreira | Kenton Lee
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Scale has played a central role in the rapid progress natural language processing has enjoyed in recent years. While benchmarks are dominated by ever larger models, efficient hardware use is critical for their widespread adoption and further progress in the field. In this cutting-edge tutorial, we will recapitulate the state of the art in natural language processing with scale in perspective. After establishing these foundations, we will cover a wide range of techniques for improving efficiency, including knowledge distillation, quantization, pruning, and more efficient architectures, along with case studies and practical implementation tricks.