Mingqing Chen
2022
Scaling Language Model Size in Cross-Device Federated Learning
Jae Ro | Theresa Breiner | Lara McConnaughey | Mingqing Chen | Ananda Suresh | Shankar Kumar | Rajiv Mathews
Proceedings of the First Workshop on Federated Learning for Natural Language Processing (FL4NLP 2022)
Most studies in cross-device federated learning focus on small models due to the server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. With systematic applications of partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we are able to train a 21M-parameter Transformer that achieves the same perplexity as that of a similarly sized LSTM with ∼10× smaller client-to-server communication cost and 11% lower perplexity than smaller LSTMs commonly studied in the literature.
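Quantization is one of the techniques listed above for reducing client-to-server communication. The following is a minimal sketch of a generic unbiased uniform quantizer (stochastic rounding) applied to a client's model update. It is an illustrative assumption, not the paper's exact scheme, and the helper names `quantize` and `dequantize` and the 8-bit setting are hypothetical.

```python
import numpy as np


def quantize(update: np.ndarray, bits: int, rng: np.random.Generator):
    """Hypothetical helper: map each float to one of 2**bits evenly spaced levels
    between the tensor's min and max, using stochastic rounding so that the
    dequantized value is unbiased in expectation."""
    assert 1 <= bits <= 8, "this sketch stores codes as uint8"
    levels = 2 ** bits - 1
    lo, hi = float(update.min()), float(update.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    normalized = (update - lo) / scale  # values in [0, levels]
    floor = np.floor(normalized)
    # Round up with probability equal to the fractional part.
    rounded = floor + (rng.random(update.shape) < (normalized - floor))
    codes = np.clip(rounded, 0, levels).astype(np.uint8)
    return codes, lo, scale


def dequantize(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Server side: map integer codes back to approximate float32 values."""
    return (lo + codes.astype(np.float32) * scale).astype(np.float32)


# Usage on a random stand-in for a client's model delta.
rng = np.random.default_rng(0)
delta = rng.normal(size=10_000).astype(np.float32)
codes, lo, scale = quantize(delta, bits=8, rng=rng)
restored = dequantize(codes, lo, scale)
print(delta.nbytes / codes.nbytes)  # 4.0
print(float(np.abs(restored - delta).max()), scale)
```

Sending 8-bit codes instead of float32 values shrinks the update by 4× (plus two floats of metadata), with each value off by less than one quantization step. This quantizer alone is not meant to reproduce the ∼10× figure, which reflects the combination of techniques described in the paper.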
2019
Federated Learning of N-Gram Language Models
Mingqing Chen | Ananda Theertha Suresh | Rajiv Mathews | Adeline Wong | Cyril Allauzen | Françoise Beaufays | Michael Riley
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)
We propose algorithms to train production-quality n-gram language models using federated learning. Federated learning is a distributed computation platform that can be used to train global models for portable devices such as smartphones. Federated learning is especially relevant for applications handling privacy-sensitive data, such as virtual keyboards, because training is performed without the users’ data ever leaving their devices. While the principles of federated learning are fairly generic, its methodology assumes that the underlying models are neural networks. However, virtual keyboards are typically powered by n-gram language models for latency reasons. We propose to train a recurrent neural network language model using the decentralized FederatedAveraging algorithm and to approximate this federated model server-side with an n-gram model that can be deployed to devices for fast inference. Our technical contributions include ways of handling large vocabularies, algorithms to correct capitalization errors in user data, and efficient finite state transducer algorithms to convert word language models to word-piece language models and vice versa. The n-gram language models trained with federated learning are compared to n-grams trained with traditional server-based algorithms using A/B tests on tens of millions of users of a virtual keyboard. Results are presented for two languages, American English and Brazilian Portuguese. This work demonstrates that high-quality n-gram language models can be trained directly on client mobile devices without sensitive training data ever leaving the devices.
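The abstract trains its recurrent model with the FederatedAveraging algorithm. The sketch below shows the standard form of one FederatedAveraging round: each client trains locally from the current global weights, and the server takes an example-count-weighted average of the results. It assumes a toy one-parameter model in place of the paper's RNN language model, omits client sampling and any privacy mechanisms, and the names `federated_averaging_round` and `local_sgd` are hypothetical.

```python
import numpy as np


def federated_averaging_round(global_weights, client_datasets, local_update):
    """One round of FederatedAveraging: every client starts from the current
    global weights and trains locally; the server then averages the returned
    weights, weighting each client by its number of training examples."""
    total = sum(len(data) for data in client_datasets)
    new_weights = np.zeros_like(global_weights, dtype=np.float64)
    for data in client_datasets:
        local_weights = local_update(global_weights.copy(), data)
        new_weights += (len(data) / total) * local_weights
    return new_weights


def local_sgd(w, data, lr=0.1, epochs=5):
    """Toy stand-in for on-device training: a single parameter fitted to the
    client's values by SGD on squared error (hypothetical, not an RNN LM)."""
    for _ in range(epochs):
        for x in data:
            w -= lr * (w - x)
    return w


# Two clients with different amounts of data. In a real deployment only the
# locally updated weights, never the raw values, would be sent to the server.
clients = [np.array([1.0, 1.0]), np.array([4.0, 4.0, 4.0, 4.0])]
w = np.zeros(1)
for round_idx in range(10):
    w = federated_averaging_round(w, clients, local_sgd)
print(w)
```

In this toy setup the model settles near 3.19 rather than the pooled mean of 3, because the larger client takes more local steps and pulls the average toward its own optimum; this client drift is a known effect of running several local steps per round.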
Co-authors
- Rajiv Mathews 2
- Cyril Allauzen 1
- Françoise Beaufays 1
- Theresa Breiner 1
- Shankar Kumar 1