Galen Andrew

2023

We train and deploy language models (LMs) with federated learning (FL) and differential privacy (DP) in Google Keyboard (Gboard). The recent DP-Follow the Regularized Leader (DP-FTRL) algorithm is applied to achieve meaningfully formal DP guarantees without requiring uniform sampling of clients. To provide favorable privacy-utility trade-offs, we introduce a new client participation criterion and discuss the implication of its configuration in large-scale systems. We show how quantile-based clip estimation can be combined with DP-FTRL to adaptively choose the clip norm during training or reduce the hyperparameter tuning in preparation of training. With the help of pretraining on public data, we trained and deployed more than fifteen Gboard LMs that achieve high utility and $\rho$-zCDP privacy guarantees with $\rho \in (0.3, 2)$, with one model additionally trained with secure aggregation. We summarize our experience and provide concrete suggestions on DP training for practitioners.

2007

A Comparative Study of Parameter Estimation Methods for Statistical Natural Language Processing
Jianfeng Gao | Galen Andrew | Mark Johnson | Kristina Toutanova
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

Tregex and Tsurgeon: tools for querying and manipulating tree data structures
Roger Levy | Galen Andrew
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

With syntactically annotated corpora becoming increasingly available for a variety of languages and grammatical frameworks, tree query tools have proven invaluable to linguists and computer scientists for both data exploration and corpus-based research. We provide a combined engine for tree query (Tregex) and manipulation (Tsurgeon) that can operate on arbitrary tree data structures with no need for preprocessing. Tregex remedies several expressive and implementational limitations of existing query tools, while Tsurgeon is to our knowledge the most expressive tree manipulation utility available.

A Hybrid Markov/Semi-Markov Conditional Random Field for Sequence Segmentation
Galen Andrew
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
