The Thieves on Sesame Street are Polyglots - Extracting Multilingual Models from Monolingual APIs

Nitish Shirish Keskar; Bryan McCann; Caiming Xiong; Richard Socher

doi:10.18653/v1/2020.emnlp-main.501

Abstract

Pre-training in natural language processing makes it easier for an adversary with only query access to a victim model to reconstruct a local copy of the victim by training with gibberish input data paired with the victim’s labels for that data. We discover that this extraction process extends to local copies initialized from a pre-trained, multilingual model while the victim remains monolingual. The extracted model learns the task from the monolingual victim, but it generalizes far better than the victim to several other languages. This is done without ever showing the multilingual, extracted model a well-formed input in any of the languages for the target task. We also demonstrate that a few real examples can greatly improve performance, and we analyze how these results shed light on how such extraction methods succeed.
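The extraction loop described in the abstract (query the victim with gibberish, keep its labels, train a local copy on those pairs) can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration: the vocabulary, the rule-based `victim_api`, and the bag-of-words local model are hypothetical stand-ins. The paper's local copy is initialized from a pre-trained multilingual model, which this sketch does not reproduce, so it shows only the query-and-train loop and not the cross-lingual generalization.

```python
import random
from collections import Counter

# Hypothetical toy vocabulary (an assumption for this sketch).
VOCAB = ["good", "great", "bad", "awful", "movie", "plot", "the", "was"]


def victim_api(text):
    """Stand-in for the black-box victim: the attacker sees only the label."""
    words = text.split()
    score = sum(w in ("good", "great") for w in words)
    score -= sum(w in ("bad", "awful") for w in words)
    return "pos" if score > 0 else "neg"


def gibberish_query(rng, length=5):
    """Random word sequence; not a well-formed sentence."""
    return " ".join(rng.choice(VOCAB) for _ in range(length))


def extract_weights(num_queries=2000, seed=0):
    """Query the victim with gibberish and fit per-word weights to its labels."""
    rng = random.Random(seed)
    counts = {"pos": Counter(), "neg": Counter()}
    for _ in range(num_queries):
        query = gibberish_query(rng)
        counts[victim_api(query)].update(query.split())
    totals = {label: max(1, sum(c.values())) for label, c in counts.items()}
    # Weight = relative frequency under "pos" labels minus that under "neg".
    return {
        w: counts["pos"][w] / totals["pos"] - counts["neg"][w] / totals["neg"]
        for w in VOCAB
    }


def local_predict(weights, text):
    """Local copy: sum the learned word weights; unknown words count as zero."""
    score = sum(weights.get(w, 0.0) for w in text.split())
    return "pos" if score > 0 else "neg"


weights = extract_weights()
print(local_predict(weights, "great movie"), local_predict(weights, "awful plot"))
```

The local model never sees a well-formed training input, yet it recovers which words drive the victim's decisions. It will not match the victim on every input, since a linear bag-of-words model is only a rough approximation of the victim's rule.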

Anthology ID: 2020.emnlp-main.501
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month: November
Year: 2020
Address: Online
Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6203–6207
URL: https://aclanthology.org/2020.emnlp-main.501
DOI: 10.18653/v1/2020.emnlp-main.501
Cite (ACL): Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, and Richard Socher. 2020. The Thieves on Sesame Street are Polyglots - Extracting Multilingual Models from Monolingual APIs. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6203–6207, Online. Association for Computational Linguistics.
Cite (Informal): The Thieves on Sesame Street are Polyglots - Extracting Multilingual Models from Monolingual APIs (Keskar et al., EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-main.501.pdf
Video: https://slideslive.com/38938724
