Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures

Zhucheng Tu, Mengping Li, Jimmy Lin


Abstract
We demonstrate the serverless deployment of neural networks for model inferencing in NLP applications using Amazon’s Lambda service for feedforward evaluation and DynamoDB for storing word embeddings. Our architecture realizes a pay-per-request pricing model, requiring zero ongoing costs for maintaining server instances. All virtual machine management is handled behind the scenes by the cloud provider without any direct developer intervention. We describe a number of techniques that allow efficient use of serverless resources, and evaluations confirm that our design is both scalable and inexpensive.
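The abstract describes the architecture at a high level: an AWS Lambda function runs the feedforward pass on demand, fetching word embeddings from DynamoDB rather than holding them in a long-lived server process. A minimal, self-contained sketch of such a handler is below; the embedding table is an in-memory dict standing in for DynamoDB lookups, and the vocabulary, weights, and scoring layer are all illustrative assumptions, not the paper's actual model.

```python
import json
import math

# Hypothetical stand-in for a DynamoDB table of word embeddings;
# in the paper's architecture each lookup would be a DynamoDB read.
EMBEDDINGS = {
    "good": [0.9, 0.1],
    "bad": [0.1, 0.9],
}

# Illustrative weights for a single logistic output unit (2 inputs -> 1 score).
W = [1.0, -1.0]
B = 0.0


def lookup_embedding(token):
    """Fetch a word vector; a DynamoDB GetItem call would replace this."""
    return EMBEDDINGS.get(token, [0.0, 0.0])


def handler(event, context=None):
    """Lambda-style entry point: embed tokens, average, apply one logistic layer."""
    tokens = event["text"].split()
    vecs = [lookup_embedding(t) for t in tokens]
    # Average the embeddings dimension-wise, then score with the logistic unit.
    avg = [sum(dim) / len(vecs) for dim in zip(*vecs)]
    z = sum(w * x for w, x in zip(W, avg)) + B
    score = 1.0 / (1.0 + math.exp(-z))
    return {"statusCode": 200, "body": json.dumps({"score": score})}
```

Because the function holds no state between invocations, the cloud provider can scale it to zero when idle, which is what yields the pay-per-request pricing model the abstract highlights.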
Anthology ID: N18-5002
Volume: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
Month: June
Year: 2018
Address: New Orleans, Louisiana
Editors: Yang Liu, Tim Paek, Manasi Patwardhan
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 6–10
URL: https://aclanthology.org/N18-5002
DOI: 10.18653/v1/N18-5002
Cite (ACL): Zhucheng Tu, Mengping Li, and Jimmy Lin. 2018. Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, pages 6–10, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal): Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures (Tu et al., NAACL 2018)
PDF: https://aclanthology.org/N18-5002.pdf