An On-device Deep-Learning Approach for Attribute Extraction from Heterogeneous Unstructured Text

Mahesh Gorijala, Aniruddha Bala, Pinaki Bhaskar, Krishnaditya, Vikram Mupparthi


Abstract
Mobile devices, with their rapidly growing usage, have turned into rich sources of user information, holding critical insights for improving user experience and personalization. Creating, receiving and storing important information in the form of unstructured text has become part and parcel of users' daily routine. From purchase deliveries in Short Message Service (SMS) messages or notifications, to event booking details in calendar applications, mobile devices serve as a portal for understanding user interests, behaviours and activities through information extraction. In this paper, we address the challenge of on-device extraction of user information from unstructured natural-language data from heterogeneous sources such as messages, notifications and calendars. The on-device nature of the proposed solution effectively eliminates privacy concerns. Our proposed solution consists of three components: a Naïve Bayes based classifier for domain identification, a dual character- and word-based Bidirectional Long Short-Term Memory (Bi-LSTM) and Conditional Random Field (CRF) model for attribute extraction, and a rule-based entity linker. Our solution achieved a 93.29% F1 score on five domains (shopping, travel, event, service and personal). Since on-device deployment has memory and latency constraints, we ensure minimal model size and optimal inference latency. To demonstrate the efficacy of our approach, we experimented on the CoNLL-2003 dataset and achieved performance comparable to existing benchmark results.
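As an illustration of the first component only, the domain-identification step can be sketched as a small multinomial Naïve Bayes classifier. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `NaiveBayesDomainClassifier`, the whitespace tokenization, the add-one smoothing, and the toy training messages are all hypothetical, and the abstract does not specify which Naïve Bayes variant or features were used.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesDomainClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration; tokenization and smoothing are assumptions.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per domain
        self.word_counts = defaultdict(Counter)  # token counts per domain
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for this sketch, not from the paper).
texts = [
    "your order has been shipped and will be delivered tomorrow",
    "your package is out for delivery",
    "your flight to delhi departs at 9 am",
    "boarding pass for your flight is attached",
    "your movie ticket is booked for saturday",
    "concert ticket confirmed for friday evening",
]
labels = ["shopping", "shopping", "travel", "travel", "event", "event"]

clf = NaiveBayesDomainClassifier().fit(texts, labels)
print(clf.predict("your flight departs tomorrow"))  # travel
```

A count-based classifier of this kind stores only per-domain token counts, which is consistent with the memory and latency constraints the abstract mentions, although the actual model size and feature set are not given here.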
Anthology ID:
2021.icon-main.70
Volume:
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2021
Address:
National Institute of Technology Silchar, Silchar, India
Editors:
Sivaji Bandyopadhyay, Sobha Lalitha Devi, Pushpak Bhattacharyya
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
573–582
URL:
https://aclanthology.org/2021.icon-main.70
Cite (ACL):
Mahesh Gorijala, Aniruddha Bala, Pinaki Bhaskar, Krishnaditya, and Vikram Mupparthi. 2021. An On-device Deep-Learning Approach for Attribute Extraction from Heterogeneous Unstructured Text. In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 573–582, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI).
Cite (Informal):
An On-device Deep-Learning Approach for Attribute Extraction from Heterogeneous Unstructured Text (Gorijala et al., ICON 2021)
PDF:
https://aclanthology.org/2021.icon-main.70.pdf
Optional supplementary material:
 2021.icon-main.70.OptionalSupplementaryMaterial.pdf
Data
CoNLL++