Referring to Screen Texts with Voice Assistants

Shruti Bhargava, Anand Dhoot, Ing-marie Jonsson, Hoang Long Nguyen, Alkesh Patel, Hong Yu, Vincent Renkens


Abstract
Voice assistants help users make phone calls, send messages, create events, navigate and do a lot more. However assistants have limited capacity to understand their users’ context. In this work, we aim to take a step in this direction. Our work dives into a new experience for users to refer to phone numbers, addresses, email addresses, urls, and dates on their phone screens. We focus on reference understanding, which is particularly interesting when, similar to visual grounding, there are multiple similar texts on screen. We collect a dataset and propose a lightweight general purpose model for this novel experience. Since consuming pixels directly is expensive, our system is designed to rely only on text extracted from the UI. Our model is modular, offering flexibility, better interpretability and efficient run time memory.
Anthology ID:
2023.acl-industry.72
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Sunayana Sitaram, Beata Beigman Klebanov, Jason D Williams
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
752–762
Language:
URL:
https://aclanthology.org/2023.acl-industry.72
DOI:
10.18653/v1/2023.acl-industry.72
Bibkey:
Cite (ACL):
Shruti Bhargava, Anand Dhoot, Ing-marie Jonsson, Hoang Long Nguyen, Alkesh Patel, Hong Yu, and Vincent Renkens. 2023. Referring to Screen Texts with Voice Assistants. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 752–762, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Referring to Screen Texts with Voice Assistants (Bhargava et al., ACL 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.acl-industry.72.pdf