Mitesh Vasa


2021

Development of an Enterprise-Grade Contract Understanding System
Arvind Agarwal | Laura Chiticariu | Poornima Chozhiyath Raman | Marina Danilevsky | Diman Ghazi | Ankush Gupta | Shanmukha Guttula | Yannis Katsis | Rajasekar Krishnamurthy | Yunyao Li | Shubham Mudgal | Vitobha Munigala | Nicholas Phan | Dhaval Sonawane | Sneha Srinivasan | Sudarshan R. Thitte | Mitesh Vasa | Ramiya Venkatachalam | Vinitha Yaski | Huaiyu Zhu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Contracts are arguably the most important type of business document. Despite their significance in business, legal contract review largely remains an arduous, expensive and manual process. In this paper, we describe TECUS: a commercial system designed and deployed for contract understanding and used by a wide range of enterprise users for the past few years. We reflect on the challenges and design decisions when building TECUS. We also summarize the data science life cycle of TECUS and share lessons learned.

2018

Exploiting Structure in Representation of Named Entities using Active Learning
Nikita Bhutani | Kun Qian | Yunyao Li | H. V. Jagadish | Mauricio Hernandez | Mitesh Vasa
Proceedings of the 27th International Conference on Computational Linguistics

Fundamental to several knowledge-centric applications is the need to identify named entities from their textual mentions. However, entities lack a unique representation and their mentions can differ greatly. These variations arise in complex ways that cannot be captured using textual similarity metrics. Entities do, however, have underlying structures, typically shared by entities of the same entity type, that can help reason over their name variations. Discovering, learning and manipulating these structures typically requires high manual effort in the form of large amounts of labeled training data and handwritten transformation programs. In this work, we propose an active-learning-based framework that drastically reduces the labeled data required to learn the structures of entities. We show that programs for mapping entity mentions to their structures can be automatically generated using human-comprehensible labels. Our experiments show that our framework consistently outperforms both handwritten programs and supervised learning models. We also demonstrate the utility of our framework in relation extraction and entity resolution tasks.