Yannis Katsis


Development of an Enterprise-Grade Contract Understanding System
Arvind Agarwal | Laura Chiticariu | Poornima Chozhiyath Raman | Marina Danilevsky | Diman Ghazi | Ankush Gupta | Shanmukha Guttula | Yannis Katsis | Rajasekar Krishnamurthy | Yunyao Li | Shubham Mudgal | Vitobha Munigala | Nicholas Phan | Dhaval Sonawane | Sneha Srinivasan | Sudarshan R. Thitte | Mitesh Vasa | Ramiya Venkatachalam | Vinitha Yaski | Huaiyu Zhu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Contracts are arguably the most important type of business document. Despite their significance in business, legal contract review largely remains an arduous, expensive and manual process. In this paper, we describe TECUS: a commercial system designed and deployed for contract understanding and used by a wide range of enterprise users for the past few years. We reflect on the challenges and design decisions involved in building TECUS. We also summarize the data science life cycle of TECUS and share lessons learned.

BLAR: Biomedical Local Acronym Resolver
William Hogan | Yoshiki Vazquez Baeza | Yannis Katsis | Tyler Baldwin | Ho-Cheol Kim | Chun-Nan Hsu
Proceedings of the 20th Workshop on Biomedical Language Processing

NLP has emerged as an essential tool to extract knowledge from the exponentially increasing volumes of biomedical texts. Many NLP tasks, such as named entity recognition and named entity normalization, are especially challenging in the biomedical domain partly because of the prolific use of acronyms. Long names for diseases, bacteria, and chemicals are often replaced by acronyms. We propose Biomedical Local Acronym Resolver (BLAR), a high-performing acronym resolver that leverages state-of-the-art (SOTA) pre-trained language models to accurately resolve local acronyms in biomedical texts. We test BLAR on the Ab3P corpus and achieve state-of-the-art results compared to the current best-performing local acronym resolution algorithms and models.


CORD-19: The COVID-19 Open Research Dataset
Lucy Lu Wang | Kyle Lo | Yoganand Chandrasekhar | Russell Reas | Jiangjiang Yang | Doug Burdick | Darrin Eide | Kathryn Funk | Yannis Katsis | Rodney Michael Kinney | Yunyao Li | Ziyang Liu | William Merrill | Paul Mooney | Dewey A. Murdick | Devvret Rishi | Jerry Sheehan | Zhihong Shen | Brandon Stilson | Alex D. Wade | Kuansan Wang | Nancy Xin Ru Wang | Christopher Wilhelm | Boya Xie | Douglas M. Raymond | Daniel S. Weld | Oren Etzioni | Sebastian Kohlmeier
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020

The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of text mining and information retrieval systems over its rich collection of metadata and structured full-text papers. Since its release, CORD-19 has been downloaded over 200K times and has served as the basis of many COVID-19 text mining and discovery systems. In this article, we describe the mechanics of dataset construction, highlighting challenges and key design decisions, provide an overview of how CORD-19 has been used, and describe several shared tasks built around the dataset. We hope this resource will continue to bring together the computing community, biomedical experts, and policy makers in the search for effective treatments and management policies for COVID-19.

A Survey of the State of Explainable AI for Natural Language Processing
Marina Danilevsky | Kun Qian | Ranit Aharonov | Yannis Katsis | Ban Kawas | Prithviraj Sen
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Recent years have seen important advances in the quality of state-of-the-art models, but this has come at the expense of models becoming less interpretable. This survey presents an overview of the current state of Explainable AI (XAI), considered within the domain of Natural Language Processing (NLP). We discuss the main categorization of explanations, as well as the various ways explanations can be arrived at and visualized. We detail the operations and explainability techniques currently available for generating explanations for NLP model predictions, to serve as a resource for model developers in the community. Finally, we point out the current gaps and encourage directions for future work in this important research area.