Laura Chiticariu


Domain-Aware Dependency Parsing for Questions
Aparna Garimella | Laura Chiticariu | Yunyao Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Development of an Enterprise-Grade Contract Understanding System
Arvind Agarwal | Laura Chiticariu | Poornima Chozhiyath Raman | Marina Danilevsky | Diman Ghazi | Ankush Gupta | Shanmukha Guttula | Yannis Katsis | Rajasekar Krishnamurthy | Yunyao Li | Shubham Mudgal | Vitobha Munigala | Nicholas Phan | Dhaval Sonawane | Sneha Srinivasan | Sudarshan R. Thitte | Mitesh Vasa | Ramiya Venkatachalam | Vinitha Yaski | Huaiyu Zhu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Contracts are arguably the most important type of business documents. Despite their significance in business, legal contract review largely remains an arduous, expensive and manual process. In this paper, we describe TECUS: a commercial system designed and deployed for contract understanding and used by a wide range of enterprise users for the past few years. We reflect on the challenges and design decisions when building TECUS. We also summarize the data science life cycle of TECUS and share lessons learned.


Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification
Prithviraj Sen | Marina Danilevsky | Yunyao Li | Siddhartha Brahma | Matthias Boehm | Laura Chiticariu | Rajasekar Krishnamurthy
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Interpretability of predictive models is becoming increasingly important with their growing adoption in the real world. We present RuleNN, a neural network architecture for learning transparent models for sentence classification. The models are in the form of rules expressed in first-order logic, a dialect with well-defined, human-understandable semantics. More precisely, RuleNN learns linguistic expressions (LEs) built on top of predicates extracted using shallow natural language understanding. Our experimental results show that RuleNN outperforms statistical relational learning and other neuro-symbolic methods, and performs comparably with black-box recurrent neural networks. Our user studies confirm that the learned LEs are explainable and capture domain semantics. Moreover, allowing domain experts to modify LEs and instill more domain knowledge leads to human-machine co-creation of models with better performance.


Towards Universal Semantic Representation
Huaiyu Zhu | Yunyao Li | Laura Chiticariu
Proceedings of the First International Workshop on Designing Meaning Representations

Natural language understanding at the semantic level and independent of language variations is of great practical value. Existing approaches such as semantic role labeling (SRL) and abstract meaning representation (AMR) still have features related to the peculiarities of the particular language. In this work we describe various challenges and possible solutions in designing a semantic representation that is universal across a variety of languages.


SystemT: Declarative Text Understanding for Enterprise
Laura Chiticariu | Marina Danilevsky | Yunyao Li | Frederick Reiss | Huaiyu Zhu
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

The rise of enterprise applications over unstructured and semi-structured documents poses new challenges to text understanding systems across multiple dimensions. We present SystemT, a declarative text understanding system that addresses these challenges and has been deployed in a wide range of enterprise applications. We highlight the design considerations and decisions behind SystemT in addressing the needs of the enterprise setting. We also summarize the impact of SystemT on business and education.


CROWD-IN-THE-LOOP: A Hybrid Approach for Annotating Semantic Roles
Chenguang Wang | Alan Akbik | Laura Chiticariu | Yunyao Li | Fei Xia | Anbang Xu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Crowdsourcing has proven to be an effective method for generating labeled data for a range of NLP tasks. However, multiple recent attempts at using crowdsourcing to generate gold-labeled training data for semantic role labeling (SRL) reported only modest results, indicating that SRL is perhaps too difficult a task to be effectively crowdsourced. In this paper, we postulate that while producing SRL annotation does require expert involvement in general, a large subset of SRL labeling tasks is in fact appropriate for the crowd. We present a novel workflow in which we employ a classifier to identify difficult annotation tasks and route each task either to experts or to crowd workers according to its difficulty. Our experimental evaluation shows that the proposed approach reduces the workload for experts by over two-thirds, and thus significantly reduces the cost of producing SRL annotation at little loss in quality.


Multilingual Information Extraction with PolyglotIE
Alan Akbik | Laura Chiticariu | Marina Danilevsky | Yonas Kbrom | Yunyao Li | Huaiyu Zhu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

We present PolyglotIE, a web-based tool for developing extractors that perform Information Extraction (IE) over multilingual data. Our tool has two core features. First, it allows users to develop extractors against a unified abstraction that is shared across a large set of natural languages. This means that an extractor need only be created once for one language, but will then run on multilingual data without any additional effort or language-specific knowledge on the part of the user. Second, it embeds this abstraction as a set of views within a declarative IE system, allowing users to quickly create extractors using a mature IE query language. We present PolyglotIE as a hands-on demo in which users can experiment with creating extractors, execute them on multilingual text and inspect extraction results. Using the UI, we discuss the challenges and potential of using unified, crosslingual semantic abstractions as a basis for downstream applications. We demonstrate multilingual IE for 9 languages from 4 different language groups: English, German, French, Spanish, Japanese, Chinese, Arabic, Russian and Hindi.


Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling
Alan Akbik | Laura Chiticariu | Marina Danilevsky | Yunyao Li | Shivakumar Vaithyanathan | Huaiyu Zhu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future
Laura Chiticariu | Yunyao Li | Frederick Reiss
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

The rise of Big Data analytics over unstructured text has led to renewed interest in information extraction (IE). These applications need effective IE as a first step towards solving end-to-end real-world problems (e.g. in biology, medicine, finance, media and entertainment). Much recent NLP research has focused on addressing specific IE problems using a pipeline of multiple machine learning techniques. This approach requires an analyst with the expertise to answer questions such as: “What ML techniques should I combine to solve this problem?”; “What features will be useful for the composite pipeline?”; and “Why is my model giving the wrong answer on this document?”. The need for this expertise creates problems in real-world applications. It is very difficult in practice to find an analyst who both understands the real-world problem and has deep knowledge of applied machine learning. As a result, the real impact of current IE research does not match the abundant opportunities available.
In this tutorial, we introduce the concept of transparent machine learning. A transparent ML technique is one that:
- produces models that a typical real-world user can read and understand;
- uses algorithms that a typical real-world user can understand; and
- allows a real-world user to adapt models to new domains.
The tutorial is aimed at IE researchers in both the academic and industry communities who are interested in developing and applying transparent ML.


Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!
Laura Chiticariu | Yunyao Li | Frederick R. Reiss
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing


Towards Efficient Named-Entity Rule Induction for Customizability
Ajay Nagesh | Ganesh Ramakrishnan | Laura Chiticariu | Rajasekar Krishnamurthy | Ankush Dharkar | Pushpak Bhattacharyya
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

WizIE: A Best Practices Guided Development Environment for Information Extraction
Yunyao Li | Laura Chiticariu | Huahai Yang | Frederick Reiss | Arnaldo Carreno-fuentes
Proceedings of the ACL 2012 System Demonstrations


SystemT: A Declarative Information Extraction System
Yunyao Li | Frederick Reiss | Laura Chiticariu
Proceedings of the ACL-HLT 2011 System Demonstrations


SystemT: An Algebraic Approach to Declarative Information Extraction
Laura Chiticariu | Rajasekar Krishnamurthy | Yunyao Li | Sriram Raghavan | Frederick Reiss | Shivakumar Vaithyanathan
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Domain Adaptation of Rule-Based Annotators for Named-Entity Recognition Tasks
Laura Chiticariu | Rajasekar Krishnamurthy | Yunyao Li | Frederick Reiss | Shivakumar Vaithyanathan
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing