Jie Cao


2022

pdf bib
An Augmented Benchmark Dataset for Geometric Question Answering through Dual Parallel Text Encoding
Jie Cao | Jing Xiao
Proceedings of the 29th International Conference on Computational Linguistics

Automatic math problem solving has attracted much attention of NLP researchers recently. However, most of the works focus on the solving of Math Word Problems (MWPs). In this paper, we study on the Geometric Problem Solving based on neural networks. Solving geometric problems requires the integration of text and diagram information as well as the knowledge of the relevant theorems. The lack of high-quality datasets and efficient neural geometric solvers impedes the development of automatic geometric problems solving. Based on GeoQA, we newly annotate 2,518 geometric problems with richer types and greater difficulty to form an augmented benchmark dataset GeoQA+, containing 6,027 problems in training set and 7,528 totally. We further perform data augmentation method to expand the training set to 12,054. Besides, we design a Dual Parallel text Encoder DPE to efficiently encode long and medium-length problem text. The experimental results validate the effectiveness of GeoQA+ and DPE module, and the accuracy of automatic geometric problem solving is improved to 66.09%.

2021

pdf bib
A Comparative Study on Schema-Guided Dialogue State Tracking
Jie Cao | Yi Zhang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Frame-based state representation is widely used in modern task-oriented dialog systems to model user intentions and slot values. However, a fixed design of domain ontology makes it difficult to extend to new services and APIs. Recent work proposed to use natural language descriptions to define the domain ontology instead of tag names for each intent or slot, thus offering a dynamic set of schema. In this paper, we conduct in-depth comparative studies to understand the use of natural language description for schema in dialog state tracking. Our discussion mainly covers three aspects: encoder architectures, impact of supplementary training, and effective schema description styles. We introduce a set of newly designed bench-marking descriptions and reveal the model robustness on both homogeneous and heterogeneous description styles in training and evaluation.

2019

pdf bib
Rhetorically Controlled Encoder-Decoder for Modern Chinese Poetry Generation
Zhiqiang Liu | Zuohui Fu | Jie Cao | Gerard de Melo | Yik-Cheung Tam | Cheng Niu | Jie Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Rhetoric is a vital element in modern poetry, and plays an essential role in improving its aesthetics. However, to date, it has not been considered in research on automatic poetry generation. In this paper, we propose a rhetorically controlled encoder-decoder for modern Chinese poetry generation. Our model relies on a continuous latent variable as a rhetoric controller to capture various rhetorical patterns in an encoder, and then incorporates rhetoric-based mixtures while generating modern Chinese poetry. For metaphor and personification, an automated evaluation shows that our model outperforms state-of-the-art baselines by a substantial margin, while human evaluation shows that our model generates better poems than baseline methods in terms of fluency, coherence, meaningfulness, and rhetorical aesthetics.

pdf bib
Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes
Jie Cao | Michael Tanana | Zac Imel | Eric Poitras | David Atkins | Vivek Srikumar
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Automatically analyzing dialogue can help understand and guide behavior in domains such as counseling, where interactions are largely mediated by conversation. In this paper, we study modeling behavioral codes used to asses a psychotherapy treatment style called Motivational Interviewing (MI), which is effective for addressing substance abuse and related problems. Specifically, we address the problem of providing real-time guidance to therapists with a dialogue observer that (1) categorizes therapist and client MI behavioral codes and, (2) forecasts codes for upcoming utterances to help guide the conversation and potentially alert the therapist. For both tasks, we define neural network models that build upon recent successes in dialogue modeling. Our experiments demonstrate that our models can outperform several baselines for both tasks. We also report the results of a careful analysis that reveals the impact of the various network design tradeoffs for modeling therapy dialogue.

pdf bib
Amazon at MRP 2019: Parsing Meaning Representations with Lexical and Phrasal Anchoring
Jie Cao | Yi Zhang | Adel Youssef | Vivek Srikumar
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

This paper describes the system submission of our team Amazon to the shared task on Cross Framework Meaning Representation Parsing (MRP) at the 2019 Conference for Computational Language Learning (CoNLL). Via extensive analysis of implicit alignments in AMR, we recategorize five meaning representations (MRs) into two classes: Lexical- Anchoring and Phrasal-Anchoring. Then we propose a unified graph-based parsing framework for the lexical-anchoring MRs, and a phrase-structure parsing for one of the phrasal- anchoring MRs, UCCA. Our system submission ranked 1st in the AMR subtask, and later improvements show promising results on other frameworks as well.

2009

pdf bib
Improving Statistical Machine Translation Using Domain Bilingual Multiword Expressions
Zhixiang Ren | Yajuan Lü | Jie Cao | Qun Liu | Yun Huang
Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications (MWE 2009)