Jing Cao


2024

MMToM-QA: Multimodal Theory of Mind Question Answering
Chuanyang Jin | Yutong Wu | Jing Cao | Jiannan Xiang | Yen-Ling Kuo | Zhiting Hu | Tomer Ullman | Antonio Torralba | Joshua Tenenbaum | Tianmin Shu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Theory of Mind (ToM), the ability to understand people’s mental states, is an essential ingredient for developing machines with human-level social intelligence. Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding. However, existing ToM benchmarks use unimodal datasets – either video or text. Human ToM, on the other hand, is more than video or text understanding. People can flexibly reason about another person’s mind based on conceptual representations (e.g., goals, beliefs, plans) extracted from any available data. To address this, we introduce a multimodal Theory of Mind question answering (MMToM-QA) benchmark. MMToM-QA comprehensively evaluates machine ToM both on multimodal data and on different kinds of unimodal data about a person’s activity in a household environment. To engineer multimodal ToM capacity, we propose a novel method, BIP-ALM (Bayesian Inverse Planning Accelerated by Language Models). BIP-ALM extracts unified representations from multimodal data and utilizes language models for scalable Bayesian inverse planning. We conducted a systematic comparison of human performance, BIP-ALM, and state-of-the-art models, including GPT-4. The experiments demonstrate that large language models and large multimodal models still lack robust ToM capacity. BIP-ALM, on the other hand, shows promising results, by leveraging the power of both model-based mental inference and language models.

2014

A Quantitative View of Short Utterances in Daily Conversation: A Case Study of That’s right, That’s true and That’s correct
Yanjiao Li | Alex Chengyu Fang | Jing Cao
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

2013

Issues in the addition of ISO standard annotations to the Switchboard corpus
Harry Bunt | Alex C. Fang | Xiaoyue Liu | Jing Cao | Volha Petukhova
Proceedings of the 9th Joint ISO - ACL SIGSEM Workshop on Interoperable Semantic Annotation

2012

Collaborative Annotation of Dialogue Acts: Application of a New ISO Standard to the Switchboard Corpus
Alex C. Fang | Harry Bunt | Jing Cao | Xiaoyue Liu
Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data

2010

Enhanced Genre Classification through Linguistically Fine-Grained POS Tags
Alex Chengyu Fang | Jing Cao
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

2009

Adjective Density as a Text Formality Characteristic for Automatic Text Classification: A Study Based on the British National Corpus
Alex Chengyu Fang | Jing Cao
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1