2025
PANDA - Paired Anti-hate Narratives Dataset from Asia: Using an LLM-as-a-Judge to Create the First Chinese Counterspeech Dataset
Michael Bennie | Demi Zhang | Bushi Xiao | Jing Cao | Chryseis Xinyi Liu | Jian Meng | Alayo Tripp
Proceedings of the First Workshop on Multilingual Counterspeech Generation
Despite the global prevalence of the Modern Standard Chinese language, counterspeech (CS) resources for Chinese remain virtually nonexistent. To address this gap in East Asian counterspeech research, we introduce a corpus of Modern Standard Mandarin counterspeech that focuses on combating hate speech in Mainland China. This paper proposes a novel approach to generating CS that combines an LLM-as-a-Judge, simulated annealing, zero-shot CN generation with LLMs, and a round-robin algorithm, followed by manual verification for quality and contextual relevance. The paper details the methodology for creating effective counterspeech in Chinese and other non-Eurocentric languages, including unique cultural patterns in which groups are maligned and linguistic patterns in which kinds of discourse markers are programmatically marked as hate speech (HS). Through analysis of the generated corpora, we provide strong evidence for the lack of open-source, properly labeled Chinese hate speech data and for the limitations of using an LLM-as-a-Judge to score possible answers in Chinese. Moreover, the present corpus serves as the first East Asian language-based CS corpus and provides an essential resource for future research on counterspeech generation and evaluation.
2024
MMToM-QA: Multimodal Theory of Mind Question Answering
Chuanyang Jin | Yutong Wu | Jing Cao | Jiannan Xiang | Yen-Ling Kuo | Zhiting Hu | Tomer Ullman | Antonio Torralba | Joshua Tenenbaum | Tianmin Shu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Theory of Mind (ToM), the ability to understand people’s mental states, is an essential ingredient for developing machines with human-level social intelligence. Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding. However, existing ToM benchmarks use unimodal datasets – either video or text. Human ToM, on the other hand, is more than video or text understanding. People can flexibly reason about another person’s mind based on conceptual representations (e.g., goals, beliefs, plans) extracted from any available data. To address this, we introduce a multimodal Theory of Mind question answering (MMToM-QA) benchmark. MMToM-QA comprehensively evaluates machine ToM both on multimodal data and on different kinds of unimodal data about a person’s activity in a household environment. To engineer multimodal ToM capacity, we propose a novel method, BIP-ALM (Bayesian Inverse Planning Accelerated by Language Models). BIP-ALM extracts unified representations from multimodal data and utilizes language models for scalable Bayesian inverse planning. We conducted a systematic comparison of human performance, BIP-ALM, and state-of-the-art models, including GPT-4. The experiments demonstrate that large language models and large multimodal models still lack robust ToM capacity. BIP-ALM, on the other hand, shows promising results, by leveraging the power of both model-based mental inference and language models.
2014
A Quantitative View of Short Utterances in Daily Conversation: A Case Study of That's right, That's true and That's correct
Yanjiao Li | Alex Chengyu Fang | Jing Cao
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing
2013
Issues in the addition of ISO standard annotations to the Switchboard corpus
Harry Bunt | Alex C. Fang | Xiaoyue Liu | Jing Cao | Volha Petukhova
Proceedings of the 9th Joint ISO - ACL SIGSEM Workshop on Interoperable Semantic Annotation
2012
Collaborative Annotation of Dialogue Acts: Application of a New ISO Standard to the Switchboard Corpus
Alex C. Fang | Harry Bunt | Jing Cao | Xiaoyue Liu
Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data
2010
Enhanced Genre Classification through Linguistically Fine-Grained POS Tags
Alex Chengyu Fang | Jing Cao
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation
2009
Adjective Density as a Text Formality Characteristic for Automatic Text Classification: A Study Based on the British National Corpus
Alex Chengyu Fang | Jing Cao
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1