Hanzhuo Tan


When Cantonese NLP Meets Pre-training: Progress and Challenges
Rong Xiang | Hanzhuo Tan | Jing Li | Mingyu Wan | Kam-Fai Wong
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Tutorial Abstracts

Cantonese is an influential Chinese variant with a large population of speakers worldwide. However, it is under-resourced in terms of data scale and diversity, excluding Cantonese Natural Language Processing (NLP) from the state-of-the-art (SOTA) “pre-training and fine-tuning” paradigm. This tutorial will start with a substantial review of the linguistics and NLP progress for shaping language specificity, resources, and methodologies. It will be followed by an introduction to the trendy transformer-based pre-training methods, which have been largely advancing the SOTA performance of a wide range of downstream NLP tasks in numerous majority languages (e.g., English and Chinese). Based on the above, we will present the main challenges for Cantonese NLP in relation to Cantonese language idiosyncrasies of colloquialism and multilingualism, followed by the future directions to line NLP for Cantonese and other low-resource languages up to the cutting-edge pre-training practice.

Understanding Social Media Cross-Modality Discourse in Linguistic Space
Chunpu Xu | Hanzhuo Tan | Jing Li | Piji Li
Findings of the Association for Computational Linguistics: EMNLP 2022

Multimedia communications with texts and images are popular on social media. However, limited studies concern how images are structured with texts to form coherent meanings in human cognition. To fill in the gap, we present a novel concept of cross-modality discourse, reflecting how human readers couple image and text understandings. Text descriptions are first derived from images (named as subtitles) in the multimedia contexts. Five labels (entity-level insertion, projection, and concretization, and scene-level restatement and extension) are further employed to shape the structure of subtitles and texts and present their joint meanings. As a pilot study, we also build the very first dataset containing over 16K multimedia tweets with manually annotated discourse labels. The experimental results show that trendy multimedia encoders based on multi-head attention (with captions) are unable to understand cross-modality discourse well, and that additionally modeling texts at the output layer helps yield state-of-the-art results.