Han Zhao


pdf bib
DD-TIG at SemEval-2022 Task 5: Investigating the Relationships Between Multimodal and Unimodal Information in Misogynous Memes Detection and Classification
Ziming Zhou | Han Zhao | Jingjing Dong | Ning Ding | Xiaolong Liu | Kangli Zhang
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes our submission for task 5 Multimedia Automatic Misogyny Identification (MAMI) at SemEval-2022. The task is designed to detect and classify misogynous memes. To utilize both textual and visual information presented in a meme, we investigate several of the most recent visual language transformer-based multimodal models and choose ERNIE-ViL-Large as our base model. For subtask A, with observations of models’ overfitting on unimodal patterns, strategies are proposed to mitigate problems of biased words and template memes. For subtask B, we transform this multi-label problem into a multi-class one and experiment with oversampling and complementary techniques. Our approach places 2nd for subtask A and 5th for subtask B in this competition.

pdf bib
Rethinking Task Sampling for Few-shot Vision-Language Transfer Learning
Zhenhailong Wang | Hang Yu | Manling Li | Han Zhao | Heng Ji
Proceedings of the First Workshop on Performance and Interpretability Evaluations of Multimodal, Multipurpose, Massive-Scale Models

Despite achieving state-of-the-art zero-shot performance, existing vision-language models still fall short of few-shot transfer ability on domain-specific problems. Classical fine-tuning often fails to prevent highly expressive models from exploiting spurious correlations. Although model-agnostic meta-learning (MAML) presents as a natural alternative for few-shot transfer learning, the expensive computation due to implicit second-order optimization limits its use on large-scale vision-language models such as CLIP. While much literature has been devoted to exploring alternative optimization strategies, we identify another essential aspect towards effective few-shot transfer learning, task sampling, which is previously only be viewed as part of data pre-processing in MAML. To show the impact of task sampling, we propose a simple algorithm, Model-Agnostic Multitask Fine-tuning (MAMF), which differentiates classical fine-tuning only on uniformly sampling multiple tasks. Despite its simplicity, we show that MAMF consistently outperforms classical fine-tuning on five few-shot image classification tasks. We further show that the effectiveness of the bi-level optimization in MAML is highly sensitive to the zero-shot performance of a task in the context of few-shot vision-language classification. The goal of this paper is to provide new insights on what makes few-shot learning work, and encourage more research into investigating better task sampling strategies.

pdf bib
Conditional Supervised Contrastive Learning for Fair Text Classification
Jianfeng Chi | William Shand | Yaodong Yu | Kai-Wei Chang | Han Zhao | Yuan Tian
Findings of the Association for Computational Linguistics: EMNLP 2022

Contrastive representation learning has gained much attention due to its superior performance in learning representations from both image and sequential data. However, the learned representations could potentially lead to performance disparities in downstream tasks, such as increased silencing of underrepresented groups in toxicity comment classification. In light of this challenge, in this work, we study learning fair representations that satisfy a notion of fairness known as equalized odds for text classification via contrastive learning. Specifically, we first theoretically analyze the connections between learning representations with a fairness constraint and conditional supervised contrastive objectives, and then propose to use conditional supervised contrastive objectives to learn fair representations for text classification. We conduct experiments on two text datasets to demonstrate the effectiveness of our approaches in balancing the trade-offs between task performance and bias mitigation among existing baselines for text classification. Furthermore, we also show that the proposed methods are stable in different hyperparameter settings.

pdf bib
DD-TIG at Constraint@ACL2022: Multimodal Understanding and Reasoning for Role Labeling of Entities in Hateful Memes
Ziming Zhou | Han Zhao | Jingjing Dong | Jun Gao | Xiaolong Liu
Proceedings of the Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situations

The memes serve as an important tool in online communication, whereas some hateful memes endanger cyberspace by attacking certain people or subjects. Recent studies address hateful memes detection while further understanding of relationships of entities in memes remains unexplored. This paper presents our work at the Constraint@ACL2022 Shared Task: Hero, Villain and Victim: Dissecting harmful memes for semantic role labelling of entities. In particular, we propose our approach utilizing transformer-based multimodal models through a VCR method with data augmentation, continual pretraining, loss re-weighting, and ensemble learning. We describe the models used, the ways of preprocessing and experiments implementation. As a result, our best model achieves the Macro F1-score of 54.707 on the test set of this shared task.


pdf bib
EventKE: Event-Enhanced Knowledge Graph Embedding
Zixuan Zhang | Hongwei Wang | Han Zhao | Hanghang Tong | Heng Ji
Findings of the Association for Computational Linguistics: EMNLP 2021

Relations in most of the traditional knowledge graphs (KGs) only reflect static and factual connections, but fail to represent the dynamic activities and state changes about entities. In this paper, we emphasize the importance of incorporating events in KG representation learning, and propose an event-enhanced KG embedding model EventKE. Specifically, given the original KG, we first incorporate event nodes by building a heterogeneous network, where entity nodes and event nodes are distributed on the two sides of the network inter-connected by event argument links. We then use entity-entity relations from the original KG and event-event temporal links to inner-connect entity and event nodes respectively. We design a novel and effective attention-based message passing method, which is conducted on entity-entity, event-entity, and event-event relations to fuse the event information into KG embeddings. Experimental results on real-world datasets demonstrate that events can greatly improve the quality of the KG embeddings on multiple downstream tasks.