Zhiyong Wang


2025

HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
Peng Xia | Xingtong Yu | Ming Hu | Lie Ju | Zhiyong Wang | Peibo Duan | Zongyuan Ge
Proceedings of the 31st International Conference on Computational Linguistics

Object categories are typically organized into a multi-granularity taxonomic hierarchy. When classifying categories at different hierarchy levels, traditional uni-modal approaches focus primarily on image features, revealing limitations in complex scenarios. Recent studies integrating Vision-Language Models (VLMs) with class hierarchies have shown promise, yet they fall short of fully exploiting the hierarchical relationships. These efforts are constrained by their inability to perform effectively across varying granularities of categories. To tackle this issue, we propose a novel framework (**HGCLIP**) that effectively combines **CLIP** with a deeper exploitation of the **H**ierarchical class structure via **G**raph representation learning. We explore constructing the class hierarchy into a graph, with its nodes representing the textual or image features of each category. After passing through a graph encoder, the textual features incorporate hierarchical structure information, while the image features emphasize class-aware features derived from prototypes through the attention mechanism. Our approach demonstrates significant improvements on 11 diverse visual recognition benchmarks. Our code is fully available at https://github.com/richard-peng-xia/HGCLIP.
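To make the idea of passing class features through a graph built from the hierarchy more concrete, here is a minimal sketch. It is not the HGCLIP graph encoder: it swaps in a single round of plain mean aggregation over each node and its hierarchy neighbours, and the class names, feature vectors, and the `propagate` function are hypothetical placeholders.

```python
# Hypothetical class features (assumption: in practice these would come from a text encoder).
FEATURES = {
    "animal": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "cat": [0.0, -1.0],
}

# Hypothetical hierarchy edges as (parent, child) pairs.
EDGES = [("animal", "dog"), ("animal", "cat")]


def propagate(features, edges):
    """One round of mean aggregation over each node and its hierarchy neighbours."""
    # Every node starts with a self-loop so its own feature is retained.
    neighbours = {node: {node} for node in features}
    for parent, child in edges:
        neighbours[parent].add(child)
        neighbours[child].add(parent)

    dim = len(next(iter(features.values())))
    updated = {}
    for node, nbrs in neighbours.items():
        # Average each feature dimension over the node and its neighbours.
        updated[node] = [
            sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)
        ]
    return updated


hierarchy_aware = propagate(FEATURES, EDGES)
print(hierarchy_aware["dog"])
```

After one round, each child feature is pulled toward its parent, which is the simplest sense in which a node feature can absorb hierarchical structure information. A learned graph encoder would replace the fixed average with trainable weights.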

2023

Efficient and Interpretable Compressive Text Summarisation with Unsupervised Dual-Agent Reinforcement Learning
Peggy Tang | Junbin Gao | Lei Zhang | Zhiyong Wang
Proceedings of The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)

2022

OTExtSum: Extractive Text Summarisation with Optimal Transport
Peggy Tang | Kun Hu | Rui Yan | Lei Zhang | Junbin Gao | Zhiyong Wang
Findings of the Association for Computational Linguistics: NAACL 2022

Extractive text summarisation aims to select salient sentences from a document to form a short yet informative summary. While learning-based methods have achieved promising results, they have several limitations, such as dependence on expensive training and lack of interpretability. Therefore, in this paper, we propose a novel non-learning-based method that, for the first time, formulates text summarisation as an Optimal Transport (OT) problem, namely Optimal Transport Extractive Summariser (OTExtSum). Optimal sentence extraction is conceptualised as obtaining an optimal summary that minimises the transportation cost to a given document regarding their semantic distributions. Such a cost is defined by the Wasserstein distance and used to measure the summary’s semantic coverage of the original document. Comprehensive experiments on four challenging and widely used datasets (MultiNews, PubMed, BillSum, and CNN/DM) demonstrate that our proposed method outperforms the state-of-the-art non-learning-based methods and several recent learning-based methods in terms of the ROUGE metric.
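As a rough illustration of the formulation in the abstract above, the sketch below scores candidate summaries by a transport cost between the token distributions of the summary and the document, then keeps the cheapest candidate. It is not the authors' implementation: the Sinkhorn iteration is a swapped-in entropy-regularised approximation of the Wasserstein distance, the exhaustive search is only feasible at toy scale, and the embeddings in `EMB`, the sentences in `DOC`, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical 2-D token embeddings (assumption; a real system would use learned embeddings).
EMB = {
    "cat": (0.0, 0.0), "kitten": (0.1, 0.0),
    "dog": (1.0, 0.0), "puppy": (1.0, 0.1),
    "stock": (0.0, 1.0), "market": (0.1, 1.0),
}

# Hypothetical tokenised document, one token list per sentence.
DOC = [["cat", "kitten"], ["dog", "puppy"], ["cat", "dog"], ["stock", "market"]]


def distribution(sentences):
    """Normalised token frequencies over the tokens that actually occur."""
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts.values())
    tokens = sorted(counts)
    return tokens, [counts[t] / total for t in tokens]


def sinkhorn_cost(a, b, cost, eps=0.1, iters=500):
    """Approximate OT cost between strictly positive marginals a and b."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan entry is u[i] * kernel[i][j] * v[j]; sum plan times cost.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j] for i in range(n) for j in range(m)
    )


def transport_cost(summary, document):
    """Cost of moving the summary's token mass onto the document's token mass."""
    s_tok, s_mass = distribution(summary)
    d_tok, d_mass = distribution(document)
    cost = [[math.dist(EMB[s], EMB[d]) for d in d_tok] for s in s_tok]
    return sinkhorn_cost(s_mass, d_mass, cost)


def select_summary(document, budget):
    """Brute-force search over all sentence subsets of size `budget`."""
    candidates = combinations(range(len(document)), budget)
    return min(
        candidates,
        key=lambda idx: transport_cost([document[i] for i in idx], document),
    )


best = select_summary(DOC, budget=1)
print(best, transport_cost([DOC[i] for i in best], DOC))
```

A lower cost means the summary's token mass sits closer, in embedding space, to the document's mass, which is the sense of "semantic coverage" used in the abstract. In this toy document, a sentence that touches several topics should score better than one that repeats a single topic.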

1Cademy at Semeval-2022 Task 1: Investigating the Effectiveness of Multilingual, Multitask, and Language-Agnostic Tricks for the Reverse Dictionary Task
Zhiyong Wang | Ge Zhang | Nineli Lashkarashvili
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes our system for the SemEval-2022 task of matching dictionary glosses to word embeddings. We focus on the Reverse Dictionary Track of the competition, which maps multilingual glosses to reconstructed vector representations. More specifically, models convert the input sentences to three types of embeddings: SGNS, Char, and Electra. We propose several experiments for applying neural network cells, general multilingual and multi-task structures, and language-agnostic tricks to the task. We also provide comparisons over different types of word embeddings and ablation studies to suggest helpful strategies. Our initial transformer-based model achieves relatively low performance. However, trials on different retokenization methodologies indicate improved performance. Our proposed Elmo-based monolingual model achieves the highest outcome, and its multitask and multilingual variants show competitive results as well.