Liang Zhang


pdf bib
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation
Anwen Hu | Shizhe Chen | Liang Zhang | Qin Jin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Automatic image captioning evaluation is critical for benchmarking and promoting advances in image captioning research. Existing metrics only provide a single score to measure caption qualities, which are less explainable and informative. Instead, we humans can easily identify the problems of captions in details, e.g., which words are inaccurate and which salient objects are not described, and then rate the caption quality. To support such informative feedback, we propose an Informative Metric for Reference-free Image Caption evaluation (InfoMetIC). Given an image and a caption, InfoMetIC is able to report incorrect words and unmentioned image regions at fine-grained level, and also provide a text precision score, a vision recall score and an overall quality score at coarse-grained level. The coarse-grained score of InfoMetIC achieves significantly better correlation with human judgements than existing metrics on multiple benchmarks. We also construct a token-level evaluation dataset and demonstrate the effectiveness of InfoMetIC in fine-grained evaluation. Our code and datasets are publicly available at

pdf bib
Movie101: A New Movie Understanding Benchmark
Zihao Yue | Qi Zhang | Anwen Hu | Liang Zhang | Ziheng Wang | Qin Jin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

To help the visually impaired enjoy movies, automatic movie narrating systems are expected to narrate accurate, coherent, and role-aware plots when there are no speaking lines of actors. Existing works benchmark this challenge as a normal video captioning task via some simplifications, such as removing role names and evaluating narrations with ngram-based metrics, which makes it difficult for automatic systems to meet the needs of real application scenarios. To narrow this gap, we construct a large-scale Chinese movie benchmark, named Movie101. Closer to real scenarios, the Movie Clip Narrating (MCN) task in our benchmark asks models to generate role-aware narration paragraphs for complete movie clips where no actors are speaking. External knowledge, such as role information and movie genres, is also provided for better movie understanding. Besides, we propose a new metric called Movie Narration Score (MNScore) for movie narrating evaluation, which achieves the best correlation with human evaluation. Our benchmark also supports the Temporal Narration Grounding (TNG) task to investigate clip localization given text descriptions. For both two tasks, our proposed methods well leverage external knowledge and outperform carefully designed baselines. The dataset and codes are released at


pdf bib
Towards Better Document-level Relation Extraction via Iterative Inference
Liang Zhang | Jinsong Su | Yidong Chen | Zhongjian Miao | Min Zijun | Qingguo Hu | Xiaodong Shi
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Document-level relation extraction (RE) aims to extract the relations between entities from the input document that usually containing many difficultly-predicted entity pairs whose relations can only be predicted through relational inference. Existing methods usually directly predict the relations of all entity pairs of input document in a one-pass manner, ignoring the fact that predictions of some entity pairs heavily depend on the predicted results of other pairs. To deal with this issue, in this paper, we propose a novel document-level RE model with iterative inference. Our model is mainly composed of two modules: 1) a base module expected to provide preliminary relation predictions on entity pairs; 2) an inference module introduced to refine these preliminary predictions by iteratively dealing with difficultly-predicted entity pairs depending on other pairs in an easy-to-hard manner. Unlike previous methods which only consider feature information of entity pairs, our inference module is equipped with two Extended Cross Attention units, allowing it to exploit both feature information and previous predictions of entity pairs during relational inference. Furthermore, we adopt a two-stage strategy to train our model. At the first stage, we only train our base module. During the second stage, we train the whole model, where contrastive learning is introduced to enhance the training of inference module. Experimental results on three commonly-used datasets show that our model consistently outperforms other competitive baselines.


pdf bib
Hashtag Recommendation with Topical Attention-Based LSTM
Yang Li | Ting Liu | Jing Jiang | Liang Zhang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Microblogging services allow users to create hashtags to categorize their posts. In recent years, the task of recommending hashtags for microblogs has been given increasing attention. However, most of existing methods depend on hand-crafted features. Motivated by the successful use of long short-term memory (LSTM) for many natural language processing tasks, in this paper, we adopt LSTM to learn the representation of a microblog post. Observing that hashtags indicate the primary topics of microblog posts, we propose a novel attention-based LSTM model which incorporates topic modeling into the LSTM architecture through an attention mechanism. We evaluate our model using a large real-world dataset. Experimental results show that our model significantly outperforms various competitive baseline methods. Furthermore, the incorporation of topical attention mechanism gives more than 7.4% improvement in F1 score compared with standard LSTM method.