Lei Zhang


pdf bib
Visual-Linguistic Dependency Encoding for Image-Text Retrieval
Wenxin Guo | Lei Zhang | Kun Zhang | Yi Liu | Zhendong Mao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Image-text retrieval is a fundamental task to bridge the semantic gap between natural language and vision. Recent works primarily focus on aligning textual meanings with visual appearance. However, they often overlook the semantic discrepancy caused by syntactic structure in natural language expressions and relationships among visual entities. This oversight would lead to sub-optimal alignment and degraded retrieval performance, since the underlying semantic dependencies and object interactions remain inadequately encoded in both textual and visual embeddings. In this paper, we propose a novel Visual-Linguistic Dependency Encoding (VL-DE) framework, which explicitly models the dependency information among textual words and interaction patterns between image regions, improving the discriminative power of cross-modal representations for more accurate image-text retrieval. Specifically, VL-DE enhances textual representations by considering syntactic relationships and dependency types, and visual representations by attending to its spatially neighboring regions. Cross-attention mechanism is then introduced to aggregate aligned region-word pairs into image-text similarities. Analysis on Winoground, a dataset specially designed to measure vision-linguistic compositional structure reasoning, shows that VL-DE outperforms existing methods, demonstrating its effectiveness at this task. Comprehensive experiments on two benchmarks, Flickr30K and MS-COCO, further validates the competitiveness of our approach.

pdf bib
MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks
Lei Zhang | Yuge Zhang | Kan Ren | Dongsheng Li | Yuqing Yang
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

The field of machine learning (ML) has gained widespread adoption, leading to significant demand for adapting ML to specific scenarios, which is yet expensive and non-trivial. The predominant approaches towards the automation of solving ML tasks (e.g., AutoML) are often time-consuming and hard to understand for human developers. In contrast, though human engineers have the incredible ability to understand tasks and reason about solutions, their experience and knowledge are often sparse and difficult to utilize by quantitative approaches. In this paper, we aim to bridge the gap between machine intelligence and human knowledge by introducing a novel framework MLCopilot, which leverages the state-of-the-art large language models to develop ML solutions for novel tasks. We showcase the possibility of extending the capability of LLMs to comprehend structured inputs and perform thorough reasoning for solving novel ML tasks. And we find that, after some dedicated design, the LLM can (i) observe from the existing experiences of ML tasks and (ii) reason effectively to deliver promising results for new tasks. The solution generated can be used directly to achieve high levels of competitiveness.

pdf bib
TW-NLP at SemEval-2024 Task10: Emotion Recognition and Emotion Reversal Inference in Multi-Party Dialogues.
Wei Tian | Peiyu Ji | Lei Zhang | Yue Jian
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In multidimensional dialogues, emotions serve not only as crucial mediators of emotional exchanges but also carry rich information. Therefore, accurately identifying the emotions of interlocutors and understanding the triggering factors of emotional changes are paramount. This study focuses on the tasks of multilingual dialogue emotion recognition and emotion reversal reasoning based on provocateurs, aiming to enhance the accuracy and depth of emotional understanding in dialogues. To achieve this goal, we propose a novel model, MBERT-TextRCNN-PL, designed to effectively capture emotional information of interlocutors. Additionally, we introduce XGBoost-EC (Emotion Capturer) to identify emotion provocateurs, thereby delving deeper into the causal relationships behind emotional changes. By comparing with state-of-the-art models, our approach demonstrates significant improvements in recognizing dialogue emotions and provocateurs, offering new insights and methodologies for multilingual dialogue emotion understanding and emotion reversal research.


pdf bib
E-CORE: Emotion Correlation Enhanced Empathetic Dialogue Generation
Fengyi Fu | Lei Zhang | Quan Wang | Zhendong Mao
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Achieving empathy is a crucial step toward humanized dialogue systems. Current approaches for empathetic dialogue generation mainly perceive an emotional label to generate an empathetic response conditioned on it, which simply treat emotions independently, but ignore the intrinsic emotion correlation in dialogues, resulting in inaccurate emotion perception and unsuitable response generation. In this paper, we propose a novel emotion correlation enhanced empathetic dialogue generation framework, which comprehensively realizes emotion correlation learning, utilization, and supervising. Specifically, a multi-resolution emotion graph is devised to capture context-based emotion interactions from different resolutions, further modeling emotion correlation. Then we propose an emotion correlation enhanced decoder, with a novel correlation-aware aggregation and soft/hard strategy, respectively improving the emotion perception and response generation. Experimental results on the benchmark dataset demonstrate the superiority of our model in both empathetic perception and expression.

pdf bib
Efficient and Interpretable Compressive Text Summarisation with Unsupervised Dual-Agent Reinforcement Learning
Peggy Tang | Junbin Gao | Lei Zhang | Zhiyong Wang
Proceedings of The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)


pdf bib
OTExtSum: Extractive Text Summarisation with Optimal Transport
Peggy Tang | Kun Hu | Rui Yan | Lei Zhang | Junbin Gao | Zhiyong Wang
Findings of the Association for Computational Linguistics: NAACL 2022

Extractive text summarisation aims to select salient sentences from a document to form a short yet informative summary. While learning-based methods have achieved promising results, they have several limitations, such as dependence on expensive training and lack of interpretability. Therefore, in this paper, we propose a novel non-learning-based method by for the first time formulating text summarisation as an Optimal Transport (OT) problem, namely Optimal Transport Extractive Summariser (OTExtSum). Optimal sentence extraction is conceptualised as obtaining an optimal summary that minimises the transportation cost to a given document regarding their semantic distributions. Such a cost is defined by the Wasserstein distance and used to measure the summary’s semantic coverage of the original document. Comprehensive experiments on four challenging and widely used datasets - MultiNews, PubMed, BillSum, and CNN/DM demonstrate that our proposed method outperforms the state-of-the-art non-learning-based methods and several recent learning-based methods in terms of the ROUGE metric.

pdf bib
QaDialMoE: Question-answering Dialogue based Fact Verification with Mixture of Experts
Longzheng Wang | Peng Zhang | Xiaoyu Lu | Lei Zhang | Chaoyang Yan | Chuang Zhang
Findings of the Association for Computational Linguistics: EMNLP 2022

Fact verification is an essential tool to mitigate the spread of false information online, which has gained a widespread attention recently. However, a fact verification in the question-answering dialogue is still underexplored. In this paper, we propose a neural network based approach called question-answering dialogue based fact verification with mixture of experts (QaDialMoE). It exploits questions and evidence effectively in the verification process and can significantly improve the performance of fact verification. Specifically, we exploit the mixture of experts to focus on various interactions among responses, questions and evidence. A manager with an attention guidance module is implemented to guide the training of experts and assign a reasonable attention score to each expert. A prompt module is developed to generate synthetic questions that make our approach more generalizable. Finally, we evaluate the QaDialMoE and conduct a comparative study on three benchmark datasets. The experimental results demonstrate that our QaDialMoE outperforms previous approaches by a large margin and achieves new state-of-the-art results on all benchmarks. This includes the accuracy improvements on the HEALTHVER as 84.26%, the FAVIQ A dev set as 78.7%, the FAVIQ R dev set as 86.1%, test set as 86.0%, and the COLLOQUIAL as 89.5%. To our best knowledge, this is the first work to investigate a question-answering dialogue based fact verification, and achieves new state-of-the-art results on various benchmark datasets.


pdf bib
Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents
Rui Meng | Khushboo Thaker | Lei Zhang | Yue Dong | Xingdi Yuan | Tong Wang | Daqing He
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Faceted summarization provides briefings of a document from different perspectives. Readers can quickly comprehend the main points of a long document with the help of a structured outline. However, little research has been conducted on this subject, partially due to the lack of large-scale faceted summarization datasets. In this study, we present FacetSum, a faceted summarization benchmark built on Emerald journal articles, covering a diverse range of domains. Different from traditional document-summary pairs, FacetSum provides multiple summaries, each targeted at specific sections of a long document, including the purpose, method, findings, and value. Analyses and empirical results on our dataset reveal the importance of bringing structure into summaries. We believe FacetSum will spur further advances in summarization research and foster the development of NLP systems that can leverage the structured information in both long texts and summaries.

pdf bib
MedAI at SemEval-2021 Task 10: Negation-aware Pre-training for Source-free Negation Detection Domain Adaptation
Jinquan Sun | Qi Zhang | Yu Wang | Lei Zhang
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

Due to the increasing concerns for data privacy, source-free unsupervised domain adaptation attracts more and more research attention, where only a trained source model is assumed to be available, while the labeled source data remain private. To get promising adaptation results, we need to find effective ways to transfer knowledge learned in source domain and leverage useful domain specific information from target domain at the same time. This paper describes our winning contribution to SemEval 2021 Task 10: Source-Free Domain Adaptation for Semantic Processing. Our key idea is to leverage the model trained on source domain data to generate pseudo labels for target domain samples. Besides, we propose Negation-aware Pre-training (NAP) to incorporate negation knowledge into model. Our method win the 1st place with F1-score of 0.822 on the official blind test set of Negation Detection Track.


pdf bib
Integrating Task Specific Information into Pretrained Language Models for Low Resource Fine Tuning
Rui Wang | Shijing Si | Guoyin Wang | Lei Zhang | Lawrence Carin | Ricardo Henao
Findings of the Association for Computational Linguistics: EMNLP 2020

Pretrained Language Models (PLMs) have improved the performance of natural language understanding in recent years. Such models are pretrained on large corpora, which encode the general prior knowledge of natural languages but are agnostic to information characteristic of downstream tasks. This often results in overfitting when fine-tuned with low resource datasets where task-specific information is limited. In this paper, we integrate label information as a task-specific prior into the self-attention component of pretrained BERT models. Experiments on several benchmarks and real-word datasets suggest that the proposed approach can largely improve the performance of pretrained models when fine-tuning with small datasets.

pdf bib
Joint Intent Detection and Entity Linking on Spatial Domain Queries
Lei Zhang | Runze Wang | Jingbo Zhou | Jingsong Yu | Zhenhua Ling | Hui Xiong
Findings of the Association for Computational Linguistics: EMNLP 2020

Continuous efforts have been devoted to language understanding (LU) for conversational queries with the fast and wide-spread popularity of voice assistants. In this paper, we first study the LU problem in the spatial domain, which is a critical problem for providing location-based services by voice assistants but is without in-depth investigation in existing studies. Spatial domain queries have several unique properties making them be more challenging for language understanding than common conversational queries, including lexical-similar but diverse intents and highly ambiguous words. Thus, a special tailored LU framework for spatial domain queries is necessary. To the end, a dataset was extracted and annotated based on the real-life queries from a voice assistant service. We then proposed a new multi-task framework that jointly learns the intent detection and entity linking tasks on the with invented hierarchical intent detection method and triple-scoring mechanism for entity linking. A specially designed spatial GCN is also utilized to model spatial context information among entities. We have conducted extensive experimental evaluations with state-of-the-art entity linking and intent detection methods, which demonstrated that can outperform all baselines with a significant margin.

pdf bib
WantWords: An Open-source Online Reverse Dictionary System
Fanchao Qi | Lei Zhang | Yanhui Yang | Zhiyuan Liu | Maosong Sun
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

A reverse dictionary takes descriptions of words as input and outputs words semantically matching the input descriptions. Reverse dictionaries have great practical value such as solving the tip-of-the-tongue problem and helping new language learners. There have been some online reverse dictionary systems, but they support English reverse dictionary queries only and their performance is far from perfect. In this paper, we present a new open-source online reverse dictionary system named WantWords (https://wantwords.thunlp.org/). It not only significantly outperforms other reverse dictionary systems on English reverse dictionary performance, but also supports Chinese and English-Chinese as well as Chinese-English cross-lingual reverse dictionary queries for the first time. Moreover, it has user-friendly front-end design which can help users find the words they need quickly and easily. All the code and data are available at https://github.com/thunlp/WantWords.


pdf bib
REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning
Ming Jiang | Junjie Hu | Qiuyuan Huang | Lei Zhang | Jana Diesner | Jianfeng Gao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Popular metrics used for evaluating image captioning systems, such as BLEU and CIDEr, provide a single score to gauge the system’s overall effectiveness. This score is often not informative enough to indicate what specific errors are made by a given system. In this study, we present a fine-grained evaluation method REO for automatically measuring the performance of image captioning systems. REO assesses the quality of captions from three perspectives: 1) Relevance to the ground truth, 2) Extraness of the content that is irrelevant to the ground truth, and 3) Omission of the elements in the images and human references. Experiments on three benchmark datasets demonstrate that our method achieves a higher consistency with human judgments and provides more intuitive evaluation results than alternative metrics.

pdf bib
TIGEr: Text-to-Image Grounding for Image Caption Evaluation
Ming Jiang | Qiuyuan Huang | Lei Zhang | Xin Wang | Pengchuan Zhang | Zhe Gan | Jana Diesner | Jianfeng Gao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper presents a new metric called TIGEr for the automatic evaluation of image captioning systems. Popular metrics, such as BLEU and CIDEr, are based solely on text matching between reference captions and machine-generated captions, potentially leading to biased evaluations because references may not fully cover the image content and natural language is inherently ambiguous. Building upon a machine-learned text-image grounding model, TIGEr allows to evaluate caption quality not only based on how well a caption represents image content, but also on how well machine-generated captions match human-generated captions. Our empirical tests show that TIGEr has a higher consistency with human judgments than alternative existing metrics. We also comprehensively assess the metric’s effectiveness in caption evaluation by measuring the correlation between human judgments and metric scores.


pdf bib
Topic Model for Identifying Suicidal Ideation in Chinese Microblog
Xiaolei Huang | Xin Li | Tianli Liu | David Chiu | Tingshao Zhu | Lei Zhang
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation


pdf bib
xLiD-Lexica: Cross-lingual Linked Data Lexica
Lei Zhang | Michael Färber | Achim Rettinger
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we introduce our cross-lingual linked data lexica, called xLiD-Lexica, which are constructed by exploiting the multilingual Wikipedia and linked data resources from Linked Open Data (LOD). We provide the cross-lingual groundings of linked data resources from LOD as RDF data, which can be easily integrated into the LOD data sources. In addition, we build a SPARQL endpoint over our xLiD-Lexica to allow users to easily access them using SPARQL query language. Multilingual and cross-lingual information access can be facilitated by the availability of such lexica, e.g., allowing for an easy mapping of natural language expressions in different languages to linked data resources from LOD. Many tasks in natural language processing, such as natural language generation, cross-lingual entity linking, text annotation and question answering, can benefit from our xLiD-Lexica.

pdf bib
RECSA: Resource for Evaluating Cross-lingual Semantic Annotation
Achim Rettinger | Lei Zhang | Daša Berović | Danijela Merkler | Matea Srebačić | Marko Tadić
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In recent years large repositories of structured knowledge (DBpedia, Freebase, YAGO) have become a valuable resource for language technologies, especially for the automatic aggregation of knowledge from textual data. One essential component of language technologies, which leverage such knowledge bases, is the linking of words or phrases in specific text documents with elements from the knowledge base (KB). We call this semantic annotation. In the same time, initiatives like Wikidata try to make those knowledge bases less language dependent in order to allow cross-lingual or language independent knowledge access. This poses a new challenge to semantic annotation tools which typically are language dependent and link documents in one language to a structured knowledge base grounded in the same language. Ultimately, the goal is to construct cross-lingual semantic annotation tools that can link words or phrases in one language to a structured knowledge database in any other language or to a language independent representation. To support this line of research we developed what we believe could serve as a gold standard Resource for Evaluating Cross-lingual Semantic Annotation (RECSA). We compiled a hand-annotated parallel corpus of 300 news articles in three languages with cross-lingual semantic groundings to the English Wikipedia and DBPedia. We hope that this new language resource, which is freely available, will help to establish a standard test set and methodology to comparatively evaluate cross-lingual semantic annotation technologies.

pdf bib
XLike Project Language Analysis Services
Xavier Carreras | Lluís Padró | Lei Zhang | Achim Rettinger | Zhixing Li | Esteban García-Cuesta | Željko Agić | Božo Bekavac | Blaz Fortuna | Tadej Štajner
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Semantic Annotation, Analysis and Comparison: A Multilingual and Cross-lingual Text Analytics Toolkit
Lei Zhang | Achim Rettinger
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics


pdf bib
Extracting Resource Terms for Sentiment Analysis
Lei Zhang | Bing Liu
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Identifying Noun Product Features that Imply Opinions
Lei Zhang | Bing Liu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies


pdf bib
Analyse d’opinion : annotation sémantique de textes chinois
Lei Zhang | Stéphane Ferrari
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Notre travail concerne l’analyse automatique des énoncés d’opinion en chinois. En nous inspirant de la théorie linguistique de l’Appraisal, nous proposons une méthode fondée sur l’usage de lexiques et de règles locales pour déterminer les caractéristiques telles que la Force (intensité), le Focus (prototypicalité) et la polarité de tels énoncés. Nous présentons le modèle et sa mise en oeuvre sur un corpus journalistique. Si pour la détection d’énoncés d’opinion, la précision est bonne (94 %), le taux de rappel (67 %) pose cependant des questions sur l’enrichissement des ressources actuelles.

pdf bib
Distributional Similarity vs. PU Learning for Entity Set Expansion
Xiao-Li Li | Lei Zhang | Bing Liu | See-Kiong Ng
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Extracting and Ranking Product Features in Opinion Documents
Lei Zhang | Bing Liu | Suk Hwan Lim | Eamonn O’Brien-Strain
Coling 2010: Posters


pdf bib
LogisticLDA: Regularizing Latent Dirichlet Allocation by Logistic Regression
Jia-Cheng Guo | Bao-Liang Lu | Zhiwei Li | Lei Zhang
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1


pdf bib
Chinese Named Entity Identification Using Class-based Language Model
Jian Sun | Jianfeng Gao | Lei Zhang | Ming Zhou | Changning Huang
COLING 2002: The 19th International Conference on Computational Linguistics


pdf bib
Automatic Detecting/Correcting Errors in Chinese Text by an Approximate Word-Matching Algorithm
Lei Zhang | Ming Zhou | Changning Huang | Haihua Pan
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics