Zihao Wang


2024

pdf bib
Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?
Qineng Wang | Zihao Wang | Ying Su | Hanghang Tong | Yangqiu Song
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent progress in LLMs discussion suggests that multi-agent discussion improves the reasoning abilities of LLMs. In this work, we reevaluate this claim through systematic experiments, where we propose a novel group discussion framework to enrich the set of discussion mechanisms. Interestingly, our results show that a single-agent LLM with strong prompts can achieve almost the same best performance as the best existing discussion approach on a wide range of reasoning tasks and backbone LLMs. We observed that the multi-agent discussion performs better than a single agent only when there is no demonstration in the prompt. Further study reveals the common interaction mechanisms of LLMs during the discussion. Our code can be found in https://github.com/HKUST-KnowComp/LLM-discussion.

pdf bib
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing
Jiangshu Du | Yibo Wang | Wenting Zhao | Zhongfen Deng | Shuaiqi Liu | Renze Lou | Henry Peng Zou | Pranav Narayanan Venkit | Nan Zhang | Mukund Srinath | Haoran Ranran Zhang | Vipul Gupta | Yinghui Li | Tao Li | Fei Wang | Qin Liu | Tianlin Liu | Pengzhi Gao | Congying Xia | Chen Xing | Cheng Jiayang | Zhaowei Wang | Ying Su | Raj Sanjay Shah | Ruohao Guo | Jing Gu | Haoran Li | Kangda Wei | Zihao Wang | Lu Cheng | Surangika Ranathunga | Meng Fang | Jie Fu | Fei Liu | Ruihong Huang | Eduardo Blanco | Yixin Cao | Rui Zhang | Philip S. Yu | Wenpeng Yin
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Claim: This work is not advocating the use of LLMs for paper (meta-)reviewing. Instead, wepresent a comparative analysis to identify and distinguish LLM activities from human activities. Two research goals: i) Enable better recognition of instances when someone implicitly uses LLMs for reviewing activities; ii) Increase community awareness that LLMs, and AI in general, are currently inadequate for performing tasks that require a high level of expertise and nuanced judgment.This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload?This study focuses on the topic of LLMs as NLP Researchers, particularly examining the effectiveness of LLMs in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) each review comes with “deficiency” labels and corresponding explanations for individual segments, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) “LLMs as Reviewers”, how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) “LLMs as Metareviewers”, how effectively can LLMs identify potential issues, such as Deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.

pdf bib
Generate-on-Graph: Treat LLM as both Agent and KG for Incomplete Knowledge Graph Question Answering
Yao Xu | Shizhu He | Jiabei Chen | Zihao Wang | Yangqiu Song | Hanghang Tong | Guang Liu | Jun Zhao | Kang Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

To address the issues of insufficient knowledge and hallucination in Large Language Models (LLMs), numerous studies have explored integrating LLMs with Knowledge Graphs (KGs). However, these methods are typically evaluated on conventional Knowledge Graph Question Answering (KGQA) with complete KGs, where all factual triples required for each question are entirely covered by the given KG. In such cases, LLMs primarily act as an agent to find answer entities within the KG, rather than effectively integrating the internal knowledge of LLMs and external knowledge sources such as KGs. In fact, KGs are often incomplete to cover all the knowledge required to answer questions. To simulate these real-world scenarios and evaluate the ability of LLMs to integrate internal and external knowledge, we propose leveraging LLMs for QA under Incomplete Knowledge Graph (IKGQA), where the provided KG lacks some of the factual triples for each question, and construct corresponding datasets. To handle IKGQA, we propose a training-free method called Generate-on-Graph (GoG), which can generate new factual triples while exploring KGs. Specifically, GoG performs reasoning through a Thinking-Searching-Generating framework, which treats LLM as both Agent and KG in IKGQA. Experimental results on two datasets demonstrate that our GoG outperforms all previous methods.

2023

pdf bib
Wasserstein-Fisher-Rao Embedding: Logical Query Embeddings with Local Comparison and Global Transport
Zihao Wang | Weizhi Fei | Hang Yin | Yangqiu Song | Ginny Wong | Simon See
Findings of the Association for Computational Linguistics: ACL 2023

Answering complex queries on knowledge graphs is important but particularly challenging because of the data incompleteness. Query embedding methods address this issue by learningbased models and simulating logical reasoning with set operators. Previous works focus on specific forms of embeddings, but scoring functions between embeddings are underexplored. In contrast to existing scorning functions motivated by local comparison or global transport, this work investigates the local and global trade-off with unbalanced optimal transport theory. Specifically, we embed sets as bounded measures in R endowed with a scoring function motivated by the Wasserstein-Fisher-Rao metric. Such a design also facilitates closed-form set operators in the embedding space. Moreover, we introduce a convolution-based algorithm for linear time computation and a block diagonal kernel to enforce the trade-off. Results show that WFRE is capable of outperforming existing query embedding methods on standard datasets, evaluation sets with combinatorially complex queries, and hierarchical knowledge graphs. Ablation study shows that finding a better local and global trade-off is essential for performance improvement.

2022

pdf bib
Unsupervised Sentence Textual Similarity with Compositional Phrase Semantics
Zihao Wang | Jiaheng Dou | Yong Zhang
Proceedings of the 29th International Conference on Computational Linguistics

Measuring Sentence Textual Similarity (STS) is a classic task that can be applied to many downstream NLP applications such as text generation and retrieval. In this paper, we focus on unsupervised STS that works on various domains but only requires minimal data and computational resources. Theoretically, we propose a light-weighted Expectation-Correction (EC) formulation for STS computation. EC formulation unifies unsupervised STS approaches including the cosine similarity of Additively Composed (AC) sentence embeddings, Optimal Transport (OT), and Tree Kernels (TK). Moreover, we propose the Recursive Optimal Transport Similarity (ROTS) algorithm to capture the compositional phrase semantics by composing multiple recursive EC formulations. ROTS finishes in linear time and is faster than its predecessors. ROTS is empirically more effective and scalable than previous approaches. Extensive experiments on 29 STS tasks under various settings show the clear advantage of ROTS over existing approaches. Detailed ablation studies prove the effectiveness of our approaches.

pdf bib
SeaD: End-to-end Text-to-SQL Generation with Schema-aware Denoising
Kuan Xu | Yongbo Wang | Yongliang Wang | Zihao Wang | Zujie Wen | Yang Dong
Findings of the Association for Computational Linguistics: NAACL 2022

On the WikiSQL benchmark, most methods tackle the challenge of text-to-SQL with predefined sketch slots and build sophisticated sub-tasks to fill these slots. Though achieving promising results, these methods suffer from over-complex model structure. In this paper, we present a simple yet effective approach that enables auto-regressive sequence-to-sequence model to robust text-to-SQL generation. Instead of formulating the task of text-to-SQL as slot-filling, we propose to train sequence-to-sequence model with Schema-aware Denoising (SeaD), which consists of two denoising objectives that train model to either recover input or predict output from two novel erosion and shuffle noises. These model-agnostic denoising objectives act as the auxiliary tasks for structural data modeling during sequence-to-sequence generation. In addition, we propose a clause-sensitive execution guided (EG) decoding strategy to overcome the limitation of EG decoding for generative model. The experiments show that the proposed method improves the performance of sequence-to-sequence model in both schema linking and grammar correctness and establishes new state-of-the-art on WikiSQL benchmark. Our work indicates that the capacity of sequence-to-sequence model for text-to-SQL may have been under-estimated and could be enhanced by specialized denoising task.

pdf bib
Query2Particles: Knowledge Graph Reasoning with Particle Embeddings
Jiaxin Bai | Zihao Wang | Hongming Zhang | Yangqiu Song
Findings of the Association for Computational Linguistics: NAACL 2022

Answering complex logical queries on incomplete knowledge graphs (KGs) with missing edges is a fundamental and important task for knowledge graph reasoning. The query embedding method is proposed to answer these queries by jointly encoding queries and entities to the same embedding space. Then the answer entities are selected according to the similarities between the entity embeddings and the query embedding. As the answers to a complex query are obtained from a combination of logical operations over sub-queries, the embeddings of the answer entities may not always follow a uni-modal distribution in the embedding space. Thus, it is challenging to simultaneously retrieve a set of diverse answers from the embedding space using a single and concentrated query representation such as a vector or a hyper-rectangle. To better cope with queries with diversified answers, we propose Query2Particles (Q2P), a complex KG query answering method. Q2P encodes each query into multiple vectors, named particle embeddings. By doing so, the candidate answers can be retrieved from different areas over the embedding space using the maximal similarities between the entity embeddings and any of the particle embeddings. Meanwhile, the corresponding neural logic operations are defined to support its reasoning over arbitrary first-order logic queries. The experiments show that Query2Particles achieves state-of-the-art performance on the complex query answering tasks on FB15k, FB15K-237, and NELL knowledge graphs.

pdf bib
MICO: A Multi-alternative Contrastive Learning Framework for Commonsense Knowledge Representation
Ying Su | Zihao Wang | Tianqing Fang | Hongming Zhang | Yangqiu Song | Tong Zhang
Findings of the Association for Computational Linguistics: EMNLP 2022

Commonsense reasoning tasks such as commonsense knowledge graph completion and commonsense question answering require powerful representation learning. In this paper, we propose to learn commonsense knowledge representation by MICO, a Multi-alternative contrastIve learning framework on COmmonsense knowledge graphs (MICO). MICO generates the commonsense knowledge representation by contextual interaction between entity nodes and relations with multi-alternative contrastive learning. In MICO, the head and tail entities in an (h,r,t) knowledge triple are converted to two relation-aware sequence pairs (a premise and an alternative) in the form of natural language. Semantic representations generated by MICO can benefit the following two tasks by simply comparing the similarity score between the representations: 1) zero-shot commonsense question answering tasks; 2) inductive commonsense knowledge graph completion tasks. Extensive experiments show the effectiveness of our method.

pdf bib
A Neural-Symbolic Approach to Natural Language Understanding
Zhixuan Liu | Zihao Wang | Yuan Lin | Hang Li
Findings of the Association for Computational Linguistics: EMNLP 2022

Deep neural networks, empowered by pre-trained language models, have achieved remarkable results in natural language understanding (NLU) tasks. However, their performances can drastically deteriorate when logical reasoning is needed. This is because NLU in principle depends on not only analogical reasoning, which deep neural networks are good at, but also logical reasoning. According to the dual-process theory, analogical reasoning and logical reasoning are respectively carried out by System 1 and System 2 in the human brain. Inspired by the theory, we present a novel framework for NLU called Neural-Symbolic Processor (NSP), which performs analogical reasoning based on neural processing and logical reasoning based on both neural and symbolic processing. As a case study, we conduct experiments on two NLU tasks, question answering (QA) and natural language inference (NLI), when numerical reasoning (a type of logical reasoning) is necessary. The experimental results show that our method significantly outperforms state-of-the-art methods in both tasks.

2020

pdf bib
A Relaxed Matching Procedure for Unsupervised BLI
Xu Zhao | Zihao Wang | Yong Zhang | Hao Wu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recently unsupervised Bilingual Lexicon Induction(BLI) without any parallel corpus has attracted much research interest. One of the crucial parts in methods for the BLI task is the matching procedure. Previous works impose a too strong constraint on the matching and lead to many counterintuitive translation pairings. Thus We propose a relaxed matching procedure to find a more precise matching between two languages. We also find that aligning source and target language embedding space bidirectionally will bring significant improvement. We follow the previous iterative framework to conduct experiments. Results on standard benchmark demonstrate the effectiveness of our proposed method, which substantially outperforms previous unsupervised methods.

pdf bib
Semi-Supervised Bilingual Lexicon Induction with Two-way Interaction
Xu Zhao | Zihao Wang | Hao Wu | Yong Zhang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Semi-supervision is a promising paradigm for Bilingual Lexicon Induction (BLI) with limited annotations. However, previous semisupervised methods do not fully utilize the knowledge hidden in annotated and nonannotated data, which hinders further improvement of their performance. In this paper, we propose a new semi-supervised BLI framework to encourage the interaction between the supervised signal and unsupervised alignment. We design two message-passing mechanisms to transfer knowledge between annotated and non-annotated data, named prior optimal transport and bi-directional lexicon update respectively. Then, we perform semi-supervised learning based on a cyclic or a parallel parameter feeding routine to update our models. Our framework is a general framework that can incorporate any supervised and unsupervised BLI methods based on optimal transport. Experimental results on MUSE and VecMap datasets show significant improvement of our models. Ablation study also proves that the two-way interaction between the supervised signal and unsupervised alignment accounts for the gain of the overall performance. Results on distant language pairs further illustrate the advantage and robustness of our proposed method.

2019

pdf bib
Tackling Long-Tailed Relations and Uncommon Entities in Knowledge Graph Completion
Zihao Wang | Kwunping Lai | Piji Li | Lidong Bing | Wai Lam
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

For large-scale knowledge graphs (KGs), recent research has been focusing on the large proportion of infrequent relations which have been ignored by previous studies. For example few-shot learning paradigm for relations has been investigated. In this work, we further advocate that handling uncommon entities is inevitable when dealing with infrequent relations. Therefore, we propose a meta-learning framework that aims at handling infrequent relations with few-shot learning and uncommon entities by using textual descriptions. We design a novel model to better extract key information from textual descriptions. Besides, we also develop a novel generative model in our framework to enhance the performance by generating extra triplets during the training stage. Experiments are conducted on two datasets from real-world KGs, and the results show that our framework outperforms previous methods when dealing with infrequent relations and their accompanying uncommon entities.

2018

pdf bib
Responding E-commerce Product Questions via Exploiting QA Collections and Reviews
Qian Yu | Wai Lam | Zihao Wang
Proceedings of the 27th International Conference on Computational Linguistics

Providing instant responses for product questions in E-commerce sites can significantly improve satisfaction of potential consumers. We propose a new framework for automatically responding product questions newly posed by users via exploiting existing QA collections and review collections in a coordinated manner. Our framework can return a ranked list of snippets serving as the automated response for a given question, where each snippet can be a sentence from reviews or an existing question-answer pair. One major subtask in our framework is question-based response review ranking. Learning for response review ranking is challenging since there is no labeled response review available. The collection of existing QA pairs are exploited as distant supervision for learning to rank responses. With proposed distant supervision paradigm, the learned response ranking model makes use of the knowledge in the QA pairs and the corresponding retrieved review lists. Extensive experiments on datasets collected from a real-world commercial E-commerce site demonstrate the effectiveness of our proposed framework.

2017

pdf bib
Deep Recurrent Generative Decoder for Abstractive Text Summarization
Piji Li | Wai Lam | Lidong Bing | Zihao Wang
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a new framework for abstractive text summarization based on a sequence-to-sequence oriented encoder-decoder model equipped with a deep recurrent generative decoder (DRGN). Latent structure information implied in the target summaries is learned based on a recurrent latent random model for improving the summarization quality. Neural variational inference is employed to address the intractable posterior inference for the recurrent latent variables. Abstractive summaries are generated based on both the generative latent variables and the discriminative deterministic states. Extensive experiments on some benchmark datasets in different languages show that DRGN achieves improvements over the state-of-the-art methods.