Yongliang Shen - ACL Anthology

Yongliang Shen

2025

STaR-SQL: Self-Taught Reasoner for Text-to-SQL
Mingqian He | Yongliang Shen | Wenqi Zhang | Qiuying Peng | Jun Wang | Weiming Lu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Generating step-by-step “chain-of-thought” rationales has proven effective for improving the performance of large language models on complex reasoning tasks. However, applying such techniques to structured tasks, such as text-to-SQL, remains largely unexplored. In this paper, we introduce Self-Taught Reasoner for text-to-SQL (STaR-SQL), a novel approach that reframes SQL query generation as a reasoning-driven process. Our method prompts the LLM to produce detailed reasoning steps for SQL queries and fine-tunes it on rationales that lead to correct outcomes. Unlike traditional methods, STaR-SQL dedicates additional test-time computation to reasoning, thereby positioning LLMs as spontaneous reasoners rather than mere prompt-based agents. To further scale the inference process, we incorporate an outcome-supervised reward model (ORM) as a verifier, which enhances SQL query accuracy. Experimental results on the challenging Spider benchmark demonstrate that STaR-SQL significantly improves text-to-SQL performance, achieving an execution accuracy of 86.6%. This surpasses a few-shot baseline by 31.6% and a baseline fine-tuned to predict answers directly by 18.0%. Additionally, STaR-SQL outperforms agent-like prompting methods that leverage more powerful yet closed-source models such as GPT-4. These findings underscore the potential of reasoning-augmented training for structured tasks and open the door to extending self-improving reasoning models to text-to-SQL generation and beyond.

EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction
Siyu Yuan | Kaitao Song | Jiangjie Chen | Xu Tan | Yongliang Shen | Kan Ren | Dongsheng Li | Deqing Yang
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

There has been a rising interest in utilizing tools in applications of autonomous agents based on large language models (LLMs) to address intricate real-world tasks. To develop LLMbased agents, it usually requires LLMs to understand many tool functions from different tool documentations. However, these documentations could be diverse, redundant, or incomplete, which immensely affects the capability of LLMs in using tools. Current LLMs exhibit satisfactory instruction-following capabilities based on instruction-following fine-tuning process. Motivated by this, in this paper, we introduce EASYTOOL, a framework transforming diverse and lengthy tool documentation into a unified and concise tool instruction to fully leverage instruction-following capabilities of LLMs for easier tool usage. EASYTOOL purifies essential information from extensive tool documentation of different sources, and elaborates a unified interface (i.e., tool instruction) to offer standardized tool descriptions and functionalities for LLM-based agents. Extensive experiments on multiple different tasks demonstrate that EASYTOOL can significantly reduce token consumption and improve the performance of LLM-based agents on tool utilization in real-world scenarios. Our code is available in supplemental materials. Our code is available at https://github.com/microsoft/JARVIS/tree/main/easytool.

AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification
Xuan Zhang | Yongliang Shen | Zhe Zheng | Linjuan Wu | Wenqi Zhang | Yuchen Yan | Qiuying Peng | Jun Wang | Weiming Lu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) have demonstrated remarkable capabilities in tool learning. In real-world scenarios, user queries are often ambiguous and incomplete, requiring effective clarification. However, existing interactive clarification approaches face two critical limitations: reliance on manually constructed datasets, which inherently constrains training data scale and diversity, and lack of error correction mechanisms during multi-turn clarification, leading to error accumulation that compromises both accuracy and efficiency. We present AskToAct, which addresses these challenges by exploiting the structural mapping between queries and their tool invocation solutions. Our key insight is that tool parameters naturally represent explicit user intents. By systematically removing key parameters from queries while retaining them as ground truth, we enable automated construction of high-quality training data. We further enhance model robustness through error-correction pairs and selective masking, enabling dynamic error detection during clarification interactions. Comprehensive experiments demonstrate that AskToAct significantly outperforms existing approaches, achieving above 57% accuracy in recovering critical unspecified intents and enhancing clarification efficiency by an average of 10.46% while maintaining high accuracy in tool invocation. Our framework exhibits robust performance across different model architectures and successfully generalizes to entirely unseen APIs without additional training, achieving performance comparable to GPT-4o with substantially fewer computational resources.

DB-Explore: Automated Database Exploration and Instruction Synthesis for Text-to-SQL
Haoyuan Ma | Yongliang Shen | Hengwei Liu | Wenqi Zhang | Haolei Xu | Qiuying Peng | Jun Wang | Weiming Lu
Findings of the Association for Computational Linguistics: EMNLP 2025

Recent text-to-SQL systems powered by large language models (LLMs) have demonstrated remarkable performance in translating natural language queries into SQL.However, these systems often struggle with complex database structures and domain-specific queries, as they primarily focus on enhancing logical reasoning and SQL syntax while overlooking the critical need for comprehensive database understanding.To address this limitation, we propose DB-Explore, a novel framework that systematically aligns LLMs with database knowledge through automated exploration and instruction synthesis.DB-Explore constructs database graphs to capture complex relational schemas, leverages GPT-4 to systematically mine structural patterns and semantic knowledge, and synthesizes instructions to distill this knowledge for efficient fine-tuning of LLMs.Our framework enables comprehensive database understanding through diverse sampling strategies and automated instruction generation, bridging the gap between database structures and language models.Experiments conducted on the SPIDER and BIRD benchmarks validate the effectiveness of DB-Explore, achieving an execution accuracy of 67.0% on BIRD and 87.8% on SPIDER. Notably, our open‐source implementation based on Qwen2.5‐Coder‐7B achieves state‐of‐the‐art results at minimal computational cost, outperforming several GPT‐4‐driven Text‐to‐SQL systems.

Scaling LLMs’ Social Reasoning: Sprinkle Cognitive “Aha Moment” into Fundamental Long-thought Logical Capabilities
Guiyang Hou | Wenqi Zhang | Zhe Zheng | Yongliang Shen | Weiming Lu
Findings of the Association for Computational Linguistics: ACL 2025

Humans continually engage in reasoning about others’ mental states, a capability known as Theory of Mind (ToM), is essential for social interactions. While this social reasoning capability emerges naturally in human cognitive development, how has the social reasoning capability of Large Language Models (LLMs) evolved during their development process? Various datasets have been proposed to assess LLMs’ social reasoning capabilities, but each is designed with a distinct focus, and none have explored how models’ social reasoning capabilities evolve during model size scaling or reasoning tokens scaling. In light of this, we optimize the evaluation of LLMs’ social reasoning from both data and model perspectives, constructing progressively difficult levels of social reasoning data and systematically exploring how LLMs’ social reasoning capabilities evolve. Furthermore, through an in-depth analysis of DeepSeek-R1’s reasoning trajectories, we identify notable cognitive “Aha Moment” and the reasons for its reasoning errors. Experiments reveal that long-thought logical capabilities and cognitive thinking are key to scaling LLMs’ social reasoning capabilities. By equipping the Qwen2.5-32B-Instruct model with long-thought logical capabilities and cognitive thinking, we achieve an improvement of 19.0 points, attaining social reasoning performance comparable to o1-preview model.

Logic: Long-form Outline Generation via Imitative and Critical Self-refinement
Hengwei Liu | Yongliang Shen | Zhe Zheng | Haoyuan Ma | Xingyu Wu | Yin Zhang | Weiming Lu
Findings of the Association for Computational Linguistics: EMNLP 2025

Long-form outline generation for expository articles requires both comprehensive knowledge coverage and logical coherence, which is essential for creating detailed Wikipedia-like content. However, existing methods face critical limitations: outlines generated in the pre-writing stage often have low knowledge density and lack detail, while retrieval-augmented approaches struggle to maintain logical coherence across retrieved information. Additionally, unlike human writers who can iteratively improve through peer feedback and reference similar topics, current approaches lack effective mechanisms for systematic outline refinement. To address these challenges, we propose Logic, a Long-form Outline Generation system via Imitative and Critical self-refinement that mimics human writers’ refinement process. Logic establishes a coherent planning framework and structured knowledge base, learns from similar topic outlines through imitation, and continuously improves through model-based critique. Experiments on FreshWiki and our dataset WikiOutline show that, compared to the best baseline, Logic’s long-form outlines are more organized (with increases of 22.85% and 21.65% respectively) and more logically coherent (with increases of 16.19% and 12.24% respectively). Human evaluation further validates Logic’s effectiveness in generating comprehensive and well-structured long-form outlines.

2024

TimeToM: Temporal Space is the Key to Unlocking the Door of Large Language Models’ Theory-of-Mind
Guiyang Hou | Wenqi Zhang | Yongliang Shen | Linjuan Wu | Weiming Lu
Findings of the Association for Computational Linguistics: ACL 2024

Theory of Mind (ToM)—the cognitive ability to reason about mental states of ourselves and others, is the foundation of social interaction. Although ToM comes naturally to humans, it poses a significant challenge to even the most advanced Large Language Models (LLMs). Due to the complex logical chains in ToM reasoning, especially in higher-order ToM questions, simply utilizing reasoning methods like Chain of Thought (CoT) will not improve the ToM capabilities of LLMs. We present TimeToM, which constructs a temporal space and uses it as the foundation to improve the ToM capabilities of LLMs in multiple scenarios. Specifically, within the temporal space, we construct Temporal Belief State Chain (TBSC) for each character and inspired by the cognition perspective of the social world model, we divide TBSC into self-world beliefs and social world beliefs, aligning with first-order ToM (first-order beliefs) and higher-order ToM (higher-order beliefs) questions, respectively. Moreover, we design a novel tool-belief solver that, by considering belief communication between characters in temporal space, can transform a character’s higher-order beliefs into another character’s first-order beliefs under belief communication period.

Learning Global Controller in Latent Space for Parameter-Efficient Fine-Tuning
Zeqi Tan | Yongliang Shen | Xiaoxia Cheng | Chang Zong | Wenqi Zhang | Jian Shao | Weiming Lu | Yueting Zhuang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While large language models (LLMs) have showcased remarkable prowess in various natural language processing tasks, their training costs are exorbitant. Consequently, a plethora of parameter-efficient fine-tuning methods have emerged to tailor large models for downstream tasks, including low-rank training. Recent approaches either amalgamate existing fine-tuning methods or dynamically adjust rank allocation. Nonetheless, these methods continue to grapple with issues like local optimization, inability to train with full rank and lack of focus on specific tasks. In this paper, we introduce an innovative parameter-efficient method for exploring optimal solutions within latent space. More specifically, we introduce a set of latent units designed to iteratively extract input representations from LLMs, continuously refining informative features that enhance downstream task performance. Due to the small and independent nature of the latent units in relation to input size, this significantly reduces training memory requirements. Additionally, we employ an asymmetric attention mechanism to facilitate bidirectional interaction between latent units and freezed LLM representations, thereby mitigating issues associated with non-full-rank training. Furthermore, we apply distillation over hidden states during the interaction, which guarantees a trimmed number of trainable parameters.Experimental results demonstrate that our approach achieves state-of-the-art performance on a range of natural language understanding, generation and reasoning tasks.

Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
Wenqi Zhang | Yongliang Shen | Linjuan Wu | Qiuying Peng | Jun Wang | Yueting Zhuang | Weiming Lu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The reflection capacity of Large Language Model (LLM) has garnered extensive attention. A post-hoc prompting strategy, e.g., reflexion and self-refine, refines LLM’s response based on self-evaluated or external feedback. However, recent research indicates without external feedback, LLM’s intrinsic reflection is unstable. Our investigation unveils that the key bottleneck is the quality of the self-evaluated feedback. We find LLMs often exhibit overconfidence or high randomness when self-evaluate, offering stubborn or inconsistent feedback, which causes poor reflection. To remedy this, we advocate Self-Contrast: It adaptively explores diverse solving perspectives tailored to the request, contrasts the differences, and summarizes these discrepancies into a checklist which could be used to re-examine and eliminate discrepancies. Our method endows LLM with diverse perspectives to alleviate stubborn biases. Moreover, their discrepancies indicate potential errors or inherent uncertainties that LLM often overlooks. Reflecting upon these can catalyze more accurate and stable reflection. Experiments conducted on a series of reasoning and translation tasks with different LLMs serve to underscore the effectiveness and generality of our strategy.

Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
Wenqi Zhang | Ke Tang | Hai Wu | Mengna Wang | Yongliang Shen | Guiyang Hou | Zeqi Tan | Peng Li | Yueting Zhuang | Weiming Lu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks. However, most LLM-based agents are designed as specific task solvers with sophisticated prompt engineering, rather than agents capable of learning and evolving through interactions. These task solvers necessitate manually crafted prompts to inform task rules and regulate LLM behaviors, inherently incapacitating to address complex dynamic scenarios e.g., large interactive games. In light of this, we propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization that can learn a wealth of expertise from interactive experiences and progressively elevate its behavioral policy. Specifically, it involves a dynamic belief generation and reflection process for policy evolution. Rather than action-level reflection, Agent-Pro iteratively reflects on past trajectories and beliefs, “fine-tuning” its irrational beliefs for a better policy. Moreover, a depth-first search is employed for policy optimization, ensuring continual enhancement in policy payoffs. Agent-Pro is evaluated across two games: Blackjack and Texas Hold’em, outperforming vanilla LLM and specialized models. Our results show Agent-Pro can learn and evolve in complex and dynamic scenes, which also benefits numerous LLM-based applications.

Progressive Tuning: Towards Generic Sentiment Abilities for Large Language Models
Guiyang Hou | Yongliang Shen | Weiming Lu
Findings of the Association for Computational Linguistics: ACL 2024

Understanding sentiment is arguably an advanced and important capability of AI agents in the physical world. In previous works, many efforts have been devoted to individual sentiment subtasks, without considering interrelated sentiment knowledge among these subtasks. Although some recent works model multiple sentiment subtasks in a unified manner, they merely simply combine these subtasks without deeply exploring the hierarchical relationships among subtasks. In this paper, we introduce GSA-7B, an open-source large language model specific to the sentiment domain. Specifically, we deeply explore the hierarchical relationships between sentiment subtasks, proposing progressive sentiment reasoning benchmark and progressive task instructions. Subsequently, we use Llama2-7B as the backbone model and propose parameter-efficient progressive tuning paradigm which is implemented by constructing chain of LoRA, resulting in the creation of GSA-7B. Experimental results show that GSA-7B as a unified model performs well across all datasets in the progressive sentiment reasoning benchmark. Additionally, under the few-shot setting, GSA-7B also exhibits good generalization ability for sentiment subtasks and datasets that were not encountered during its training phase.

Advancing Process Verification for Large Language Models via Tree-Based Preference Learning
Mingqian He | Yongliang Shen | Wenqi Zhang | Zeqi Tan | Weiming Lu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) have demonstrated remarkable potential in handling complex reasoning tasks by generating step-by-step rationales. Some methods have proven effective in boosting accuracy by introducing extra verifiers to assess these paths. However, existing verifiers, typically trained on binary-labeled reasoning paths, fail to fully utilize the relative merits of intermediate steps, thereby limiting the effectiveness of the feedback provided. To overcome this limitation, we propose Tree-based Preference Learning Verifier (Tree-PLV), a novel approach that constructs reasoning trees via a best-first search algorithm and collects step-level paired data for preference training. Compared to traditional binary classification, step-level preferences more finely capture the nuances between reasoning steps, allowing for a more precise evaluation of the complete reasoning path. We empirically evaluate Tree-PLV across a range of arithmetic and commonsense reasoning tasks, where it significantly outperforms existing benchmarks. For instance, Tree-PLV achieved substantial performance gains over the Mistral-7B self-consistency baseline on GSM8K (67.55% → 82.79%), MATH (17.00% → 26.80%), CSQA (68.14% → 72.97%), and StrategyQA (82.86% → 83.25%). Additionally, our study explores the appropriate granularity for applying preference learning, revealing that step-level guidance provides feedback that better aligns with the evaluation of the reasoning process.

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Wenqi Zhang | Zhenglin Cheng | Yuanyu He | Mengna Wang | Yongliang Shen | Zeqi Tan | Guiyang Hou | Mingqian He | Yanna Ma | Weiming Lu | Yueting Zhuang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Although most current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or layouts, and visual reasoning capabilities remains quite rudimentary. They often struggle with simple daily tasks, such as reading time from a clock, understanding a flowchart, or planning a route using a road map. In light of this, we design a multi-modal self-instruct, utilizing large language models and their code capabilities to synthesize massive abstract images and visual reasoning instructions across daily scenarios. Our strategy effortlessly creates a multimodal benchmark with 11,193 instructions for eight visual scenarios: charts, tables, simulated maps, dashboards, flowcharts, relation graphs, floor plans, and visual puzzles. This benchmark, constructed with simple lines and geometric elements, exposes the shortcomings of most advanced LMMs like GPT-4V and Llava in abstract image understanding, spatial relations reasoning, and visual element induction. Besides, to verify the quality of our synthetic data, we fine-tune an LMM using 62,476 synthetic chart, table and road map instructions. The results demonstrate improved chart understanding and map navigation performance, and also demonstrate potential benefits for other visual reasoning tasks.

Insert or Attach: Taxonomy Completion via Box Embedding
Wei Xue | Yongliang Shen | Wenqi Ren | Jietian Guo | Shiliang Pu | Weiming Lu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Taxonomy completion, enriching existing taxonomies by inserting new concepts as parents or attaching them as children, has gained significant interest. Previous approaches embed concepts as vectors in Euclidean space, which makes it difficult to model asymmetric relations in taxonomy. In addition, they introduce pseudo-leaves to convert attachment cases into insertion cases, leading to an incorrect bias in network learning dominated by numerous pseudo-leaves. Addressing these, our framework, TaxBox, leverages box containment and center closeness to design two specialized geometric scorers within the box embedding space. These scorers are tailored for insertion and attachment operations and can effectively capture intrinsic relationships between concepts by optimizing on a granular box constraint loss. We employ a dynamic ranking loss mechanism to balance the scores from these scorers, allowing adaptive adjustments of insertion and attachment scores. Experiments on four real-world datasets show that TaxBox significantly outperforms previous methods, yielding substantial improvements over prior methods in real-world datasets, with average performance boosts of 6.7%, 34.9%, and 51.4% in MRR, Hit@1, and Prec@1, respectively.

2023

An Expression Tree Decoding Strategy for Mathematical Equation Generation
Wenqi Zhang | Yongliang Shen | Qingpeng Nong | Zeqi Tan | Yanna Ma | Weiming Lu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Generating mathematical equations from natural language requires an accurate understanding of the relations among math expressions. Existing approaches can be broadly categorized into token-level and expression-level generation. The former treats equations as a mathematical language, sequentially generating math tokens. Expression-level methods generate each expression one by one. However, each expression represents a solving step, and there naturally exist parallel or dependent relations between these steps, which are ignored by current sequential methods. Therefore, we integrate tree structure into the expression-level generation and advocate an expression tree decoding strategy. To generate a tree with expression as its node, we employ a layer-wise parallel decoding strategy: we decode multiple independent expressions (leaf nodes) in parallel at each layer and repeat parallel decoding layer by layer to sequentially generate these parent node expressions that depend on others. Besides, a bipartite matching algorithm is adopted to align multiple predictions with annotations for each layer. Experiments show our method outperforms other baselines, especially for these equations with complex structures.

Enhancing Emotion Recognition in Conversation via Multi-view Feature Alignment and Memorization
Guiyang Hou | Yongliang Shen | Wenqi Zhang | Wei Xue | Weiming Lu
Findings of the Association for Computational Linguistics: EMNLP 2023

Emotion recognition in conversation (ERC) has attracted increasing attention in natural language processing community. Previous work commonly first extract semantic-view features via fine-tuning PLMs, then models context-view features based on the obtained semantic-view features by various graph neural networks. However, it is difficult to fully model interaction between utterances simply through a graph neural network and the features at semantic-view and context-view are not well aligned. Moreover, the previous parametric learning paradigm struggle to learn the patterns of tail class given fewer instances. To this end, we treat the pre-trained conversation model as a prior knowledge base and from which we elicit correlations between utterances by a probing procedure. And we adopt supervised contrastive learning to align semantic-view and context-view features, these two views of features work together in a complementary manner, contributing to ERC from distinct perspectives. Meanwhile, we propose a new semi-parametric paradigm of inferencing through memorization to solve the recognition problem of tail class samples. We consistently achieve state-of-the-art results on four widely used benchmarks. Extensive experiments demonstrate the effectiveness of our proposed multi-view feature alignment and memorization.

DiffusionNER: Boundary Diffusion for Named Entity Recognition
Yongliang Shen | Kaitao Song | Xu Tan | Dongsheng Li | Weiming Lu | Yueting Zhuang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we propose DiffusionNER, which formulates the named entity recognition task as a boundary-denoising diffusion process and thus generates named entities from noisy spans. During training, DiffusionNER gradually adds noises to the golden entity boundaries by a fixed forward diffusion process and learns a reverse diffusion process to recover the entity boundaries. In inference, DiffusionNER first randomly samples some noisy spans from a standard Gaussian distribution and then generates the named entities by denoising them with the learned reverse diffusion process. The proposed boundary-denoising diffusion process allows progressive refinement and dynamic sampling of entities, empowering DiffusionNER with efficient and flexible entity generation capability. Experiments on multiple flat and nested NER datasets demonstrate that DiffusionNER achieves comparable or even better performance than previous state-of-the-art models.

A Set Prediction Network For Extractive Summarization
Xiaoxia Cheng | Yongliang Shen | Weiming Lu
Findings of the Association for Computational Linguistics: ACL 2023

Extractive summarization focuses on extracting salient sentences from the source document and incorporating them in the summary without changing their wording or structure. The naive approach for extractive summarization is sentence classification, which makes independent binary decisions for each sentence, resulting in the model cannot detect the dependencies between sentences in the summary. Recent approaches introduce an autoregressive decoder to detect redundancy relationship between sentences by step-by-step sentence selection, but bring train-inference gap. To address these issues, we formulate extractive summarization as a salient sentence set recognition task. To solve the sentence set recognition task, we propose a set prediction network (SetSum), which sets up a fixed set of learnable queries to extract the entire sentence set of the summary, while capturing the dependencies between them.Different from previous methods with an auto-regressive decoder, we employ a non-autoregressive decoder to predict the sentences within the summary in parallel during both the training and inference process, which eliminates the train-inference gap. Experimental results on both single-document and multi-document extracted summary datasets show that our approach outperforms previous state-of-the-art models.

MProto: Multi-Prototype Network with Denoised Optimal Transport for Distantly Supervised Named Entity Recognition
Shuhui Wu | Yongliang Shen | Zeqi Tan | Wenqi Ren | Jietian Guo | Shiliang Pu | Weiming Lu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Distantly supervised named entity recognition (DS-NER) aims to locate entity mentions and classify their types with only knowledge bases or gazetteers and unlabeled corpus. However, distant annotations are noisy and degrade the performance of NER models. In this paper, we propose a noise-robust prototype network named MProto for the DS-NER task. Different from previous prototype-based NER methods, MProto represents each entity type with multiple prototypes to characterize the intra-class variance among entity representations. To optimize the classifier, each token should be assigned an appropriate ground-truth prototype and we consider such token-prototype assignment as an optimal transport (OT) problem. Furthermore, to mitigate the noise from incomplete labeling, we propose a novel denoised optimal transport (DOT) algorithm. Specifically, we utilize the assignment result between *Other* class tokens and all prototypes to distinguish unlabeled entity tokens from true negatives. Experiments on several DS-NER benchmarks demonstrate that our MProto achieves state-of-the-art performance. The source code is now available on Github.

PromptNER: Prompt Locating and Typing for Named Entity Recognition
Yongliang Shen | Zeqi Tan | Shuhui Wu | Wenqi Zhang | Rongsheng Zhang | Yadong Xi | Weiming Lu | Yueting Zhuang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Prompt learning is a new paradigm for utilizing pre-trained language models and has achieved great success in many tasks. To adopt prompt learning in the NER task, two kinds of methods have been explored from a pair of symmetric perspectives, populating the template by enumerating spans to predict their entity types or constructing type-specific prompts to locate entities. However, these methods not only require a multi-round prompting manner with a high time overhead and computational cost, but also require elaborate prompt templates, that are difficult to apply in practical scenarios. In this paper, we unify entity locating and entity typing into prompt learning, and design a dual-slot multi-prompt template with the position slot and type slot to prompt locating and typing respectively. Multiple prompts can be input to the model simultaneously, and then the model extracts all entities by parallel predictions on the slots. To assign labels for the slots during training, we design a dynamic template filling mechanism that uses the extended bipartite graph matching between prompts and the ground-truth entities. We conduct experiments in various settings, including resource-rich flat and nested NER datasets and low-resource in-domain and cross-domain datasets. Experimental results show that the proposed model achieves a significant performance improvement, especially in the cross-domain few-shot setting, which outperforms the state-of-the-art model by +7.7% on average.

2022

Query-based Instance Discrimination Network for Relational Triple Extraction
Zeqi Tan | Yongliang Shen | Xuming Hu | Wenqi Zhang | Xiaoxia Cheng | Weiming Lu | Yueting Zhuang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Joint entity and relation extraction has been a core task in the field of information extraction. Recent approaches usually consider the extraction of relational triples from a stereoscopic perspective, either learning a relation-specific tagger or separate classifiers for each relation type. However, they still suffer from error propagation, relation redundancy and lack of high-level connections between triples. To address these issues, we propose a novel query-based approach to construct instance-level representations for relational triples. By metric-based comparison between query embeddings and token embeddings, we can extract all types of triples in one step, thus eliminating the error propagation problem. In addition, we learn the instance-level representation of relational triples via contrastive learning. In this way, relational triples can not only enclose rich class-level semantics but also access to high-order global connections. Experimental results show that our proposed method achieves the state of the art on five widely used benchmarks.

Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem
Wenqi Zhang | Yongliang Shen | Yanna Ma | Xiaoxia Cheng | Zeqi Tan | Qingpeng Nong | Weiming Lu
Findings of the Association for Computational Linguistics: EMNLP 2022

Math word problem solver requires both precise relation reasoning about quantities in the text and reliable generation for the diverse equation. Current sequence-to-tree or relation extraction methods regard this only from a fixed view, struggling to simultaneously handle complex semantics and diverse equations. However, human solving naturally involves two consistent reasoning views: top-down and bottom-up, just as math equations also can be expressed in multiple equivalent forms: pre-order and post-order. We propose a multi-view consistent contrastive learning for a more complete semantics-to-equation mapping. The entire process is decoupled into two independent but consistent views: top-down decomposition and bottom-up construction, and the two reasoning views are aligned in multi-granularity for consistency, enhancing global generation and precise reasoning. Experiments on multiple datasets across two languages show our approach significantly outperforms the existing baselines, especially on complex problems. We also show after consistent alignment, multi-view can absorb the merits of both views and generate more diverse results consistent with the mathematical laws.

Minimally-Supervised Relation Induction from Pre-trained Language Model
Lu Sun | Yongliang Shen | Weiming Lu
Findings of the Association for Computational Linguistics: NAACL 2022

Relation Induction is a very practical task in Natural Language Processing (NLP) area. In practical application scenarios, people want to induce more entity pairs having the same relation from only a few seed entity pairs. Thus, instead of the laborious supervised setting, in this paper, we focus on the minimally-supervised setting where only a couple of seed entity pairs per relation are provided. Although the conventional relation induction methods have made some success, their performance depends heavily on the quality of word embeddings. The great success of Pre-trained Language Models, such as BERT, changes the NLP area a lot, and they are proven to be able to better capture relation knowledge. In this paper, we propose a novel method to induce relation with BERT under the minimally-supervised setting. Specifically, we firstly extract proper templates from the corpus by using the mask-prediction task in BERT to build pseudo-sentences as the context of entity pairs. Then we use BERT attention weights to better represent the pseudo-sentences. In addition, We also use the IntegratedGradient of entity pairs to iteratively select better templates further. Finally, with the high-quality pseudo-sentences, we can train a better classifier for relation induction. Experiments onGoogle Analogy Test Sets (GATS), Bigger Analogy TestSet (BATS) and DiffVec demonstrate that our proposed method achieves state-of-the-art performance.

De-Bias for Generative Extraction in Unified NER Task
Shuai Zhang | Yongliang Shen | Zeqi Tan | Yiquan Wu | Weiming Lu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Named entity recognition (NER) is a fundamental task to recognize specific types of entities from a given sentence. Depending on how the entities appear in the sentence, it can be divided into three subtasks, namely, Flat NER, Nested NER, and Discontinuous NER. Among the existing approaches, only the generative model can be uniformly adapted to these three subtasks. However, when the generative model is applied to NER, its optimization objective is not consistent with the task, which makes the model vulnerable to the incorrect biases. In this paper, we analyze the incorrect biases in the generation process from a causality perspective and attribute them to two confounders: pre-context confounder and entity-order confounder. Furthermore, we design Intra- and Inter-entity Deconfounding Data Augmentation methods to eliminate the above confounders according to the theory of backdoor adjustment. Experiments show that our method can improve the performance of the generative NER model in various datasets.

Parallel Instance Query Network for Named Entity Recognition
Yongliang Shen | Xiaobin Wang | Zeqi Tan | Guangwei Xu | Pengjun Xie | Fei Huang | Weiming Lu | Yueting Zhuang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Named entity recognition (NER) is a fundamental task in natural language processing. Recent works treat named entity recognition as a reading comprehension task, constructing type-specific queries manually to extract entities. This paradigm suffers from three issues. First, type-specific queries can only extract one type of entities per inference, which is inefficient. Second, the extraction for different types of entities is isolated, ignoring the dependencies between them. Third, query construction relies on external knowledge and is difficult to apply to realistic scenarios with hundreds of entity types. To deal with them, we propose Parallel Instance Query Network (PIQN), which sets up global and learnable instance queries to extract entities from a sentence in a parallel manner. Each instance query predicts one entity, and by feeding all instance queries simultaneously, we can query all entities in parallel. Instead of being constructed from external knowledge, instance queries can learn their different query semantics during training. For training the model, we treat label assignment as a one-to-many Linear Assignment Problem (LAP) and dynamically assign gold entities to instance queries with minimal assignment cost. Experiments on both nested and flat NER datasets demonstrate that our proposed method outperforms previous state-of-the-art models.

DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition
Xinyu Wang | Yongliang Shen | Jiong Cai | Tao Wang | Xiaobin Wang | Pengjun Xie | Fei Huang | Weiming Lu | Yueting Zhuang | Kewei Tu | Wei Lu | Yong Jiang
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

The MultiCoNER shared task aims at detecting semantically ambiguous and complex named entities in short and low-context settings for multiple languages. The lack of contexts makes the recognition of ambiguous named entities challenging. To alleviate this issue, our team DAMO-NLP proposes a knowledge-based system, where we build a multilingual knowledge base based on Wikipedia to provide related context information to the named entity recognition (NER) model. Given an input sentence, our system effectively retrieves related contexts from the knowledge base. The original input sentences are then augmented with such context information, allowing significantly better contextualized token representations to be captured. Our system wins 10 out of 13 tracks in the MultiCoNER shared task.

2021

Heterogeneous Graph Neural Networks for Concept Prerequisite Relation Learning in Educational Data
Chenghao Jia | Yongliang Shen | Yechun Tang | Lu Sun | Weiming Lu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Prerequisite relations among concepts are crucial for educational applications, such as curriculum planning and intelligent tutoring. In this paper, we propose a novel concept prerequisite relation learning approach, named CPRL, which combines both concept representation learned from a heterogeneous graph and concept pairwise features. Furthermore, we extend CPRL under weakly supervised settings to make our method more practical, including learning prerequisite relations from learning object dependencies and generating training data with data programming. Our experiments on four datasets show that the proposed approach achieves the state-of-the-art results comparing with existing methods.

Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition
Yongliang Shen | Xinyin Ma | Zeqi Tan | Shuai Zhang | Wen Wang | Weiming Lu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Named entity recognition (NER) is a well-studied task in natural language processing. Traditional NER research only deals with flat entities and ignores nested entities. The span-based methods treat entity recognition as a span classification task. Although these methods have the innate ability to handle nested NER, they suffer from high computational cost, ignorance of boundary information, under-utilization of the spans that partially match with entities, and difficulties in long entity recognition. To tackle these issues, we propose a two-stage entity identifier. First we generate span proposals by filtering and boundary regression on the seed spans to locate the entities, and then label the boundary-adjusted span proposals with the corresponding categories. Our method effectively utilizes the boundary information of entities and partially matched spans during training. Through boundary regression, entities of any length can be covered theoretically, which improves the ability to recognize long entities. In addition, many low-quality seed spans are filtered out in the first stage, which reduces the time complexity of inference. Experiments on nested NER datasets demonstrate that our proposed method outperforms previous state-of-the-art models.

2020

Adversarial Self-Supervised Data-Free Distillation for Text Classification
Xinyin Ma | Yongliang Shen | Gongfan Fang | Chen Chen | Chenghao Jia | Weiming Lu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Large pre-trained transformer-based language models have achieved impressive results on a wide range of NLP tasks. In the past few years, Knowledge Distillation(KD) has become a popular paradigm to compress a computationally expensive model to a resource-efficient lightweight model. However, most KD algorithms, especially in NLP, rely on the accessibility of the original training dataset, which may be unavailable due to privacy issues. To tackle this problem, we propose a novel two-stage data-free distillation method, named Adversarial self-Supervised Data-Free Distillation (AS-DFD), which is designed for compressing large-scale transformer-based models (e.g., BERT). To avoid text generation in discrete space, we introduce a Plug & Play Embedding Guessing method to craft pseudo embeddings from the teacher’s hidden knowledge. Meanwhile, with a self-supervised module to quantify the student’s ability, we adapt the difficulty of pseudo embeddings in an adversarial training manner. To the best of our knowledge, our framework is the first data-free distillation framework designed for NLP tasks. We verify the effectiveness of our method on several text classification datasets.

SynET: Synonym Expansion using Transitivity
Jiale Yu | Yongliang Shen | Xinyin Ma | Chenghao Jia | Chen Chen | Weiming Lu
Findings of the Association for Computational Linguistics: EMNLP 2020

In this paper, we study a new task of synonym expansion using transitivity, and propose a novel approach named SynET, which considers both the contexts of two given synonym pairs. It introduces an auxiliary task to reduce the impact of noisy sentences, and proposes a Multi-Perspective Entity Matching Network to match entities from multiple perspectives. Extensive experiments on a real-world dataset show the effectiveness of our approach.

Co-authors

Xiaoxia Cheng 4

Qingpeng Nong 2

Jiangjie Chen 1

Zhenglin Cheng 1

Wen Wang (王雯) 1

Rongsheng Zhang 1

Venues