2025
Incremental Transformer: Efficient Encoder for Incremented Text Over MRC and Conversation Tasks
Weisheng Li | Yuechen Wang | Jiaxin Shi | Wengang Zhou | Qi Tian | Houqiang Li
Proceedings of the 31st International Conference on Computational Linguistics
Some encoder inputs, such as conversation histories, are frequently extended with short additional inputs like new responses. However, to obtain a real-time encoding of the extended input, existing Transformer-based encoders like BERT have to re-encode the whole extended input without reusing the existing encoding of the original input, which may be prohibitively slow for real-time applications. In this paper, we introduce Incremental Transformer, an efficient encoder dedicated to faster encoding of incremented input. It takes only the added input as input but attends to cached representations of the original input in lower layers for better performance. By treating questions as additional inputs to a passage, Incremental Transformer can also be applied to accelerate MRC tasks. Experimental results show a tiny decline in effectiveness but a significant speedup over a traditional full encoder across various MRC and multi-turn conversational question answering tasks. With the help of simple distillation-like auxiliary losses, Incremental Transformer achieves a 6.2x speedup with a mere 2.2-point accuracy reduction compared to RoBERTa-Large on SQuAD v1.1.
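A minimal sketch of the caching idea described in the abstract, not the authors' implementation: only the appended segment is fed through an encoder layer, while its attention keys and values also cover cached hidden states of the original input. The single-layer setup, dimensions, and class names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class IncrementalLayer(nn.Module):
    """One Transformer layer that encodes only the added tokens,
    attending to cached representations of the original input."""
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, added, cached):
        # Queries come only from the added tokens; keys/values additionally
        # include the cached states, so the original input is never re-encoded.
        kv = torch.cat([cached, added], dim=1)
        h, _ = self.attn(added, kv, kv)
        added = self.norm1(added + h)
        return self.norm2(added + self.ffn(added))

# Usage: cache the original input's hidden states once, then encode only the
# short new segment (e.g., a new conversation turn) against that cache.
layer = IncrementalLayer()
cached_states = torch.randn(1, 512, 768)  # hidden states of the original input
new_turn = torch.randn(1, 16, 768)        # embeddings of the added tokens
out = layer(new_turn, cached_states)      # shape: (1, 16, 768)
```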
2024
基于思维链的跨语言多文档摘要生成技术研究(Cross-lingual Multi-document Summarization Based on Chain-of-Thought)
Qi Tian (祁天) | Yang Jianan (杨建安) | Zhao Tiejun (赵铁军) | Yang Muyun (杨沐昀)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
With the accelerating pace of globalization, the efficient transfer and understanding of cross-lingual information have become increasingly important. Traditional multi-document summarization techniques improve the efficiency of information access, yet they often overlook the particular challenges of cross-lingual scenarios. To mitigate this problem, this paper proposes the task of cross-lingual multi-document summarization. We first construct a comprehensive cross-lingual multi-document summarization test set as an evaluation benchmark, and then propose a chain-of-thought based method for cross-lingual multi-document summarization, which we validate experimentally. In our experiments, we use several representative large language models and assess our method with both human and automatic evaluation. The results show that the proposed chain-of-thought based method achieves significant performance gains on cross-lingual multi-document summarization, providing an effective solution to information access across language barriers.
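A minimal sketch, not the paper's actual prompt, of what a chain-of-thought style prompt for cross-lingual multi-document summarization could look like: the model is asked to extract key facts per document, merge them, and only then write the summary in the target language. The wording of the steps and the function name are assumptions.

```python
def build_cot_prompt(documents, source_lang="English", target_lang="Chinese"):
    """Assemble a chain-of-thought prompt for cross-lingual
    multi-document summarization (illustrative wording)."""
    numbered = "\n\n".join(f"Document {i + 1}:\n{d}" for i, d in enumerate(documents))
    steps = (
        "Let's work step by step:\n"
        f"1. List the key facts of each {source_lang} document.\n"
        "2. Merge overlapping facts and resolve contradictions.\n"
        f"3. Write a concise summary of the merged facts in {target_lang}."
    )
    return f"{numbered}\n\n{steps}\n\nSummary ({target_lang}):"

print(build_cot_prompt(["Doc A text ...", "Doc B text ..."]))
```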
2023
Reasoning over Hierarchical Question Decomposition Tree for Explainable Question Answering
Jiajie Zhang | Shulin Cao | Tingjian Zhang | Xin Lv | Juanzi Li | Lei Hou | Jiaxin Shi | Qi Tian
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Explainable question answering (XQA) aims to answer a given question and provide an explanation of why the answer is selected. Existing XQA methods focus on reasoning over a single knowledge source, e.g., structured knowledge bases or unstructured corpora. However, integrating information from heterogeneous knowledge sources is essential for answering complex questions. In this paper, we propose to leverage question decomposition for heterogeneous knowledge integration, by breaking down a complex question into simpler ones and selecting the appropriate knowledge source for each sub-question. To facilitate reasoning, we propose a novel two-stage XQA framework, Reasoning over Hierarchical Question Decomposition Tree (RoHT). First, we build the Hierarchical Question Decomposition Tree (HQDT) to understand the semantics of a complex question; then, we conduct probabilistic reasoning over the HQDT from root to leaves recursively, to aggregate heterogeneous knowledge at different tree levels and search for the best solution considering the decomposition and answering probabilities. Experiments on the complex QA datasets KQA Pro and MuSiQue show that our framework significantly outperforms SOTA methods, demonstrating the effectiveness of leveraging question decomposition for knowledge integration and of our RoHT framework.
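A minimal sketch of the probabilistic tree-reasoning idea described above, under stated assumptions rather than the authors' implementation: each node keeps an answer candidate from solving the question directly and one composed from its solved children, each with a probability, and the highest-probability candidate wins. `answer_directly` and `compose_from_children` are hypothetical placeholders for the knowledge-source-specific solvers.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    question: str
    decompose_prob: float = 1.0          # confidence that the decomposition is valid
    children: list = field(default_factory=list)

def answer_directly(question):
    # Placeholder for querying a KB or corpus; returns (answer, probability).
    return f"answer({question})", 0.6

def compose_from_children(question, child_answers):
    # Placeholder for solving the parent question given solved sub-questions.
    return f"composed({question})", 0.8

def solve(node):
    candidates = [answer_directly(node.question)]
    if node.children:
        child_answers = [solve(c) for c in node.children]
        ans, p = compose_from_children(node.question, child_answers)
        # Weight the composed answer by decomposition and child confidences.
        for _, child_p in child_answers:
            p *= child_p
        candidates.append((ans, p * node.decompose_prob))
    return max(candidates, key=lambda c: c[1])

tree = Node("Q", 0.9, [Node("Q1"), Node("Q2")])
print(solve(tree))
```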
Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions
Shulin Cao | Jiajie Zhang | Jiaxin Shi | Xin Lv | Zijun Yao | Qi Tian | Lei Hou | Juanzi Li
Findings of the Association for Computational Linguistics: EMNLP 2023
Large language models (LLMs) are capable of answering knowledge-intensive complex questions with chain-of-thought (CoT) reasoning. However, they tend to generate factually incorrect reasoning steps when the required knowledge is not available or up-to-date in the models' parameters. Recent works turn to retrieving external knowledge to augment CoT reasoning. Despite being promising, these chain-based methods suffer from: 1) negative retrieval, where unnecessary or incorrect retrieval misleads the reasoning; and 2) limited sight, where, lacking the ability to look backward or forward, a local error in one step propagates along the chain. In this paper, we propose a novel approach: Probabilistic Tree-of-thought Reasoning (ProbTree). First, LLMs translate a complex question into a query tree, in which each non-root node denotes a sub-question of its parent node. Then, probabilistic reasoning is conducted over the tree by solving questions from leaf to root, considering the confidence of both question decomposition and answering. During reasoning, for leaf nodes, LLMs choose the more confident answer between Closed-book QA, which employs parametric knowledge, and Open-book QA, which employs retrieved external knowledge, thus eliminating the negative retrieval problem. For non-leaf nodes, the hierarchical structure gives LLMs a broader view, so they can reason globally over the information from child nodes and recover from local errors. Experiments on three complex QA datasets under the open-domain setting show that our approach significantly outperforms SOTA methods, demonstrating the effectiveness of probabilistic tree-of-thought reasoning.
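A minimal sketch of the leaf-node step described in the abstract, with assumed interfaces rather than the paper's code: answer a sub-question both closed-book (parametric knowledge only) and open-book (with retrieved passages), then keep whichever answer the model is more confident in, so harmful retrieval can be ignored.

```python
def closed_book_qa(question):
    # Placeholder: prompt an LLM without retrieval; return (answer, log_prob).
    return "closed-book answer", -1.2

def open_book_qa(question, retriever):
    passages = retriever(question)
    # Placeholder: prompt an LLM with the retrieved passages; return (answer, log_prob).
    return f"open-book answer using {len(passages)} passages", -0.7

def answer_leaf(question, retriever):
    cb = closed_book_qa(question)
    ob = open_book_qa(question, retriever)
    # Keep the more confident of the two answers (higher log-probability wins).
    return max([cb, ob], key=lambda pair: pair[1])

dummy_retriever = lambda q: ["retrieved passage 1", "retrieved passage 2"]
print(answer_leaf("Who directed the film that won Best Picture in 1998?", dummy_retriever))
```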
2022
GraphQ IR: Unifying the Semantic Parsing of Graph Query Languages with One Intermediate Representation
Lunyiu Nie | Shulin Cao | Jiaxin Shi | Jiuding Sun | Qi Tian | Lei Hou | Juanzi Li | Jidong Zhai
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Owing to the huge semantic gap between natural and formal languages, neural semantic parsing is typically bottlenecked by the complexity of dealing with both input semantics and output syntax. Recent works have proposed several forms of supplementary supervision, but none generalizes across multiple formal languages. This paper proposes a unified intermediate representation for graph query languages, named GraphQ IR. It has a natural-language-like expression that bridges the semantic gap and a formally defined syntax that maintains the graph structure. Therefore, a neural semantic parser can more precisely convert user queries into GraphQ IR, which can later be losslessly compiled into various downstream graph query languages. Extensive experiments on several benchmarks, including KQA Pro, Overnight, GrailQA, and MetaQA-Cypher, under the standard i.i.d., out-of-distribution, and low-resource settings validate GraphQ IR's superiority over previous state-of-the-art methods, with a maximum 11% accuracy improvement.
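A toy illustration of the compile-to-multiple-backends idea behind a shared intermediate representation; the IR node, its fields, and the target query strings are assumptions and do not reflect the actual GraphQ IR grammar. The parser targets one IR, and small compilers then turn the IR into different graph query languages.

```python
from dataclasses import dataclass

@dataclass
class FindByRelation:          # hypothetical IR node: "find x such that head --relation--> x"
    head: str                  # e.g., "Inception"
    relation: str              # e.g., "director"

def to_sparql(ir):
    # Compile the IR node into a SPARQL query string.
    return f'SELECT ?x WHERE {{ "{ir.head}" <{ir.relation}> ?x . }}'

def to_cypher(ir):
    # Compile the same IR node into a Cypher query string.
    return f'MATCH (h {{name: "{ir.head}"}})-[:{ir.relation}]->(x) RETURN x'

ir = FindByRelation("Inception", "director")
print(to_sparql(ir))
print(to_cypher(ir))
```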
ParaMac: A General Unsupervised Paraphrase Generation Framework Leveraging Semantic Constraints and Diversifying Mechanisms
Jinxin Liu | Jiaxin Shi | Ji Qi | Lei Hou | Juanzi Li | Qi Tian
Findings of the Association for Computational Linguistics: EMNLP 2022
Paraphrase generation reflects the ability to understand the meaning of a sentence from its surface form and rephrase it into other expressions. Recent paraphrase generation works have turned to unsupervised approaches based on Pre-trained Language Models (PLMs) to avoid heavy reliance on parallel data by exploiting PLMs' generation ability. However, the pairs generated by existing unsupervised methods are usually weak in either semantic equivalence or expression diversity. In this paper, we present a novel unsupervised paraphrase generation framework called Paraphrase Machine (ParaMac). By employing multi-aspect equivalence constraints and multi-granularity diversifying mechanisms, Paraphrase Machine achieves good semantic equivalence and expressive diversity, producing a high-quality unsupervised paraphrase dataset. Based on this dataset, we train a general paraphrase model that can be directly applied to rewrite input sentences from various domains without any fine-tuning, achieving substantial absolute gains of 9.1% and 3.3% in BLEU over the previous SOTA on Quora and MSCOCO. By further fine-tuning our model with domain-specific training sets, the improvements increase to 18.0% and 4.6%. Most importantly, by applying it to language understanding and generation tasks under the low-resource setting, we demonstrate that our model can serve as a universal data augmentor to boost few-shot performance (e.g., an average 2.0% gain on GLUE).
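A minimal sketch of a generate-then-constrain step in the spirit of the abstract, not the ParaMac pipeline itself: given diverse candidate rewrites, keep only those whose sentence embeddings stay close to the source, approximating a semantic-equivalence constraint. The sentence-transformers model choice and the threshold are assumptions.

```python
from sentence_transformers import SentenceTransformer, util

def filter_paraphrases(source, candidates, threshold=0.85):
    """Keep candidate rewrites whose cosine similarity to the source
    sentence embedding exceeds the threshold (illustrative constraint)."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    src_emb = model.encode(source, convert_to_tensor=True)
    cand_emb = model.encode(candidates, convert_to_tensor=True)
    sims = util.cos_sim(src_emb, cand_emb)[0]
    return [c for c, s in zip(candidates, sims) if s.item() >= threshold]

candidates = ["How do I learn to play the guitar?",
              "What is the best guitar brand?"]
print(filter_paraphrases("How can I learn guitar?", candidates))
```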