2024
Self-Para-Consistency: Improving Reasoning Tasks at Low Cost for Large Language Models
Wenqing Chen, Weicheng Wang, Zhixuan Chu, Kui Ren, Zibin Zheng, Zhichao Lu
Findings of the Association for Computational Linguistics: ACL 2024
Recently, the self-consistency decoding strategy has been shown to improve the performance of large language models (LLMs) on complex reasoning tasks. However, its cost can be high: the strategy's sampling process generates some low-probability text, which yields low-quality reasoning paths, so a relatively large number of samples is required to obtain good aggregation performance. In this paper, we propose an alternative strategy, self-para-consistency. It first generates multiple paraphrases of each test question, then generates a reasoning path for the original question and for each paraphrase using greedy decoding, and finally selects the most consistent answer. Since all the candidate paths have relatively high probabilities, the number of samples can be much smaller than with the self-consistency strategy. Extensive experiments on complex reasoning datasets demonstrate the effectiveness of our method in reducing the number of samples.
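The aggregation step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: `paraphrase` and `greedy_answer` are hypothetical callables standing in for an LLM that rewrites the question and for greedy chain-of-thought decoding plus answer extraction, and the toy stubs exist only to make the sketch runnable. What it shows is that every candidate answer comes from one deterministic decode of a different phrasing, rather than from many temperature-sampled decodes of the same phrasing.

```python
from collections import Counter
from typing import Callable, List


def self_para_consistency(
    question: str,
    paraphrase: Callable[[str, int], List[str]],  # hypothetical: returns n paraphrases of a question
    greedy_answer: Callable[[str], str],          # hypothetical: greedy CoT decoding + answer extraction
    num_paraphrases: int = 3,
) -> str:
    """Return the most frequent answer across the original question and its paraphrases."""
    questions = [question] + paraphrase(question, num_paraphrases)
    # One deterministic (greedy) reasoning path per phrasing instead of many sampled paths.
    answers = [greedy_answer(q) for q in questions]
    # Majority vote; Counter breaks ties by first occurrence, so the original
    # question's answer wins any tie it is part of.
    return Counter(answers).most_common(1)[0][0]


# Toy stand-ins for an LLM, for illustration only.
def toy_paraphrase(q: str, n: int) -> List[str]:
    return [f"{q} (rephrased {i})" for i in range(n)]


def toy_greedy_answer(q: str) -> str:
    return "7" if "rephrased 0" in q else "8"


print(self_para_consistency("What is 3 + 5?", toy_paraphrase, toy_greedy_answer))  # 8
```

Here three of the four greedy answers agree, so the vote returns "8" even though one paraphrase led the toy decoder astray.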
2023
MTR: A Dataset Fusing Inductive, Deductive, and Defeasible Reasoning
Yitian Li, Jidong Tian, Caoyun Fan, Wenqing Chen, Hao He, Yaohui Jin
Findings of the Association for Computational Linguistics: ACL 2023
A long-standing difficulty in AI is introducing human-like reasoning into machine reading comprehension. Since deep learning techniques have allowed algorithmic models to perform as well as humans on simple question answering (QA) tasks, more difficult reasoning datasets have been presented. However, these datasets mainly focus on a single type of reasoning, leaving significant gaps compared with the complex reasoning used in daily life. In this work, we introduce a new dataset, named MTR. It has two parts: the first combines deductive and inductive reasoning, and the second combines inductive and defeasible reasoning. It consists of more than 30k QA instances that require inferring relations between characters in short stories. Results show that state-of-the-art neural models perform noticeably worse than expected. Our empirical results highlight the gap in the models’ ability to handle sophisticated inference.
Chain-of-Thought Tuning: Masked Language Models can also Think Step By Step in Natural Language Understanding
Caoyun Fan, Jidong Tian, Yitian Li, Wenqing Chen, Hao He, Yaohui Jin
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Chain-of-Thought (CoT) is a technique that guides Large Language Models (LLMs) to decompose complex tasks into multi-step reasoning through intermediate steps in natural language form. Briefly, CoT enables LLMs to think step by step. However, although many Natural Language Understanding (NLU) tasks also require thinking step by step, LLMs perform less well than small-scale Masked Language Models (MLMs). To migrate CoT from LLMs to MLMs, we propose Chain-of-Thought Tuning (CoTT), a two-step reasoning framework based on prompt tuning, to implement step-by-step thinking for MLMs on NLU tasks. From the perspective of CoT, CoTT’s two-step framework enables MLMs to implement task decomposition; CoTT’s prompt tuning allows intermediate steps to be used in natural language form. Thereby, the success of CoT can be extended to NLU tasks through MLMs. To verify the effectiveness of CoTT, we conduct experiments on two NLU tasks: hierarchical classification and relation extraction, and the results show that CoTT outperforms baselines and achieves state-of-the-art performance.
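A rough sketch of the two-step idea at inference time, using a hierarchical classification example. Everything here is an assumption for illustration: the prompt templates, the coarse-then-fine decomposition, and `fill_mask`, a hypothetical callable that lets an MLM choose the best candidate for `[MASK]`. CoTT itself learns its prompts by prompt tuning, which this sketch omits, and the toy keyword scorer only stands in for a real MLM.

```python
from typing import Callable, Sequence, Tuple

MASK = "[MASK]"


def two_step_predict(
    text: str,
    fill_mask: Callable[[str, Sequence[str]], str],  # hypothetical: MLM picks the best candidate for [MASK]
    coarse_labels: Sequence[str],
    fine_labels: Sequence[str],
) -> Tuple[str, str]:
    """Two-step prompting: predict an intermediate step, then condition on it in natural language."""
    # Step 1: the MLM fills the mask with an intermediate step (here, a coarse label).
    step1 = f"{text} This text is about {MASK}."
    intermediate = fill_mask(step1, coarse_labels)
    # Step 2: the intermediate step is written back into the prompt as plain words,
    # and the MLM fills a second mask with the final (fine-grained) label.
    step2 = f"{text} This text is about {intermediate}. More specifically, it is about {MASK}."
    final = fill_mask(step2, fine_labels)
    return intermediate, final


# Toy stand-in for an MLM: scores each candidate by keyword hits in the prompt.
KEYWORDS = {
    "sports": ["match", "sets", "final"],
    "politics": ["vote", "election"],
    "tennis": ["sets"],
    "football": ["goal"],
    "elections": ["vote"],
}


def toy_fill_mask(prompt: str, candidates: Sequence[str]) -> str:
    text = prompt.lower()
    return max(candidates, key=lambda c: sum(text.count(k) for k in KEYWORDS[c]))


print(two_step_predict("The tennis final went to five sets.", toy_fill_mask,
                       ["sports", "politics"], ["football", "tennis", "elections"]))
# ('sports', 'tennis')
```

The design point illustrated is that the intermediate prediction re-enters the model as ordinary words in the second prompt, not as a hidden vector.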
2022
To What Extent Do Natural Language Understanding Datasets Correlate to Logical Reasoning? A Method for Diagnosing Logical Reasoning.
Yitian Li, Jidong Tian, Wenqing Chen, Caoyun Fan, Hao He, Yaohui Jin
Proceedings of the 29th International Conference on Computational Linguistics
Reasoning and knowledge-related skills are considered two fundamental skills for natural language understanding (NLU) tasks such as machine reading comprehension (MRC) and natural language inference (NLI). However, it is not clear to what extent an NLU task defined on a dataset correlates with a specific NLU skill. On the one hand, evaluating the correlation requires understanding the significance of the NLU skill in a dataset. Significance indicates whether a dataset includes sufficient material to help the model master this skill. On the other hand, it is also necessary to evaluate the dependence of the task on the NLU skill. Dependence measures how much the task defined on a dataset depends on the skill. In this paper, we propose a systematic method to diagnose the correlations between an NLU dataset and a specific skill, and then take a fundamental reasoning skill, logical reasoning, as an example for analysis. The method uses a qualitative indicator for significance and a quantitative indicator for dependence. We perform diagnosis on 8 MRC datasets (covering two types) and 3 NLI datasets and obtain intuitively reasonable results. We then analyze these results and the proposed indicators further. The analysis shows that, although the diagnostic method has some limitations, it is still effective for a basic diagnosis of the correlation between a dataset and logical reasoning skill, and it can also be generalized to other NLU skills.
2021
De-Confounded Variational Encoder-Decoder for Logical Table-to-Text Generation
Wenqing Chen, Jidong Tian, Yitian Li, Hao He, Yaohui Jin
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Logical table-to-text generation aims to automatically generate fluent and logically faithful text from tables. The task remains challenging because deep learning models often generate linguistically fluent but logically inconsistent text. The underlying reason may be that deep learning models often capture surface-level spurious correlations rather than the causal relationships between the table x and the sentence y. Specifically, in the training stage, a model can achieve a low empirical loss by exploiting spurious statistical cues instead of understanding x. In this paper, we propose a de-confounded variational encoder-decoder (DCVED) based on causal intervention, which learns the objective p(y|do(x)). First, we use variational inference to estimate the confounders in the latent space and combine it with causal intervention based on Pearl’s do-calculus to alleviate the spurious correlations. Second, to make the latent confounder meaningful, we propose a back-prediction process that predicts entities that are not used but are linguistically similar to the selected ones. Finally, since our variational model can generate multiple candidates, we train a table-text selector to choose the best candidate sentence for the given table. An extensive set of experiments shows that our model outperforms the baselines and achieves new state-of-the-art performance in logical fidelity on two logical table-to-text datasets.
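The objective p(y|do(x)) differs from the ordinary conditional p(y|x). In the textbook backdoor adjustment, if a confounder z blocks every backdoor path from x to y, then p(y|do(x)) = sum over z of p(y|x,z) p(z), whereas p(y|x) = sum over z of p(y|x,z) p(z|x). The toy model below uses invented binary probabilities, with z driving both x and y, to show how the observational contrast overstates the effect of x. It illustrates the general adjustment only; DCVED instead estimates the confounder in a latent space with variational inference, which is not reproduced here.

```python
# Toy binary model (invented numbers): confounder z affects both x and y,
# while x itself has only a small effect on y.
P_Z = {0: 0.5, 1: 0.5}                          # p(z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}                 # p(x=1 | z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (1, 0): 0.2,      # p(y=1 | x, z)
                 (0, 1): 0.7, (1, 1): 0.8}


def p_x_given_z(x: int, z: int) -> float:
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def p_y1_given_x(x: int) -> float:
    """Observational p(y=1 | x): weights each z by the posterior p(z | x)."""
    p_x = sum(p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    return sum(P_Y1_GIVEN_XZ[(x, z)] * p_x_given_z(x, z) * P_Z[z] / p_x for z in P_Z)


def p_y1_do_x(x: int) -> float:
    """Interventional p(y=1 | do(x)) via backdoor adjustment: weights each z by the prior p(z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


print(round(p_y1_given_x(1) - p_y1_given_x(0), 2))  # 0.46 (confounded contrast)
print(round(p_y1_do_x(1) - p_y1_do_x(0), 2))        # 0.1 (causal effect)
```

Conditioning on x=1 makes z=1 more likely (0.8 instead of 0.5), so p(y=1|x) absorbs z's effect; intervening fixes x without changing p(z).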
Diagnosing the First-Order Logical Reasoning Ability Through LogicNLI
Jidong Tian, Yitian Li, Wenqing Chen, Liqiang Xiao, Hao He, Yaohui Jin
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Recently, language models (LMs) have achieved strong performance on many NLU tasks, which has spurred widespread interest in their possible applications in scientific and social domains. However, LMs have faced much criticism over whether they are truly capable of reasoning in NLU. In this work, we propose a diagnostic method for first-order logic (FOL) reasoning with a newly proposed benchmark, LogicNLI. LogicNLI is an NLI-style dataset that effectively disentangles the target FOL reasoning from commonsense inference and can be used to diagnose LMs from four perspectives: accuracy, robustness, generalization, and interpretability. Experiments on BERT, RoBERTa, and XLNet have uncovered the weaknesses of these LMs on FOL reasoning, which motivates future exploration to enhance their reasoning ability.
2020
A Semantically Consistent and Syntactically Variational Encoder-Decoder Framework for Paraphrase Generation
Wenqing Chen, Jidong Tian, Liqiang Xiao, Hao He, Yaohui Jin
Proceedings of the 28th International Conference on Computational Linguistics
Paraphrase generation aims to generate semantically consistent sentences with different syntactic realizations. Most recent studies rely on the typical encoder-decoder framework, in which the generation process is deterministic. However, in practice, the ability to generate multiple syntactically different paraphrases is important. Recent work proposed applying variational inference to a target-related latent variable to introduce diversity. But the latent variable may be contaminated by the semantic information of other unrelated sentences and, in turn, change the meaning conveyed by the generated paraphrases. In this paper, we propose a semantically consistent and syntactically variational encoder-decoder framework, which uses adversarial learning to ensure that the syntactic latent variable is semantic-free. Moreover, we adopt another discriminator to improve word-level and sentence-level semantic consistency. As a result, the proposed framework can generate multiple semantically consistent and syntactically different paraphrases. The experiments show that our model outperforms the baseline models on metrics based on both n-gram matching and semantic similarity, and that it can generate multiple different paraphrases by assembling different syntactic variables.
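Generating several paraphrases from one input relies on sampling different latent variables. Below is a minimal sketch of the standard Gaussian reparameterization trick and KL regularizer used by variational encoder-decoders in general; the toy mean and log-variance values are invented, and the decoder as well as the adversarial and semantic discriminators described in the abstract are not shown.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, I); z is a differentiable function of mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), the usual regularizer on the latent."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu, log_var = [0.3, -0.1], [-1.0, -0.5]  # toy posterior for one sentence's syntactic latent
# Several syntactic latents for the same input; each would be paired with the same
# semantic encoding and decoded into a different paraphrase (decoder not shown).
syntactic_samples = [reparameterize(mu, log_var, rng) for _ in range(3)]
print(syntactic_samples)
print(kl_to_standard_normal(mu, log_var))
```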
Exploring Logically Dependent Multi-task Learning with Causal Inference
Wenqing Chen, Jidong Tian, Liqiang Xiao, Hao He, Yaohui Jin
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Previous studies have shown that hierarchical multi-task learning (MTL) can utilize task dependencies by stacking encoders and outperform democratic MTL. However, stacking encoders only considers the dependencies of feature representations and ignores the label dependencies in logically dependent tasks. Furthermore, how to properly utilize the labels remains an issue due to cascading errors between tasks. In this paper, we view logically dependent MTL from the perspective of causal inference and suggest a mediation assumption instead of the confounding assumption in conventional MTL models. We propose a model with two key mechanisms: label transfer (LT), which lets each task utilize the labels of all its lower-level tasks, and Gumbel sampling (GS), which deals with cascading errors. From the perspective of causal inference, GS in our model is essentially a counterfactual reasoning process that tries to estimate the causal effect between tasks and uses it to improve MTL. We conduct experiments on two English datasets and one Chinese dataset. Experimental results show that our model achieves state-of-the-art performance on six out of seven subtasks and improves the consistency of predictions.
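Gumbel sampling is described only at a high level in the abstract. Assuming it resembles the standard Gumbel-softmax relaxation, a lower-level task's predicted label can be passed upward as a sampled, nearly one-hot vector that is differentiable with respect to the logits, instead of a hard argmax. The sketch below shows that relaxation; `label_transfer_input` is a hypothetical name for appending the sampled label to the upper-level task's features, and the paper's exact formulation and its counterfactual interpretation may differ.

```python
import math
import random
from typing import List, Sequence


def gumbel_softmax(logits: Sequence[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    # Gumbel(0, 1) noise via g = -log(-log(u)), u ~ Uniform(0, 1); clamp u away from 0.
    noise = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    scores = [(logit + g) / tau for logit, g in zip(logits, noise)]
    top = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_transfer_input(features: Sequence[float], lower_logits: Sequence[float],
                         tau: float, rng: random.Random) -> List[float]:
    """Hypothetical LT step: append a sampled lower-level label to the upper task's features."""
    return list(features) + gumbel_softmax(lower_logits, tau, rng)


rng = random.Random(0)
x_upper = label_transfer_input([0.3, 0.7], [2.0, 0.5, -1.0], tau=0.5, rng=rng)
print(len(x_upper))  # 5
```

Lowering `tau` pushes samples toward one-hot vectors; raising it spreads probability mass more evenly across labels.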
2018
Gated Multi-Task Network for Text Classification
Liqiang Xiao, Honglun Zhang, Wenqing Chen
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Multi-task learning with Convolutional Neural Networks (CNNs) has shown great success in many Natural Language Processing (NLP) tasks. This success can be largely attributed to feature sharing by fusing some layers among tasks. However, most existing approaches simply share features fully or proportionally without distinguishing how helpful they are. As a result, the network can be confused by unhelpful or even harmful features, generating undesired interference between tasks. In this paper, we introduce a gate mechanism into multi-task CNNs and propose a new Gated Sharing Unit, which can filter the feature flows between tasks and greatly reduce the interference. Experiments on 9 text classification datasets show that our approach can learn selection rules automatically and achieve a large improvement over strong baselines.
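The abstract does not give the Gated Sharing Unit's equations. As an illustration only, the following sketch assumes a generic element-wise sigmoid gate: it looks at both the receiving task's features and the features arriving from another task, and scales each incoming feature by a value in (0, 1). The function name and parameter shapes are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def gated_share(own: Sequence[float], other: Sequence[float],
                weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Scale each feature arriving from another task by a learned gate in (0, 1).

    weights has one row per incoming feature; each row spans own + other.
    """
    joint = list(own) + list(other)  # the gate looks at both tasks' features
    gates = [sigmoid(sum(w * x for w, x in zip(row, joint)) + b)
             for row, b in zip(weights, bias)]
    # A gate near 0 blocks an unhelpful feature; a gate near 1 passes it through.
    return [g * x for g, x in zip(gates, other)]


zero_rows = [[0.0] * 4, [0.0] * 4]
print(gated_share([0.5, 0.5], [2.0, 3.0], zero_rows, [30.0, -30.0]))
# roughly [2.0, 0.0]: the first incoming feature passes, the second is blocked
```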
Learning What to Share: Leaky Multi-Task Network for Text Classification
Liqiang Xiao, Honglun Zhang, Wenqing Chen, Yongkun Wang, Yaohui Jin
Proceedings of the 27th International Conference on Computational Linguistics
Neural network based multi-task learning has achieved great success on many NLP problems; it focuses on sharing knowledge among tasks by linking some layers to enhance performance. However, most existing approaches suffer from interference between tasks because they lack a selection mechanism for feature sharing. As a result, the feature spaces of tasks may easily be contaminated by unhelpful features borrowed from other tasks, which can confuse the models and prevent correct predictions. In this paper, we propose a multi-task convolutional neural network with the Leaky Unit, which has memory and forgetting mechanisms to filter the feature flows between tasks. Experiments on five different text classification datasets validate the benefits of our approach.
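The Leaky Unit's exact equations are not in the abstract either. A GRU-style cell is one common way to realize "memory and forgetting" between two feature streams, so the sketch below assumes GRU-like reset and update gates; `leaky_merge`, the parameter names, and `toy_params` are hypothetical, and the real unit may be parameterized differently.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def _sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def _affine(rows: Matrix, bias: Sequence[float], x: Sequence[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(rows, bias)]


def leaky_merge(own: Sequence[float], other: Sequence[float], p: Dict) -> List[float]:
    """GRU-style merge of another task's features into this task's features (hypothetical)."""
    joint = list(own) + list(other)
    r = [_sigmoid(a) for a in _affine(p["W_r"], p["b_r"], joint)]  # reset: how much of `other` to read
    z = [_sigmoid(a) for a in _affine(p["W_z"], p["b_z"], joint)]  # update: how much to overwrite
    read = list(own) + [ri * oi for ri, oi in zip(r, other)]
    cand = [math.tanh(a) for a in _affine(p["W_h"], p["b_h"], read)]
    # z near 0 keeps (remembers) own features; z near 1 replaces (forgets) them.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, own, cand)]


def toy_params(d: int, update_bias: float, cand_bias: float) -> Dict:
    """Zero weights with constant biases, just to make the gates' roles visible."""
    zeros = [[0.0] * (2 * d) for _ in range(d)]
    return {"W_r": zeros, "b_r": [0.0] * d, "W_z": zeros, "b_z": [update_bias] * d,
            "W_h": zeros, "b_h": [cand_bias] * d}


print(leaky_merge([0.4, -0.2], [1.0, 1.0], toy_params(2, -30.0, 1.0)))  # close to own features
print(leaky_merge([0.4, -0.2], [1.0, 1.0], toy_params(2, 30.0, 1.0)))   # close to tanh(1) in each slot
```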
Multi-Task Label Embedding for Text Classification
Honglun Zhang, Liqiang Xiao, Wenqing Chen, Yongkun Wang, Yaohui Jin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Multi-task learning in text classification leverages implicit correlations among related tasks to extract common features and yield performance gains. However, a large body of previous work treats the labels of each task as independent and meaningless one-hot vectors, which causes a loss of potential label information. In this paper, we propose Multi-Task Label Embedding to convert labels in text classification into semantic vectors, thereby turning the original tasks into vector matching tasks. Our model utilizes semantic correlations among tasks and is convenient to scale or transfer when new tasks are involved. Extensive experiments on five benchmark text classification datasets show that our model can effectively improve the performance of related tasks with semantic representations of labels and additional information from each other.
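Once labels are vectors, classification reduces to finding the label embedding closest to the text embedding, and a new task's labels can be added by supplying their vectors. The sketch below assumes text and label vectors already live in one shared space (the toy 2-D values are invented); how MTLE actually encodes texts and labels is described in the paper, not here.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_label(text_vec: Sequence[float], label_vecs: Dict[str, Sequence[float]]) -> str:
    """Vector matching: pick the label whose embedding is most similar to the text's."""
    return max(label_vecs, key=lambda name: cosine(text_vec, label_vecs[name]))


labels = {"positive": [1.0, 0.1], "negative": [-1.0, 0.1]}
print(match_label([0.9, 0.0], labels))   # positive
# In this sketch, a label from a new task needs only a vector, not a new output layer.
labels["neutral"] = [0.0, 1.0]
print(match_label([0.05, 0.8], labels))  # neutral
```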
MCapsNet: Capsule Network for Text with Multi-Task Learning
Liqiang Xiao, Honglun Zhang, Wenqing Chen, Yongkun Wang, Yaohui Jin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Multi-task learning can share knowledge among related tasks and implicitly increase the training data. However, it has long been hampered by interference among tasks. This paper investigates the performance of capsule networks for text and proposes a capsule-based multi-task learning architecture that is unified, simple, and effective. Taking advantage of capsules for feature clustering, the proposed task routing algorithm can cluster the features for each task in the network, which helps reduce the interference among tasks. Experiments on six text classification datasets demonstrate the effectiveness of our models and their characteristics for feature clustering.
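For background, the sketch below implements the squash nonlinearity and routing-by-agreement from the original capsule network formulation, which capsule-based text models build on. It is not MCapsNet's task routing algorithm, whose details are not given in the abstract; `u_hat[i][j]` is assumed to be input capsule i's prediction vector for output capsule j.

```python
import math
from typing import List

Vec = List[float]


def squash(s: Vec) -> Vec:
    """Shrink short vectors toward length 0 and long ones toward length 1, keeping direction."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0 for _ in s]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]


def softmax(xs: Vec) -> Vec:
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vec]], iterations: int = 3) -> List[Vec]:
    """Routing by agreement over predictions u_hat[i][j] (input i -> output j)."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v: List[Vec] = []
    for it in range(iterations):
        c = [softmax(row) for row in b]  # each input splits its vote across outputs
        s = [[sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
             for j in range(n_out)]
        v = [squash(sj) for sj in s]
        if it < iterations - 1:
            # Raise a logit when input i's prediction agrees (dot product) with output j.
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(a * o for a, o in zip(u_hat[i][j], v[j]))
    return v


u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],   # input capsule 0: predictions for outputs 0 and 1
    [[1.0, 0.0], [-1.0, 0.0]],  # input capsule 1: agrees on output 0, disagrees on output 1
]
v = dynamic_routing(u_hat)
print([round(math.sqrt(sum(x * x for x in vj)), 3) for vj in v])
# output 0 gets a clearly nonzero length; output 1, where the inputs disagree, gets 0.0
```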