Detecting evidence within the context is a key step in the process of reasoning task. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called DetectBench for verifying the ability to detect and piece together implicit evidence within a long context. DetectBench contains 3,928 multiple-choice questions, with an average of 994 tokens per question. Each question contains an average of 4.55 pieces of implicit evidence, and solving the problem typically requires 7.62 logical jumps to find the correct answer. To enhance the performance of LLMs in evidence detection, this paper proposes Detective Reasoning Prompt and Finetune. Experiments demonstrate that the existing LLMs’ abilities to detect evidence in long contexts are far inferior to humans. However, the Detective Reasoning Prompt effectively enhances the capability of powerful LLMs in evidence detection, while the Finetuning method shows significant effects in enhancing the performance of weaker LLMs. Moreover, when the abilities of LLMs in evidence detection are improved, their final reasoning performance is also enhanced accordingly.
Large Language Models (LLMs) have attained the impressive capability to resolve a wide range of NLP tasks by fine-tuning high-quality instruction data. However, collecting human-written data of high quality, especially multi-turn dialogues, is expensive and unattainable for most people. Though previous studies have used powerful LLMs to generate the dialogues automatically, they all suffer from generating untruthful dialogues because of the model hallucination. Therefore, we propose a method called RefGPT to generate enormous truthful and customized dialogues without worrying about factual errors caused by the model hallucination. RefGPT solves the model hallucination in dialogue generation by restricting the LLMs to leverage the given reference instead of reciting their own knowledge to generate dialogues. Additionally, RefGPT adds detailed controls on every utterance to enable high customization capability, which previous studies have ignored. On the basis of RefGPT, we also propose two high-quality dialogue datasets generated by GPT-4, namely **RefGPT-Fact** and **RefGPT-Code**. RefGPT-Fact is a dataset with 100k multi-turn dialogues based on factual knowledge and RefGPT-Code has 76k multi-turn dialogues covering a wide range of coding scenarios. Our code and datasets are released in https://github.com/mutonix/RefGPT.
Prompt-based learning has emerged as a powerful technique in natural language processing (NLP) due to its ability to leverage pre-training knowledge for downstream few-shot tasks. In this paper, we propose 2INER, a novel text-to-text framework for Few-Shot Named Entity Recognition (NER) tasks. Our approach employs instruction finetuning based on InstructionNER to enable the model to effectively comprehend and process task-specific instructions, including both main and auxiliary tasks. We also introduce a new auxiliary task, called Type Extracting, to enhance the model’s understanding of entity types in the overall semantic context of a sentence. To facilitate in-context learning, we concatenate examples to the input, enabling the model to learn from additional contextual information. Experimental results on four datasets demonstrate that our approach outperforms existing Few-Shot NER methods and remains competitive with state-of-the-art standard NER algorithms.
Supervised open relation extraction aims to discover novel relations by leveraging supervised data of pre-defined relations. However, most existing methods do not achieve effective knowledge transfer from pre-defined relations to novel relations, they have difficulties generating high-quality pseudo-labels for unsupervised data of novel relations and usually suffer from the error propagation issue. In this paper, we propose a Cluster-aware Pseudo-Labeling (CaPL) method to improve the pseudo-labels quality and transfer more knowledge for discovering novel relations. Specifically, the model is firstly pre-trained with the pre-defined relations to learn the relation representations. To improve the pseudo-labels quality, the distances between each instance and all cluster centers are used to generate the cluster-aware soft pseudo-labels for novel relations. To mitigate the catastrophic forgetting issue, we design the consistency regularization loss to make better use of the pseudo-labels and jointly train the model with both unsupervised and supervised data. Experimental results on two public datasets demonstrate that our proposed method achieves new state-of-the-arts performance.
Open relation extraction is the task to extract relational facts without pre-defined relation types from open-domain corpora. However, since there are some hard or semi-hard instances sharing similar context and entity information but belonging to different underlying relation, current OpenRE methods always cluster them into the same relation type. In this paper, we propose a novel method based on Instance Ranking and Label Calibration strategies (IRLC) to learn discriminative representations for open relation extraction. Due to lacking the original instance label, we provide three surrogate strategies to generate the positive, hard negative, and semi-hard negative instances for the original instance. Instance ranking aims to refine the relational feature space by pushing the hard and semi-hard negative instances apart from the original instance with different margins and pulling the original instance and its positive instance together. To refine the cluster probability distributions of these instances, we introduce a label calibration strategy to model the constraint relationship between instances. Experimental results on two public datasets demonstrate that our proposed method can significantly outperform the previous state-of-the-art methods.
Zero-shot relation extraction aims to identify novel relations which cannot be observed at the training stage. However, it still faces some challenges since the unseen relations of instances are similar or the input sentences have similar entities, the unseen relation representations from different categories tend to overlap and lead to errors. In this paper, we propose a novel Relation Contrastive Learning framework (RCL) to mitigate above two types of similar problems: Similar Relations and Similar Entities. By jointly optimizing a contrastive instance loss with a relation classification loss on seen relations, RCL can learn subtle difference between instances and achieve better separation between different relation categories in the representation space simultaneously. Especially in contrastive instance learning, the dropout noise as data augmentation is adopted to amplify the semantic difference between similar instances without breaking relation representation, so as to promote model to learn more effective representations. Experiments conducted on two well-known datasets show that RCL can significantly outperform previous state-of-the-art methods. Moreover, if the seen relations are insufficient, RCL can also obtain comparable results with the model trained on the full training set, showing the robustness of our approach.