Zero-Shot Dialogue Relation Extraction by Relating Explainable Triggers and Relation Names

Developing dialogue relation extraction (DRE) systems often requires a large amount of labeled data, which is costly and time-consuming to annotate. To improve scalability and support extraction of diverse, unseen relations, this paper proposes a method that captures triggers and relates them to previously unseen relation names. Specifically, we introduce a model that enables zero-shot dialogue relation extraction by utilizing trigger-capturing capabilities. Our experiments on the benchmark DialogRE dataset demonstrate that the proposed model achieves significant improvements for both seen and unseen relations. Notably, this is the first attempt at zero-shot dialogue relation extraction using trigger-capturing capabilities, and our results suggest that this approach is effective for inferring previously unseen relation types. Overall, our findings highlight the potential for this method to enhance the scalability and practicality of DRE systems.


Introduction
Relation extraction (RE) is a key natural language processing (NLP) task that identifies the semantic relationships between arguments in various types of text data. It involves extracting relevant information and representing it in a structured form for downstream applications (Zhang et al., 2017; Cohen et al., 2020; Zhou and Chen, 2021; Huguet Cabot and Navigli, 2021). Dialogue relation extraction (DRE) is a specialized area of RE that focuses on identifying semantic relationships between arguments in conversations. Recent DRE research has used diverse methods to improve relation extraction performance, including constructing dialogue graphs (Lee and Choi, 2021), identifying explicit triggers (Albalak et al., 2022; Lin et al., 2022), and using prompt-based fine-tuning approaches (Son et al., 2022).
Supervised training for RE tasks can be time-consuming and expensive due to the requirement for a large amount of labeled data. Models trained on limited data can only predict the relations they have been trained on, making it challenging to identify similar but unseen relations. Hence, recent research has explored methods that require only a few labeled examples or no labeled examples at all, such as prompt-based fine-tuning (Schick and Schütze, 2020; Puri and Catanzaro, 2019). Additionally, Sainz et al. (2021) improved zero-shot performance by transforming the RE task into an entailment task. However, this approach has not yet been applied to DRE due to the challenge of converting long conversations into NLI format.
In this work, we observe that different relations may depend on each other, such as the parent-child relationship listed in Table 1. Prior work has treated all relations independently and modeled different labels in a multi-class scenario, making it impossible for models to handle unseen relations even if they are relevant to previously seen relations. Therefore, this paper focuses on enabling zero-shot relation prediction. Specifically, if we encounter an unseen relation during testing but have previously seen a similar relation, we can relate them through explicitly mentioned trigger words, such as per:children (seen relation) → "mom" (trigger) → per:parents (unseen relation).
To achieve this, we need to identify the key information of the relation as a tool for relation reasoning during inference. We adopt the approach proposed by Lin et al. (2022), which achieves remarkable results in DRE by capturing explainable keywords in a dialogue to guide relation extraction. By leveraging such trigger-capturing capabilities, our proposed model can better deduce unseen relations from known relations and their associated triggers. The proposed DRE model is therefore more practical, as it can generalize to unseen relations.

Proposed Approach
Prior work on classical DRE has treated it as a multi-class classification problem, which makes it challenging to scale to unseen relation scenarios.
To enable a zero-shot setting, we reformulate the multi-class classification task into multiple binary classification tasks by adding each relation name as input, as illustrated in Figure 1. The binary classification task predicts whether the subject and object in the dialogue belong to the given relation. This approach is equivalent to predicting whether a set of subject-object relations holds, which makes it possible to estimate any relation based only on its name (or a natural language description).
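As a concrete sketch of this reformulation, the input for each binary decision can be assembled by concatenating the dialogue with the candidate triple. The exact segment order and separator placement below are our assumptions for illustration, not the paper's verbatim format:

```python
def build_binary_input(dialogue, subject, obj, relation_name):
    """Format one (dialogue, subject, object, relation) candidate as a
    BERT-style sequence for binary classification.

    The segment order and [SEP] placement here are illustrative assumptions.
    """
    return f"[CLS] {dialogue} [SEP] {subject} [SEP] {obj} [SEP] {relation_name} [SEP]"


def enumerate_candidates(dialogue, subject, obj, relation_names):
    """Yield one binary-classification input per candidate relation name,
    so that any relation can be scored from its name alone."""
    return [build_binary_input(dialogue, subject, obj, r) for r in relation_names]
```

Because the relation name is part of the input rather than a class index, supporting an unseen relation only requires a new name string, not a new output neuron.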

Model Architecture
Our model, illustrated in Figure 2, consists of three components.
Trigger Prediction Inspired by Lin et al. (2022), we incorporate a trigger predictor into our model, allowing us to employ explicit cues for identifying subject-object relationships within a dialogue. Specifically, we adapt techniques from question-answering models to predict the start and end positions of the trigger span. By detecting these triggers, our model not only reasons about potential unseen relations but also enhances the interpretability of the task, making it more practical for real-world applications. To identify the keywords associated with (Subject, Object, RelationType) in a dialogue, we formulate the task as an extractive question-answering problem (Rajpurkar et al., 2016). In this setting, the dialogue is viewed as a document, the subject-object pair represents the question, and the answer corresponds to the span of keywords that explains the associated relation, i.e., the triggers.
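At decoding time, span-based QA models typically score every valid (start, end) pair from the two position distributions. A minimal sketch of that selection step, with an assumed bound on span length, is:

```python
def best_trigger_span(start_logits, end_logits, max_span_len=8):
    """Pick the (start, end) token pair maximizing
    start_logits[i] + end_logits[j], subject to i <= j and a bounded
    span length (the bound max_span_len is an assumption)."""
    best_score = float("-inf")
    best = (0, 0)
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_span_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best
```

The returned indices delimit the trigger span (e.g., the tokens of "uncle") that is then passed on to relation prediction.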
Relation Name Injection In contrast to most prior work (Lee and Choi, 2021; Lin et al., 2022; Albalak et al., 2022), which treats relations as categorical labels, we inject the relation name directly into the model input, allowing the model to associate the given relation name with the trigger information expressed in the dialogue.

Training
As depicted in Figure 2, the input (Dialogue, Subject, Object, RelationType) is first expanded into a sequence resembling BERT's input format. The model is trained to perform two tasks: first, it learns to find the trigger span; second, it learns to incorporate the triggers into the relation prediction.
Negative Sampling In accordance with Mikolov et al. (2013), we adopt the negative sampling method in our training process. Specifically, we randomly select relations from the set of previously observed relations that do not correspond to the given subject-object pair to create negative samples. Notably, the trigger spans of these negative samples remain unchanged.
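A minimal sketch of this sampling step, assuming relations are plain strings and using a seeded generator for reproducibility:

```python
import random


def sample_negative_relations(gold_relation, seen_relations, k=3, seed=0):
    """Draw k seen relations that are NOT the gold relation for the given
    subject-object pair; the trigger span of the example is left unchanged."""
    rng = random.Random(seed)
    pool = [r for r in seen_relations if r != gold_relation]
    return rng.sample(pool, k)
```

With k=3 (the number of negatives used in our experiments), each positive example yields three mismatched (subject, object, relation) inputs labeled as negative.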

Multi-Task Learning
The trigger prediction task involves identifying the most likely trigger positions and is treated as a single-label classification problem using the cross-entropy loss L_Trigger. The relation prediction task employs the binary cross-entropy loss L_Binary to compute the prediction loss. To train the model on both tasks simultaneously, we employ multi-task learning and use a linear combination of the two losses as the objective function. This enables us to train the entire model in an end-to-end fashion.
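In pure-Python form, the two objectives and their linear combination can be sketched as follows; the mixing weight `lam` is a hypothetical parameter, as its value is not reported here:

```python
import math


def cross_entropy(logits, target_idx):
    """Single-label cross-entropy over trigger-position logits (L_Trigger),
    computed with the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_idx]


def binary_cross_entropy(p, y):
    """Binary cross-entropy for the relation decision (L_Binary); p in (0, 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def multitask_loss(trigger_logits, trigger_gold, rel_prob, rel_gold, lam=1.0):
    """Linear combination of the two losses; lam is an assumed weight."""
    return cross_entropy(trigger_logits, trigger_gold) + lam * binary_cross_entropy(rel_prob, rel_gold)
```

Both terms share the encoder, so minimizing the combined objective trains trigger extraction and relation prediction end-to-end.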

Inference
During inference, our model follows a setting similar to the one used during training. However, we observed that the model tends to predict a seen relation when the captured trigger words are present in the training data. To prevent overfitting to seen relations, we replace the trigger span with a general embedding (the embedding of [CLS]), which is assumed to carry the information of the entire sentence; this embedding is used as the input for relation prediction. By doing so, our model generalizes better to unseen scenarios and avoids predicting a seen relation merely because it captures seen trigger words, enhancing its ability to handle diverse unseen relations during inference.
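A minimal sketch of this substitution, where embeddings are plain vectors; the mean-pooling of trigger tokens for the non-generalized path is our illustrative assumption:

```python
def relation_prediction_input(cls_embedding, trigger_token_embeddings, generalize=True):
    """Choose the embedding fed to the relation classifier at inference.

    generalize=True swaps the captured trigger span for the [CLS] embedding,
    which summarizes the whole input and avoids overfitting to seen triggers.
    The mean-pooling of trigger tokens below is an illustrative assumption."""
    if generalize:
        return list(cls_embedding)
    n = len(trigger_token_embeddings)
    return [sum(dims) / n for dims in zip(*trigger_token_embeddings)]
```

Switching `generalize` off recovers the trigger-based variant evaluated in the experiments, while switching it on corresponds to the general-embedding setting.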

Experiments
We conducted experiments using the DialogRE dataset, which is widely used as a benchmark in the field. To assess our model's zero-shot capability, we divided the total of 36 relations into 20 seen and 16 unseen types, detailed in the Appendix. We train our model only on data related to seen relation types. During training, we set the learning rate to 3e-5 and used a GeForce RTX 2080 Ti. The training process involves 10 epochs without early stopping, and the number of negative samples was 3. To ensure a fair comparison with prior work (Lin et al., 2022; Yu et al., 2020), we use the same testing set for evaluation.

Model Setting
We evaluate several settings, all built on BERT-Base, for a fair comparison.
• Multi-class BERT is a baseline, where BERT-Base (Devlin et al., 2019) is adopted and DRE is treated as multi-class classification.
• TUCORE-GCN constructs a dialogue graph to utilize the graph structure for prediction (Lee and Choi, 2021).
• TREND proposes capturing explicit triggers for better performance (Lin et al., 2022); its scores are reported from the prior work for reference and cannot be directly compared with ours.
• Binary-reformulated BERT performs the binary classification shown in Figure 1, which is a proper baseline for zero-shot settings.
• Proposed has three settings for binary relation prediction during inference: 1) based on predicted triggers, 2) based on relation name embeddings, 3) based on gold triggers. The third is listed as an upper bound for reference; its overall performance is estimated based on data size.

Results
Table 2 presents our results. Prior work achieves micro-F1 scores above 60% for seen relations but cannot predict unseen relations (0%) due to their multi-class formulation. The reformulated BERT serves as the baseline for zero-shot settings, achieving 24.9% and 28.9% for top 1 and top 2 ranked relations, respectively. Our proposed method of inputting predicted triggers for relation prediction did not rank correct unseen relations as top 1 (23.5% vs. 24.5%). However, the performance on top 2 ranked relations significantly improved (from 28.9% to 34.8%), suggesting that trigger prediction is indeed useful. The lower top 1 score can be attributed to similar triggers for relevant relations, which easily favor seen relations. An example of an incorrect prediction is provided in Table 3.
Replacing predicted triggers with relation name embeddings, our proposed model achieves the best performance for unseen relations (32.5% for top 1 and 34.8% for top 2). This indicates that this setting avoids overfitting to seen relations and allows prediction to generalize better to unseen scenarios.
Moreover, feeding gold triggers into relation extraction during inference yields the best results, indicating the potential for further improvement with the proposed trigger mechanism. In sum, the experiments demonstrate that our proposed model can connect trigger words with relation names and thereby enables zero-shot relation extraction.
In terms of performance on seen data, our proposed models outperform the reformulated BERT baseline by a significant margin. Moreover, our models achieve scores comparable to previous work (66.7% vs. 66.8% in top 1 scores), even though we consider more candidates. These results further validate the effectiveness of our model and its generalization capability. After comprehensive analysis, we found that incorporating a general context embedding not only leverages the trigger-capturing capability but also assists the DRE task indirectly, leading to the best overall performance among all proposed models. The ability to relate trigger keywords to relation names enables the model to generalize better to unseen relations and overcome the limitations of relying on specific trigger words, demonstrating its potential for real-world applications.

Qualitative Study
Table 3 showcases an example of the predicted triggers and relations for the DialogRE dataset. In this instance, Sal is the uncle of Speaker 3, so the relation between them should be "other_family". Although the trigger mechanism accurately captures the crucial keyword "uncle", the model still outputs the "children" relation from the seen category rather than the "other_family" relation from the unseen category. This suggests that while the model captures significant subject and object information through trigger words, it tends to prioritize predicting relations from the seen category.

Conclusion
This paper introduces a novel approach for zero-shot dialogue relation extraction by relating explainable trigger words and relation names. Our proposed method effectively utilizes trigger-capturing capability and demonstrates a significant improvement in inferring unseen relations. The experimental results on benchmark data show that our approach achieves better generalization and practicality, making it a promising solution for real-world applications.

A Criteria for Relation Dividing
We categorized the relations into two sets, seen and unseen, as presented in Table 4. Our categorization was based on the similarity of relations: dependent ones were assigned to different sets, and relations not related to any other were assigned randomly to either set. This categorization aims to train the model on seen relations so as to enhance its ability to predict unseen relations during testing.

B Prediction Distribution Comparison
We analyze the distribution of correctly predicted top 1 unseen relations for two models, one with predicted triggers and the other with relation name embeddings, and present the results in Table 5. We observe that the two methods exhibit a similar pattern of correctly predicted relations, concentrated on particular unseen relations such as siblings and spouses. However, the method with relation name embeddings significantly outperforms the predicted-trigger method in this respect.

Figure 1: The illustration of our proposed zero-shot relation extraction model.

Figure 2: The illustration of our proposed model architecture.

Table 1: Similar relation examples in DialogRE.

Table 2: The micro-F1 performance of DialogRE in terms of unseen, seen, and overall settings (%). After performing multiple binary classification tasks, our model can rank the relation candidates; this allows us to gain insight into how well the model ranks the correct relations, even if they are not the top-ranked ones.

Table 4: Seen and unseen relations in our experiments.

Table 5: The distribution of correct predictions for the predicted-trigger method and the [CLS]-trigger method.