We propose a simple yet effective method for enhancing persona consistency in dialogue response generation using Direct Preference Optimization (DPO). In our method, we generate responses from the response generation model using persona information that has been randomly swapped with data from other dialogues, treating these responses as pseudo-negative samples. The reference responses serve as positive samples, allowing us to create pseudo-preference data. Experimental results demonstrate that our model, fine-tuned with DPO on the pseudo preference data, produces more consistent and natural responses compared to models trained using supervised fine-tuning or reinforcement learning approaches based on entailment relations between personas and utterances.
Few-shot text classification aims to classify inputs whose label has only a few examples. Previous studies overlooked the semantic relevance between label representations. Therefore, they are easily confused by labels that are relevant. To address this problem, we propose a method that generates distinct label representations that embed information specific to each label. Our method is applicable to conventional few-shot classification models. Experimental results show that our method significantly improved the performance of few-shot text classification across models and datasets.
We create a large-scale dialogue corpus that provides pragmatic paraphrases to advance technology for understanding the underlying intentions of users. While neural conversation models acquire the ability to generate fluent responses through training on a dialogue corpus, previous corpora have mainly focused on the literal meanings of utterances. However, in reality, people do not always present their intentions directly. For example, if a person said to the operator of a reservation service “I don’t have enough budget.”, they, in fact, mean “please find a cheaper option for me.” Our corpus provides a total of 71,498 indirect–direct utterance pairs accompanied by a multi-turn dialogue history extracted from the MultiWoZ dataset. In addition, we propose three tasks to benchmark the ability of models to recognize and generate indirect and direct utterances. We also investigated the performance of state-of-the-art pre-trained models as baselines.
Advanced pre-trained models for text representation have achieved state-of-the-art performance on various text classification tasks. However, the discrepancy between the semantic similarity of texts and labelling standards affects classifiers, i.e. leading to lower performance in cases where classifiers should assign different labels to semantically similar texts. To address this problem, we propose a simple multitask learning model that uses negative supervision. Specifically, our model encourages texts with different labels to have distinct representations. Comprehensive experiments show that our model outperforms the state-of-the-art pre-trained model on both single- and multi-label classifications, sentence and document classifications, and classifications in three different languages.
We propose a method to control the specificity of responses while maintaining the consistency with the utterances. We first design a metric based on pointwise mutual information, which measures the co-occurrence degree between an utterance and a response. To control the specificity of generated responses, we add the distant supervision based on the co-occurrence degree and a PMI-based word prediction mechanism to a sequence-to-sequence model. With these mechanisms, our model outputs the words with optimal specificity for a given specificity control variable. In experiments with open-domain dialogue corpora, automatic and human evaluation results confirm that our model controls the specificity of the response more sensitively than the conventional model and can generate highly consistent responses.
Sequence-to-sequence models are a common approach to develop a chatbot. They can train a conversational model in an end-to-end manner. One significant drawback of such a neural network based approach is that the response generation process is a black-box, and how a specific response is generated is unclear. To tackle this problem, an interpretable response generation mechanism is desired. As a step toward this direction, we focus on dialogue-acts (DAs) that may provide insight to understand the response generation process. In particular, we propose a method to predict a DA of the next response based on the history of previous utterances and their DAs. Experiments using a Switch Board Dialogue Act corpus show that compared to the baseline considering only a single utterance, our model achieves 10.8% higher F1-score and 3.0% higher accuracy on DA prediction.
A sequence-to-sequence model tends to generate generic responses with little information for input utterances. To solve this problem, we propose a neural model that generates relevant and informative responses. Our model has simple architecture to enable easy application to existing neural dialogue models. Specifically, using positive pointwise mutual information, it first identifies keywords that frequently co-occur in responses given an utterance. Then, the model encourages the decoder to use the keywords for response generation. Experiment results demonstrate that our model successfully diversifies responses relative to previous models.
A neural conversation model is a promising approach to develop dialogue systems with the ability of chit-chat. It allows training a model in an end-to-end manner without complex rule design nor feature engineering. However, as a side effect, the neural model tends to generate safe but uninformative and insensitive responses like “OK” and “I don’t know.” Such replies are called generic responses and regarded as a critical problem for user-engagement of dialogue systems. For a more engaging chit-chat experience, we propose a neural conversation model that generates responsive and self-expressive replies. Specifically, our model generates domain-aware and sentiment-rich responses. Experiments empirically confirmed that our model outperformed the sequence-to-sequence model; 68.1% of our responses were domain-aware with sentiment polarities, which was only 2.7% for responses generated by the sequence-to-sequence model.