Emotion Recognition in Conversations (ERC) is a well-studied task with numerous potential real-world applications. However, existing ERC models trained on the MELD dataset derived from TV series, struggle when applied to daily conversation datasets. A closer examination of the datasets unveils the prevalence of linguistic artifacts such as repetitions and interjections in TV scripts, which ERC models may exploit when making predictions. To address this issue, we explore two techniques aimed at reducing the reliance of ERC models on these artifacts: 1) using contrastive learning to prioritize emotional features over dataset-specific linguistic style and 2) refining emotion predictions with pseudo-emotion intensity score. Our experiment results show that reducing reliance on the linguistic style found in TV transcripts could enhance model’s robustness and accuracy in diverse conversational contexts.
The success of ChatGPT has ignited an AI race, with researchers striving to develop new large language models (LLMs) that can match or surpass the language understanding and generation abilities of commercial ones. In recent times, a number of models have emerged, claiming performance near that of GPT-3.5 or GPT-4 through various instruction-tuning methods. As practitioners of Text-to-SQL parsing, we are grateful for their valuable contributions to open-source research. However, it is important to approach these claims with a sense of scrutiny and ascertain the actual effectiveness of these models. Therefore, we pit six popular large language models against each other, systematically evaluating their Text-to-SQL parsing capability on nine benchmark datasets with five different prompting strategies, covering both zero-shot and few-shot scenarios. Regrettably, the open-sourced models fell significantly short of the performance achieved by closed-source models like GPT-3.5, highlighting the need for further work to bridge the performance gap between these models.
Understanding human preferences, along with cultural and social nuances, lives at the heart of natural language understanding. Concretely, we present a new task and corpus for learning alignments between machine and human preferences. Our newly introduced problem is concerned with predicting the preferable options from two sentences describing scenarios that may involve social and cultural situations. Our problem is framed as a natural language inference task with crowd-sourced preference votes by human players, obtained from a gamified voting platform. We benchmark several state-of-the-art neural models, along with BERT and friends on this task. Our experimental results show that current state-of-the-art NLP models still leave much room for improvement.