Shisong Chen

2025

Recently, Multi-modal Entity Linking (MEL) has attracted increasing attention in the research community due to its significance in numerous multi-modal applications. Video, as a popular means of information transmission, has become prevalent in people’s daily lives. However, most existing MEL methods primarily focus on linking textual and visual mentions or offline videos’ mentions to entities in multi-modal knowledge bases, with limited efforts devoted to linking mentions within online video content. In this paper, we propose a task called Online Video Entity Linking (OVEL), aiming to establish connections between mentions in online videos and a knowledge base with high accuracy and timeliness. To facilitate the research works of (OVEL), we specifically concentrate on live delivery scenarios and construct a live delivery entity linking dataset called (LIVE). Besides, we propose an evaluation metric that considers robustness, timelessness, and accuracy. Furthermore, to effectively handle (OVEL) task, we leverage a memory block managed by a Large Language Model and retrieve entity candidates from the knowledge base to augment LLM performance on memory management. The experimental results prove the effectiveness and efficiency of our method.

2024

Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as the ESC models. However, the evaluation of these LLM-based ESCs remains uncertain. In detail, we first re-organize 2,801 role-playing cards from seven existing datasets to define the roles of the role-playing agent. Second, we train a specific role-playing model called ESC-Role which behaves more like a confused person than GPT-4. Third, through ESC-Role and organized role cards, we systematically conduct experiments using 14 LLMs as the ESC models, including general AI-assistant LLMs (e.g., ChatGPT) and ESC-oriented LLMs (e.g., ExTES-Llama). We conduct comprehensive human annotations on interactive multi-turn dialogues of different ESC models. The results show that ESC-oriented LLMs exhibit superior ESC abilities compared to general AI-assistant LLMs, but there is still a gap behind human performance. Moreover, to automate the scoring process for future ESC models, we developed ESC-RANK, which trained on the annotated data, achieving a scoring performance surpassing 35 points of GPT-4.

This paper presents a novel solution to tackle the challenges that posed by the abundance of non-standard addresses, which input by users in modern applications such as navigation maps, ride-hailing apps, food delivery platforms, and logistics services. These manually entered addresses often contain irregularities, such as missing information, spelling errors, colloquial descriptions, and directional offsets, which hinder address-related tasks like address matching and linking. To tackle these challenges, we propose GeoAgent, a new framework comprising two main components: a large language model (LLM) and a suite of geographical tools. By harnessing the semantic understanding capabilities of the LLM and integrating specific geospatial tools, GeoAgent incorporates spatial knowledge into address texts and achieves efficient address standardization. Further, to verify the effectiveness and practicality of our approach, we construct a comprehensive dataset of complex non-standard addresses, which fills the gaps in existing datasets and proves invaluable for training and evaluating the performance of address standardization models in this community. Experimental results demonstrate the efficacy of GeoAgent, showcasing substantial improvements in the performance of address-related models across various downstream tasks.

2021

Zero pronoun resolution aims at recognizing dropped pronouns and pointing out their anaphoric mentions, while non-zero coreference resolution targets at clustering mentions referring to the same entity. Existing efforts often deal with the two problems separately regardless of their close essential correlations. In this paper, we investigate the possibility of jointly solving zero pronoun resolution and coreference resolution via a novel end-to-end neural model. Specifically, we design a gap-masked self-attention model that encodes gaps and tokens in the same space, where gaps could capture valuable contextual information according to their surrounding tokens while tokens could maintain original sequential information without disturbance. Additionally, we also propose a two-stage interaction mechanism to make full use of the exclusive relationship between zero pronouns and mentions. Our empirical study conducted on the OntoNotes 5.0 Chinese dataset shows that our model could outperform corresponding state-of-the-art approaches on both tasks.