Yuchen Song

Also published as: 昱辰


2025

Visual information has been introduced to enhance machine translation (MT), but its effectiveness relies heavily on the availability of large amounts of bilingual parallel sentence pairs with manual image annotations. In this paper, we introduce a stable diffusion-based imagination network into a multimodal large language model (MLLM) that explicitly generates an image for each source sentence, thereby advancing multimodal MT. In particular, we build heuristic feedback with reinforcement learning to ensure that the generated image is consistent with the source sentence without supervision from visual information, which removes the high-cost bottleneck of image annotation in MT. Furthermore, the proposed method allows imagined visual information to be integrated into text-only MT as well as multimodal MT. Experimental results show that our model significantly outperforms existing multimodal and text-only MT systems, achieving an average improvement of more than 14 BLEU points on the Multi30K and MSCOCO multimodal MT benchmarks.

2024

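Sentence pattern structure is a formal syntactic structure grounded in sentence-based grammar, which presents the structure of a sentence using a custom diagrammatic notation. This paper proposes a sentence pattern structure system covering three aspects: clause structure, word-formation structure, and inter-sentence structure. It explains the system's design rationale and its sentence-based principles of sentence analysis, and finally summarizes the engineering progress of building a Chinese treebank based on this system.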