Jiaao Yu


2026

Length-controllable text generation (LCTG) is essential for tasks like text summarization and report generation. However, large language models (LLMs) have limited awareness of output length, so precise control over the length of generated text remains a significant challenge. Most existing methods rely on prompt-based frameworks, position encoding, or reinforcement learning for model training. These approaches may degrade semantic quality and struggle to maintain consistent length control across different models and tasks. In this paper, we propose DcLM, a model-agnostic approach that introduces dynamic length markers to guide length-controllable outputs. During training, the model leverages these markers as in-context information, without learning to generate them. At inference time, an external word counter and injected length information guide the model to produce outputs of accurate lengths. We evaluate our method across multiple datasets, and the experimental results demonstrate that DcLM significantly reduces length deviation and generalizes robustly across various length scales and tasks.
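
The inference-time idea described above (an external word counter plus length information injected into the context) can be illustrated with a minimal sketch. This is not the DcLM implementation: the marker format `<remaining=N>`, the injection interval `marker_every`, and the `next_word` callable standing in for a language model are all hypothetical choices made for illustration.

```python
from typing import Callable, List


def count_words(text: str) -> int:
    """External word counter: counts whitespace-delimited words."""
    return len(text.split())


def generate_with_length_markers(
    next_word: Callable[[str], str],
    prompt: str,
    target_words: int,
    marker_every: int = 5,
) -> str:
    """Generate text word by word, injecting remaining-length markers.

    The markers are appended to the context the model sees, but never to
    the returned text, mirroring the idea that the model reads markers
    without having to generate them. The marker format is an assumption.
    """
    context = prompt
    output: List[str] = []
    while True:
        produced = count_words(" ".join(output))
        if produced >= target_words:
            break
        # Every `marker_every` words, tell the model how many words remain.
        if produced % marker_every == 0:
            context += f" <remaining={target_words - produced}>"
        word = next_word(context)
        output.append(word)
        context += " " + word
    return " ".join(output)


def toy_model(context: str) -> str:
    """Hypothetical stand-in for an LLM: always emits the same word."""
    return "word"


if __name__ == "__main__":
    text = generate_with_length_markers(toy_model, "Summarize:", target_words=12)
    print(count_words(text))
```

In this sketch the stopping decision is made by the external counter rather than by the model, so the output length matches the target exactly whenever `next_word` returns one word per call. A real system would also need the model to bring the text to a coherent close as the remaining count approaches zero.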

2025

The multimodal event extraction task aims to identify event types and arguments from visual and textual representations related to events. Due to the high cost of multimedia training data, previous methods have mainly focused on weak alignment of strong unimodal encoders. However, they ignore the conflict between event understanding and image recognition, so redundant feature perception impairs the understanding of multimodal events. In this paper, we propose a multimodal event extraction strategy with a multi-level redundant feature selection mechanism, which enhances the event understanding ability of multimodal large language models by leveraging knowledge editing techniques and requires no additional parameter optimization. Extensive experiments show that our method outperforms the state-of-the-art (SOTA) baselines on the M2E2 benchmark. Compared with the strongest baseline, we achieve a 34% improvement in precision on event extraction and an 11% improvement in F1 on argument extraction.