Zhendong Niu

2025

Outline generation aims to uncover the internal content structure of a document by identifying potential chapter connections and generating corresponding summaries. A robust outline generation model strives for coherence between and within plots. However, existing methods perform well on short- and medium-length texts but struggle to generate readable outlines for very long texts (e.g., fictional literary works). The primary challenge lies in their inability to accurately segment plots within long texts. To address this issue, we propose a novel unsupervised guidance framework, LeStrTP, to guide large language model (LLM) outline generation. This framework ensures that each structured plot encapsulates complete causality by accurately identifying plot boundaries. Specifically, the LeStrTP framework constructs chapter-level graphs from long texts and learns their embeddings. Subsequently, chapter dependence is modeled with a Markov chain, and a unique search operator is designed to achieve plot segmentation. To facilitate research on this task, we introduce a new annotated benchmark dataset, NovOutlineSet. Experimental results demonstrate that structured plots not only enhance the coherence and integrity of generated outlines but also significantly improve their quality.

2024

A Unified Joint Approach with Topological Context Learning and Rule Augmentation for Knowledge Graph Completion
Jingtao Guo | Chunxia Zhang | Lingxi Li | Xiaojun Xue | Zhendong Niu
Findings of the Association for Computational Linguistics: ACL 2024

The knowledge graph completion (KGC) task is to infer the missing knowledge in a knowledge graph based on known factual triples. However, existing KGC approaches still face two challenges: they perform only a simple linear update on relation representations, and they aggregate only local neighborhood information, which makes it difficult to capture the logical semantics between relations and global topological context information. To tackle these challenges, we propose a unified joint approach with Topological Context learning and Rule Augmentation (TCRA) for KGC. The TCRA framework consists of an entity topological context learning mechanism based on a dual-branch hierarchical graph attention network, and a relation rule context learning mechanism based on a Rule-Transformer and a rule-to-relation aggregator. The former mechanism encodes the topological structure features of entities, aggregates the local neighborhood topological context information of entities at three levels (entity, relation and triple), and builds clusters of global head or tail entities related to the same relation. It can capture the local and global topological context information of entities related to the same relation. The latter mechanism introduces chain-like Horn rules as the context information of relations, and encodes the logical semantics of relations to enrich the relation representation. Experimental results on three benchmark datasets, FB15k-237, WN18RR and Kinship, indicate the effectiveness and superiority of our proposed approach. The code is publicly available.

2023

Constrained Tuple Extraction with Interaction-Aware Network
Xiaojun Xue | Chunxia Zhang | Tianxiang Xu | Zhendong Niu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Tuple extraction is a fundamental task for information extraction and knowledge graph construction. The extracted tuples are usually represented as knowledge triples consisting of subject, relation, and object. In practice, however, the validity of knowledge triples is associated with, and changes with, spatial, temporal, or other kinds of constraints. Motivated by this observation, this paper proposes a constrained tuple extraction (CTE) task to guarantee the validity of knowledge tuples. Formally, the CTE task is to extract constrained tuples from unstructured text, which adds constraints to conventional triples. To this end, we propose an interaction-aware network. Combinatorial interactions among context-specific external features and distinct-granularity internal features are exploited to effectively mine the potential constraints. Moreover, we have built a new dataset containing a total of 1,748,826 constrained tuples for training and 3,656 for evaluation. Experiments on our dataset and the public CaRB dataset demonstrate the superiority of the proposed model. The constructed dataset and the code are publicly available.

2019

Understanding narrated instructional videos is important for both research and real-world web applications. Motivated by video dense captioning, we propose a model to generate procedure captions from narrated instructional videos, which are sequences of step-wise clips with descriptions. Previous works on video dense captioning learn video segments and generate captions without considering transcripts. We argue that transcripts in narrated instructional videos can enhance video representation by providing fine-grained complementary and semantic textual information. In this paper, we introduce a framework to (1) extract procedures by a cross-modality module, which fuses video content with the entire transcript; and (2) generate captions by encoding video frames as well as a snippet of transcripts within each extracted procedure. Experiments show that our model can achieve state-of-the-art performance in procedure extraction and captioning, and the ablation studies demonstrate that both the video frames and the transcripts are important for the task.

Co-authors

Hui He 1

Lei Ji 1

Venues

ACL 2
Findings 2