Meng Liao


2024

Seg2Act: Global Context-aware Action Generation for Document Logical Structuring
Zichao Li | Shaojie He | Meng Liao | Xuanang Chen | Yaojie Lu | Hongyu Lin | Yanxiong Lu | Xianpei Han | Le Sun
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Document logical structuring aims to extract the underlying hierarchical structure of documents, which is crucial for document intelligence. Traditional approaches often fall short in handling the complexity and variability of lengthy documents. To address these issues, we introduce Seg2Act, an end-to-end, generation-based method for document logical structuring, revisiting logical structure extraction as an action generation task. Specifically, given the text segments of a document, Seg2Act iteratively generates the action sequence via a global context-aware generative model, and simultaneously updates its global context and current logical structure based on the generated actions. Experiments on the ChCatExt and HierDoc datasets demonstrate the superior performance of Seg2Act in both supervised and transfer learning settings.

2023

WebDP: Understanding Discourse Structures in Semi-Structured Web Documents
Peilin Liu | Hongyu Lin | Meng Liao | Hao Xiang | Xianpei Han | Le Sun
Findings of the Association for Computational Linguistics: ACL 2023

Web documents have become rich data resources in the current era, and understanding their discourse structure will potentially benefit various downstream document processing applications. Unfortunately, current discourse analysis and document intelligence research mostly focuses on either the discourse structure of plain text or superficial visual structures in documents, which cannot accurately describe the discourse structure of highly free-styled and semi-structured web documents. To promote discourse studies on web documents, in this paper we introduce a benchmark, WebDP, oriented toward a new task named Web Document Discourse Parsing. Specifically, a web document discourse structure representation schema is proposed by extending classical discourse theories and adding special features to represent the discourse characteristics of web documents well. Then, a manually annotated web document dataset, WEBDOCS, is developed to facilitate the study of this parsing task. We compare current neural models on WEBDOCS, and experimental results show that WebDP is feasible but also challenging for current models.

2022

ECO v1: Towards Event-Centric Opinion Mining
Ruoxi Xu | Hongyu Lin | Meng Liao | Xianpei Han | Jin Xu | Wei Tan | Yingfei Sun | Le Sun
Findings of the Association for Computational Linguistics: ACL 2022

Events are considered the fundamental building blocks of the world. Mining event-centric opinions can benefit decision making, communication between people, and social good. Unfortunately, there is little literature addressing event-centric opinion mining, even though it diverges significantly from the well-studied entity-centric opinion mining in connotation, structure, and expression. In this paper, we propose and formulate the task of event-centric opinion mining based on event-argument structure and expression categorizing theory. We also benchmark this task by constructing a pioneering corpus and designing a two-step benchmark framework. Experimental results show that event-centric opinion mining is feasible and challenging, and that the proposed task, dataset, and baselines are beneficial for future studies.

2021

From Discourse to Narrative: Knowledge Projection for Event Relation Extraction
Jialong Tang | Hongyu Lin | Meng Liao | Yaojie Lu | Xianpei Han | Le Sun | Weijian Xie | Jin Xu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Current event-centric knowledge graphs rely heavily on explicit connectives to mine relations between events. Unfortunately, due to the sparsity of connectives, these methods severely undermine the coverage of EventKGs. The lack of high-quality labelled corpora further exacerbates this problem. In this paper, we propose a knowledge projection paradigm for event relation extraction: projecting discourse knowledge to narratives by exploiting the commonalities between them. Specifically, we propose the Multi-tier Knowledge Projection Network (MKPNet), which can leverage multi-tier discourse knowledge effectively for event relation extraction. In this way, the labelled data requirement is significantly reduced, and implicit event relations can be effectively extracted. Intrinsic experimental results show that MKPNet achieves new state-of-the-art performance, and extrinsic experimental results verify the value of the extracted event relations.

Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases
Boxi Cao | Hongyu Lin | Xianpei Han | Le Sun | Lingyong Yan | Meng Liao | Tong Xue | Jin Xu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Previous literature shows that pre-trained masked language models (MLMs) such as BERT can achieve competitive factual knowledge extraction performance on some datasets, indicating that MLMs can potentially be a reliable knowledge source. In this paper, we conduct a rigorous study to explore the underlying predicting mechanisms of MLMs over different extraction paradigms. By investigating the behaviors of MLMs, we find that the previously reported decent performance is mainly owed to biased prompts that overfit dataset artifacts. Furthermore, incorporating illustrative cases and external contexts improves knowledge prediction mainly due to entity type guidance and golden answer leakage. Our findings shed light on the underlying predicting mechanisms of MLMs, and strongly question the previous conclusion that current MLMs can potentially serve as reliable factual knowledge bases.

Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction
Yaojie Lu | Hongyu Lin | Jin Xu | Xianpei Han | Jialong Tang | Annan Li | Le Sun | Meng Liao | Shaoyi Chen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Event extraction is challenging due to the complex structure of event records and the semantic gap between text and event. Traditional methods usually extract event records by decomposing the complex structure prediction task into multiple subtasks. In this paper, we propose Text2Event, a sequence-to-structure generation paradigm that can directly extract events from the text in an end-to-end manner. Specifically, we design a sequence-to-structure network for unified event extraction, a constrained decoding algorithm for event knowledge injection during inference, and a curriculum learning algorithm for efficient model learning. Experimental results show that, by uniformly modeling all tasks in a single model and universally predicting different labels, our method can achieve competitive performance using only record-level annotations in both supervised learning and transfer learning settings.

2012

A Mixed Deterministic Model for Coreference Resolution
Bo Yuan | Qingcai Chen | Yang Xiang | Xiaolong Wang | Liping Ge | Zengjian Liu | Meng Liao | Xianbo Si
Joint Conference on EMNLP and CoNLL - Shared Task