Manling Li


pdf bib
COVID-19 Claim Radar: A Structured Claim Extraction and Tracking System
Manling Li | Revanth Gangi Reddy | Ziqi Wang | Yi-shyuan Chiang | Tuan Lai | Pengfei Yu | Zixuan Zhang | Heng Ji
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

To tackle the challenge of accurate and timely communication regarding the COVID-19 pandemic, we present a COVID-19 Claim Radar to automatically extract supporting and refuting claims on a daily basis. We provide a comprehensive structured view of claims, including rich claim attributes (such as claimers and claimer affiliations) and associated knowledge elements as claim semantics (such as events, relations and entities), enabling users to explore equivalent, refuting, or supporting claims with structural evidence, such as shared claimers, similar centroid events and arguments. In order to consolidate claim structures at the corpus-level, we leverage Wikidata as the hub to merge coreferential knowledge elements. The system automatically provides users a comprehensive exposure to COVID-19 related claims, their importance, and their interconnections. The system is publicly available at GitHub and DockerHub, with complete documentation.

pdf bib
Event Schema Induction with Double Graph Autoencoders
Xiaomeng Jin | Manling Li | Heng Ji
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Event schema depicts the typical structure of complex events, serving as a scaffolding to effectively analyze, predict, and possibly intervene in the ongoing events. To induce event schemas from historical events, previous work uses an event-by-event scheme, ignoring the global structure of the entire schema graph. We propose a new event schema induction framework using double graph autoencoders, which captures the global dependencies among nodes in event graphs. Specifically, we first extract the event skeleton from an event graph and design a variational directed acyclic graph (DAG) autoencoder to learn its global structure. Then we further fill in the event arguments for the skeleton, and use another Graph Convolutional Network (GCN) based autoencoder to reconstruct entity-entity relations as well as to detect coreferential entities. By performing this two-stage induction decomposition, the model can avoid reconstructing the entire graph in one step, allowing it to focus on learning global structures between events. Experimental results on three event graph datasets demonstrate that our method achieves state-of-the-art performance and induces high-quality event schemas with global consistency.

pdf bib
RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios
Xinya Du | Zixuan Zhang | Sha Li | Pengfei Yu | Hongwei Wang | Tuan Lai | Xudong Lin | Ziqi Wang | Iris Liu | Ben Zhou | Haoyang Wen | Manling Li | Darryl Hannan | Jie Lei | Hyounghun Kim | Rotem Dror | Haoyu Wang | Michael Regan | Qi Zeng | Qing Lyu | Charles Yu | Carl Edwards | Xiaomeng Jin | Yizhu Jiao | Ghazaleh Kazeminejad | Zhenhailong Wang | Chris Callison-Burch | Mohit Bansal | Carl Vondrick | Jiawei Han | Dan Roth | Shih-Fu Chang | Martha Palmer | Heng Ji
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations

We introduce RESIN-11, a new schema-guided event extraction&prediction framework that can be applied to a large variety of newsworthy scenarios. The framework consists of two parts: (1) an open-domain end-to-end multimedia multilingual information extraction system with weak-supervision and zero-shot learningbased techniques. (2) schema matching and schema-guided event prediction based on our curated schema library. We build a demo website based on our dockerized system and schema library publicly available for installation ( We also include a video demonstrating the system.

pdf bib
New Frontiers of Information Extraction
Muhao Chen | Lifu Huang | Manling Li | Ben Zhou | Heng Ji | Dan Roth
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorial Abstracts

This tutorial targets researchers and practitioners who are interested in AI and ML technologies for structural information extraction (IE) from unstructured textual sources. Particularly, this tutorial will provide audience with a systematic introduction to recent advances of IE, by answering several important research questions. These questions include (i) how to develop an robust IE system from noisy, insufficient training data, while ensuring the reliability of its prediction? (ii) how to foster the generalizability of IE through enhancing the system’s cross-lingual, cross-domain, cross-task and cross-modal transferability? (iii) how to precisely support extracting structural information with extremely fine-grained, diverse and boundless labels? (iv) how to further improve IE by leveraging indirect supervision from other NLP tasks, such as NLI, QA or summarization, and pre-trained language models? (v) how to acquire knowledge to guide the inference of IE systems? We will discuss several lines of frontier research that tackle those challenges, and will conclude the tutorial by outlining directions for further investigation.


pdf bib
COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation
Qingyun Wang | Manling Li | Xuan Wang | Nikolaus Parulian | Guangxing Han | Jiawei Ma | Jingxuan Tu | Ying Lin | Ranran Haoran Zhang | Weili Liu | Aabhas Chauhan | Yingjun Guan | Bangzheng Li | Ruisong Li | Xiangchen Song | Yi Fung | Heng Ji | Jiawei Han | Shih-Fu Chang | James Pustejovsky | Jasmine Rah | David Liem | Ahmed ELsayed | Martha Palmer | Clare Voss | Cynthia Schneider | Boyan Onyshkevych
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

To combat COVID-19, both clinicians and scientists need to digest the vast amount of relevant biomedical knowledge in literature to understand the disease mechanism and the related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG to extract fine-grained multimedia knowledge elements (entities, relations and events) from scientific literature. We then exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation, using drug repurposing as a case study. Our framework also provides detailed contextual sentences, subfigures, and knowledge subgraphs as evidence. All of the data, KGs, reports.

pdf bib
RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System
Haoyang Wen | Ying Lin | Tuan Lai | Xiaoman Pan | Sha Li | Xudong Lin | Ben Zhou | Manling Li | Haoyu Wang | Hongming Zhang | Xiaodong Yu | Alexander Dong | Zhenhailong Wang | Yi Fung | Piyush Mishra | Qing Lyu | Dídac Surís | Brian Chen | Susan Windisch Brown | Martha Palmer | Chris Callison-Burch | Carl Vondrick | Jiawei Han | Dan Roth | Shih-Fu Chang | Heng Ji
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

We present a new information extraction system that can automatically construct temporal event graphs from a collection of news documents from multiple sources, multiple languages (English and Spanish for our experiment), and multiple data modalities (speech, text, image and video). The system advances state-of-the-art from two aspects: (1) extending from sentence-level event extraction to cross-document cross-lingual cross-media event extraction, coreference resolution and temporal event tracking; (2) using human curated event schema library to match and enhance the extraction output. We have made the dockerlized system publicly available for research purpose at GitHub, with a demo video.

pdf bib
GENE: Global Event Network Embedding
Qi Zeng | Manling Li | Tuan Lai | Heng Ji | Mohit Bansal | Hanghang Tong
Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15)

Current methods for event representation ignore related events in a corpus-level global context. For a deep and comprehensive understanding of complex events, we introduce a new task, Event Network Embedding, which aims to represent events by capturing the connections among events. We propose a novel framework, Global Event Network Embedding (GENE), that encodes the event network with a multi-view graph encoder while preserving the graph topology and node semantics. The graph encoder is trained by minimizing both structural and semantic losses. We develop a new series of structured probing tasks, and show that our approach effectively outperforms baseline models on node typing, argument role classification, and event coreference resolution.

pdf bib
The Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction
Manling Li | Sha Li | Zhenhailong Wang | Lifu Huang | Kyunghyun Cho | Heng Ji | Jiawei Han | Clare Voss
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Event schemas encode knowledge of stereotypical structures of events and their connections. As events unfold, schemas are crucial to act as a scaffolding. Previous work on event schema induction focuses either on atomic events or linear temporal event sequences, ignoring the interplay between events via arguments and argument relations. We introduce a new concept of Temporal Complex Event Schema: a graph-based schema representation that encompasses events, arguments, temporal connections and argument relations. In addition, we propose a Temporal Event Graph Model that predicts event instances following the temporal complex event schema. To build and evaluate such schemas, we release a new schema learning corpus containing 6,399 documents accompanied with event graphs, and we have manually constructed gold-standard schemas. Intrinsic evaluations by schema matching and instance graph perplexity, prove the superior quality of our probabilistic graph schema library compared to linear representations. Extrinsic evaluation on schema-guided future event prediction further demonstrates the predictive power of our event graph model, significantly outperforming human schemas and baselines by more than 17.8% on HITS@1.

pdf bib
Timeline Summarization based on Event Graph Compression via Time-Aware Optimal Transport
Manling Li | Tengfei Ma | Mo Yu | Lingfei Wu | Tian Gao | Heng Ji | Kathleen McKeown
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Timeline Summarization identifies major events from a news collection and describes them following temporal order, with key dates tagged. Previous methods generally generate summaries separately for each date after they determine the key dates of events. These methods overlook the events’ intra-structures (arguments) and inter-structures (event-event connections). Following a different route, we propose to represent the news articles as an event-graph, thus the summarization becomes compressing the whole graph to its salient sub-graph. The key hypothesis is that the events connected through shared arguments and temporal order depict the skeleton of a timeline, containing events that are semantically related, temporally coherent and structurally salient in the global event graph. A time-aware optimal transport distance is then introduced for learning the compression model in an unsupervised manner. We show that our approach significantly improves on the state of the art on three real-world datasets, including two public standard benchmarks and our newly collected Timeline100 dataset.

pdf bib
Joint Multimedia Event Extraction from Video and Article
Brian Chen | Xudong Lin | Christopher Thomas | Manling Li | Shoya Yoshida | Lovish Chum | Heng Ji | Shih-Fu Chang
Findings of the Association for Computational Linguistics: EMNLP 2021

Visual and textual modalities contribute complementary information about events described in multimedia documents. Videos contain rich dynamics and detailed unfoldings of events, while text describes more high-level and abstract concepts. However, existing event extraction methods either do not handle video or solely target video while ignoring other modalities. In contrast, we propose the first approach to jointly extract events from both video and text articles. We introduce the new task of Video MultiMedia Event Extraction and propose two novel components to build the first system towards this task. First, we propose the first self-supervised cross-modal event coreference model that can determine coreference between video events and text events without any manually annotated pairs. Second, we introduce the first cross-modal transformer architecture, which extracts structured event information from both videos and text documents. We also construct and will publicly release a new benchmark of video-article pairs, consisting of 860 video-article pairs with extensive annotations for evaluating methods on this task. Our experimental results demonstrate the effectiveness of our proposed method on our new benchmark dataset. We achieve 6.0% and 5.8% absolute F-score gain on multimodal event coreference resolution and multimedia event extraction.

pdf bib
Coreference by Appearance: Visually Grounded Event Coreference Resolution
Liming Wang | Shengyu Feng | Xudong Lin | Manling Li | Heng Ji | Shih-Fu Chang
Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference

Event coreference resolution is critical to understand events in the growing number of online news with multiple modalities including text, video, speech, etc. However, the events and entities depicting in different modalities may not be perfectly aligned and can be difficult to annotate, which makes the task especially challenging with little supervision available. To address the above issues, we propose a supervised model based on attention mechanism and an unsupervised model based on statistical machine translation, capable of learning the relative importance of modalities for event coreference resolution. Experiments on a video multimedia event dataset show that our multimodal models outperform text-only systems in event coreference resolution tasks. A careful analysis reveals that the performance gain of the multimodal model especially under unsupervised settings comes from better learning of visually salient events.

pdf bib
Event-Centric Natural Language Processing
Muhao Chen | Hongming Zhang | Qiang Ning | Manling Li | Heng Ji | Kathleen McKeown | Dan Roth
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Tutorial Abstracts

This tutorial targets researchers and practitioners who are interested in AI technologies that help machines understand natural language text, particularly real-world events described in the text. These include methods to extract the internal structures of an event regarding its protagonist(s), participant(s) and properties, as well as external structures concerning memberships, temporal and causal relations of multiple events. This tutorial will provide audience with a systematic introduction of (i) knowledge representations of events, (ii) various methods for automated extraction, conceptualization and prediction of events and their relations, (iii) induction of event processes and properties, and (iv) a wide range of NLU and commonsense understanding tasks that benefit from aforementioned techniques. We will conclude the tutorial by outlining emerging research problems in this area.


pdf bib
Cross-media Structured Common Space for Multimedia Event Extraction
Manling Li | Alireza Zareian | Qi Zeng | Spencer Whitehead | Di Lu | Heng Ji | Shih-Fu Chang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We introduce a new task, MultiMedia Event Extraction, which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments. We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information from textual and visual data into a common embedding space. The structures are aligned across modalities by employing a weakly supervised training strategy, which enables exploiting available resources without explicit cross-media annotation. Compared to uni-modal state-of-the-art methods, our approach achieves 4.0% and 9.8% absolute F-score gains on text event argument role labeling and visual event extraction. Compared to state-of-the-art multimedia unstructured representations, we achieve 8.3% and 5.0% absolute F-score gains on multimedia event extraction and argument role labeling, respectively. By utilizing images, we extract 21.4% more event mentions than traditional text-only methods.

pdf bib
GAIA: A Fine-grained Multimedia Knowledge Extraction System
Manling Li | Alireza Zareian | Ying Lin | Xiaoman Pan | Spencer Whitehead | Brian Chen | Bo Wu | Heng Ji | Shih-Fu Chang | Clare Voss | Daniel Napierski | Marjorie Freedman
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present the first comprehensive, open source multimedia knowledge extraction system that takes a massive stream of unstructured, heterogeneous multimedia data from various sources and languages as input, and creates a coherent, structured knowledge base, indexing entities, relations, and events, following a rich, fine-grained ontology. Our system, GAIA, enables seamless search of complex graph queries, and retrieves multimedia evidence including text, images and videos. GAIA achieves top performance at the recent NIST TAC SM-KBP2019 evaluation. The system is publicly available at GitHub and DockerHub, with a narrated video that documents the system.

pdf bib
Connecting the Dots: Event Graph Schema Induction with Path Language Modeling
Manling Li | Qi Zeng | Ying Lin | Kyunghyun Cho | Heng Ji | Jonathan May | Nathanael Chambers | Clare Voss
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Event schemas can guide our understanding and ability to make predictions with respect to what might happen next. We propose a new Event Graph Schema, where two event types are connected through multiple paths involving entities that fill important roles in a coherent story. We then introduce Path Language Model, an auto-regressive language model trained on event-event paths, and select salient and coherent paths to probabilistically construct these graph schemas. We design two evaluation metrics, instance coverage and instance coherence, to evaluate the quality of graph schema induction, by checking when coherent event instances are covered by the schema graph. Intrinsic evaluations show that our approach is highly effective at inducing salient and coherent schemas. Extrinsic evaluations show the induced schema repository provides significant improvement to downstream end-to-end Information Extraction over a state-of-the-art joint neural extraction model, when used as additional global features to unfold instance graphs.


pdf bib
Multilingual Entity, Relation, Event and Human Value Extraction
Manling Li | Ying Lin | Joseph Hoover | Spencer Whitehead | Clare Voss | Morteza Dehghani | Heng Ji
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

This paper demonstrates a state-of-the-art end-to-end multilingual (English, Russian, and Ukrainian) knowledge extraction system that can perform entity discovery and linking, relation extraction, event extraction, and coreference. It extracts and aggregates knowledge elements across multiple languages and documents as well as provides visualizations of the results along three dimensions: temporal (as displayed in an event timeline), spatial (as displayed in an event heatmap), and relational (as displayed in entity-relation networks). For our system to further support users’ analyses of causal sequences of events in complex situations, we also integrate a wide range of human moral value measures, independently derived from region-based survey, into the event heatmap. This system is publicly available as a docker container and a live demo.

pdf bib
Keep Meeting Summaries on Topic: Abstractive Multi-Modal Meeting Summarization
Manling Li | Lingyu Zhang | Heng Ji | Richard J. Radke
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Transcripts of natural, multi-person meetings differ significantly from documents like news articles, which can make Natural Language Generation models for generating summaries unfocused. We develop an abstractive meeting summarizer from both videos and audios of meeting recordings. Specifically, we propose a multi-modal hierarchical attention across three levels: segment, utterance and word. To narrow down the focus into topically-relevant segments, we jointly model topic segmentation and summarization. In addition to traditional text features, we introduce new multi-modal features derived from visual focus of attention, based on the assumption that the utterance is more important if the speaker receives more attention. Experiments show that our model significantly outperforms the state-of-the-art with both BLEU and ROUGE measures.