Sijia Wang


2024

pdf bib
Ameli: Enhancing Multimodal Entity Linking with Fine-Grained Attributes
Barry Yao | Sijia Wang | Yu Chen | Qifan Wang | Minqian Liu | Zhiyang Xu | Licheng Yu | Lifu Huang
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose attribute-aware multimodal entity linking, where the input consists of a mention described with a text paragraph and images, and the goal is to predict the corresponding target entity from a multimodal knowledge base (KB) where each entity is also accompanied by a text description, visual images, and a collection of attributes that present the meta-information of the entity in a structured format. To facilitate this research endeavor, we construct Ameli, encompassing a new multimodal entity linking benchmark dataset that contains 16,735 mentions described in text and associated with 30,472 images, and a multimodal knowledge base that covers 34,690 entities along with 177,873 entity images and 798,216 attributes. To establish baseline performance on Ameli, we experiment with several state-of-the-art architectures for multimodal entity linking and further propose a new approach that incorporates attributes of entities into disambiguation. Experimental results and extensive qualitative analysis demonstrate that extracting and understanding the attributes of mentions from their text descriptions and visual images play a vital role in multimodal entity linking. To the best of our knowledge, we are the first to integrate attributes in the multimodal entity linking task. The programs, model checkpoints, and the dataset are publicly available at https://github.com/VT-NLP/Ameli.

2023

pdf bib
The Art of Prompting: Event Detection based on Type Specific Prompts
Sijia Wang | Mo Yu | Lifu Huang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We compare various forms of prompts to represent event types and develop a unified framework to incorporate the event type specific prompts for supervised, few-shot, and zero-shot event detection. The experimental results demonstrate that a well-defined and comprehensive event type prompt can significantly improve event detection performance, especially when the annotated data is scarce (few-shot event detection) or not available (zero-shot event detection). By leveraging the semantics of event types, our unified framework shows up to 22.2% F-score gain over the previous state-of-the-art baselines.

pdf bib
Benchmarking Diverse-Modal Entity Linking with Generative Models
Sijia Wang | Alexander Hanbo Li | Henghui Zhu | Sheng Zhang | Pramuditha Perera | Chung-Wei Hang | Jie Ma | William Yang Wang | Zhiguo Wang | Vittorio Castelli | Bing Xiang | Patrick Ng
Findings of the Association for Computational Linguistics: ACL 2023

Entities can be expressed in diverse formats, such as texts, images, or column names and cell values in tables. While existing entity linking (EL) models work well on per modality configuration, such as text-only EL, visual grounding or schema linking, it is more challenging to design a unified model for diverse modality configurations. To bring various modality configurations together, we constructed a benchmark for diverse-modal EL (DMEL) from existing EL datasets, covering all three modalities including text, image and table. To approach the DMEL task, we proposed a generative diverse-modal model (GDMM) following a multimodal-encoder-decoder paradigm. Pre-training GDMM with rich corpora builds a solid foundation for DMEL without storing the entire KB for inference. Fine-tuning GDMM builds a stronger DMEL baseline, outperforming state-of-the-art task-specific EL models by 8.51 F1 score on average. Additionally, extensive error analyses are conducted to highlight the challenge of DMEL, facilitating future researches on this task.

2022

pdf bib
Query and Extract: Refining Event Extraction as Type-oriented Binary Decoding
Sijia Wang | Mo Yu | Shiyu Chang | Lichao Sun | Lifu Huang
Findings of the Association for Computational Linguistics: ACL 2022

Event extraction is typically modeled as a multi-class classification problem where event types and argument roles are treated as atomic symbols. These approaches are usually limited to a set of pre-defined types. We propose a novel event extraction framework that uses event types and argument roles as natural language queries to extract candidate triggers and arguments from the input text. With the rich semantics in the queries, our framework benefits from the attention mechanisms to better capture the semantic correlation between the event types or argument roles and the input text. Furthermore, the query-and-extract formulation allows our approach to leverage all available event annotations from various ontologies as a unified model. Experiments on ACE and ERE demonstrate that our approach achieves state-of-the-art performance on each dataset and significantly outperforms existing methods on zero-shot event extraction.