Qi Zheng


2024

pdf bib
DocHieNet: A Large and Diverse Dataset for Document Hierarchy Parsing
Hangdi Xing | Changxu Cheng | Feiyu Gao | Zirui Shao | Zhi Yu | Jiajun Bu | Qi Zheng | Cong Yao
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Parsing documents from pixels, such as pictures and scanned PDFs, into hierarchical structures is extensively demanded in the daily routines of data storage, retrieval and understanding. However, previously the research on this topic has been largely hindered since most existing datasets are small-scale, or contain documents of only a single type, which are characterized by a lack of document diversity. Moreover, there is a significant discrepancy in the annotation standards across datasets. In this paper, we introduce a large and diverse document hierarchy parsing (DHP) dataset to compensate for the data scarcity and inconsistency problem. We aim to set a new standard as a more practical, long-standing benchmark. Meanwhile, we present a new DHP framework designed to grasp both fine-grained text content and coarse-grained pattern at layout element level, enhancing the capacity of pre-trained text-layout models in handling the multi-page and multi-level challenges in DHP. Through exhaustive experiments, we validate the effectiveness of our proposed dataset and method.

2023

pdf bib
GEM: Gestalt Enhanced Markup Language Model for Web Understanding via Render Tree
Zirui Shao | Feiyu Gao | Zhongda Qi | Hangdi Xing | Jiajun Bu | Zhi Yu | Qi Zheng | Xiaozhong Liu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Inexhaustible web content carries abundant perceptible information beyond text. Unfortunately, most prior efforts in pre-trained Language Models (LMs) ignore such cyber-richness, while few of them only employ plain HTMLs, and crucial information in the rendered web, such as visual, layout, and style, are excluded. Intuitively, those perceptible web information can provide essential intelligence to facilitate content understanding tasks. This study presents an innovative Gestalt Enhanced Markup (GEM) Language Model inspired by Gestalt psychological theory for hosting heterogeneous visual information from the render tree into the language model without requiring additional visual input. Comprehensive experiments on multiple downstream tasks, i.e., web question answering and web information extraction, validate GEM superiority.

2022

pdf bib
Understanding Gender Bias in Knowledge Base Embeddings
Yupei Du | Qi Zheng | Yuanbin Wu | Man Lan | Yan Yang | Meirong Ma
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowledge base (KB) embeddings have been shown to contain gender biases. In this paper, we study two questions regarding these biases: how to quantify them, and how to trace their origins in KB? Specifically, first, we develop two novel bias measures respectively for a group of person entities and an individual person entity. Evidence of their validity is observed by comparison with real-world census data. Second, we use the influence function to inspect the contribution of each triple in KB to the overall group bias. To exemplify the potential applications of our study, we also present two strategies (by adding and removing KB triples) to mitigate gender biases in KB embeddings.

2020

pdf bib
Merge and Recognize: A Geometry and 2D Context Aware Graph Model for Named Entity Recognition from Visual Documents
Chuwei Luo | Yongpan Wang | Qi Zheng | Liangchen Li | Feiyu Gao | Shiyu Zhang
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)

Named entity recognition (NER) from visual documents, such as invoices, receipts or business cards, is a critical task for visual document understanding. Most classical approaches use a sequence-based model (typically BiLSTM-CRF framework) without considering document structure. Recent work on graph-based model using graph convolutional networks to encode visual and textual features have achieved promising performance on the task. However, few attempts take geometry information of text segments (text in bounding box) in visual documents into account. Meanwhile, existing methods do not consider that related text segments which need to be merged to form a complete entity in many real-world situations. In this paper, we present GraphNEMR, a graph-based model that uses graph convolutional networks to jointly merge text segments and recognize named entities. By incorporating geometry information from visual documents into our model, richer 2D context information is generated to improve document representations. To merge text segments, we introduce a novel mechanism that captures both geometry information as well as semantic information based on pre-trained language model. Experimental results show that the proposed GraphNEMR model outperforms both sequence-based and graph-based SOTA methods significantly.