Huijia Zhu


2024

Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding
Chong Zhang | Yi Tu | Yixi Zhao | Chenshu Yuan | Huan Chen | Yue Zhang | Mingxu Chai | Ya Guo | Huijia Zhu | Qi Zhang | Tao Gui
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Modeling and leveraging layout reading order in visually-rich documents (VrDs) is critical in document intelligence as it captures the rich structural semantics within documents. Previous works typically formulated layout reading order as a permutation of layout elements, i.e., a sequence containing all the layout elements. However, we argue that this formulation does not adequately convey the complete reading order information in the layout, which may potentially lead to performance decline in downstream tasks. To address this issue, we propose to model the layout reading order as ordering relations over the set of layout elements, which have sufficient expressive capability for the complete reading order information. To enable empirical evaluation on methods towards the improved form of reading order prediction (ROP), we establish a comprehensive benchmark dataset including the reading order annotation as relations over layout elements, together with a relation-extraction-based method that outperforms previous models. Moreover, we propose a reading-order-relation-enhancing pipeline to improve model performance on arbitrary VrD tasks by introducing additional reading order relation inputs. We conduct comprehensive experiments to demonstrate that the pipeline generally benefits downstream VrD tasks: (1) by utilizing the reading order relation information, the enhanced downstream models achieve SOTA results on both task settings of the targeted dataset; (2) by utilizing the pseudo reading order information generated by the proposed ROP model, the performance of the enhanced models improves across all three models and eight cross-domain VrD-IE/QA task settings without targeted optimization.

XMC-Agent: Dynamic Navigation over Scalable Hierarchical Index for Incremental Extreme Multi-label Classification
Yanjiang Liu | Tianyun Zhong | Yaojie Lu | Hongyu Lin | Ben He | Shuheng Zhou | Huijia Zhu | Weiqiang Wang | Zhongyi Liu | Xianpei Han | Le Sun
Findings of the Association for Computational Linguistics: ACL 2024

eXtreme Multi-label Classification (XMC) aims at accurately assigning large-scale labels to instances, and is challenging for learning, managing, and predicting over the large-scale and rapidly growing set of labels. Traditional XMC methods, like one-vs-all and tree-based methods, struggle with the growing set of labels due to their static label assumptions, and embedding-based methods struggle with the complex mapping relationships due to their late-interaction paradigm. In this paper, we propose a large language model (LLM) powered agent framework for extreme multi-label classification, XMC-Agent, which can effectively learn, manage and predict the extremely large and dynamically increasing set of labels. Specifically, XMC-Agent models the extreme multi-label classification task as a dynamic navigation problem, employing a scalable hierarchical label index to effectively manage the unified label space. Additionally, we propose two algorithms to enhance the dynamic navigation capabilities of XMC-Agent: a self-construction algorithm for building the scalable hierarchical index, and an iterative feedback learning algorithm for adjusting the agent to specific tasks. Experiments show that XMC-Agent achieves state-of-the-art performance on three standard datasets.

Debiasing In-Context Learning by Instructing LLMs How to Follow Demonstrations
Lvxue Li | Jiaqi Chen | Xinyu Lu | Yaojie Lu | Hongyu Lin | Shuheng Zhou | Huijia Zhu | Weiqiang Wang | Zhongyi Liu | Xianpei Han | Le Sun
Findings of the Association for Computational Linguistics: ACL 2024

In-context learning (ICL) has gained considerable attention due to its data efficiency and task adaptability. Unfortunately, ICL suffers from demonstration bias, i.e., its performance and robustness are severely affected by the selection and ordering of demonstrations. In this paper, we identify that such demonstration bias may primarily stem from the semantic ambiguity induced by demonstrations, i.e., a demonstration may indicate multiple input-to-label mappings and its mapping can be interpreted differently in different contexts by LLMs. Such semantic ambiguity disrupts task comprehension during ICL and results in performance fluctuations. To resolve the semantic ambiguity problem, this paper further proposes two de-biasing strategies to mitigate demonstration bias in in-context learning. Experiments on six datasets show that our methods can effectively alleviate demonstration bias and significantly improve task performance.

Chain-of-Rewrite: Aligning Question and Documents for Open-Domain Question Answering
Chunlei Xin | Yaojie Lu | Hongyu Lin | Shuheng Zhou | Huijia Zhu | Weiqiang Wang | Zhongyi Liu | Xianpei Han | Le Sun
Findings of the Association for Computational Linguistics: EMNLP 2024

Despite the advancements made with the retrieve-then-read pipeline on the open-domain question answering task, current methods still face challenges stemming from term mismatch and limited interaction between information retrieval systems and large language models. To mitigate these issues, we propose the Chain-of-Rewrite method, which leverages the guidance and feedback gained from the analysis to provide faithful and consistent extensions for effective question answering. Through a two-step rewriting process comprising Semantic Analysis and Semantic Augmentation, the Chain-of-Rewrite method effectively bridges the gap between the user question and relevant documents. By incorporating feedback from the rewriting process, our method can self-correct the retrieval and reading process to further improve performance. Experiments on four open-domain question answering datasets demonstrate the effectiveness of our system under zero-shot settings.

Beyond Full Fine-tuning: Harnessing the Power of LoRA for Multi-Task Instruction Tuning
Chunlei Xin | Yaojie Lu | Hongyu Lin | Shuheng Zhou | Huijia Zhu | Weiqiang Wang | Zhongyi Liu | Xianpei Han | Le Sun
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Low-Rank Adaptation (LoRA) is a widespread parameter-efficient fine-tuning algorithm for large-scale language models. It has been commonly accepted that LoRA mostly achieves promising results in single-task, low-resource settings, and struggles to handle multi-task instruction tuning scenarios. In this paper, we conduct a systematic study of LoRA on diverse tasks and rich resources with different learning capacities, examining its performance on seen tasks during training and its cross-task generalization on unseen tasks. Our findings challenge the prevalent assumption that the limited learning capacity will inevitably result in performance decline. In fact, our study reveals that when configured with an appropriate rank, LoRA can achieve remarkable performance in high-resource and multi-task scenarios, even comparable to that achieved through full fine-tuning. It turns out that the constrained learning capacity encourages LoRA to prioritize conforming to instruction requirements rather than memorizing specialized features of particular tasks or instances. This study reveals the underlying connection between learning capacity and generalization capabilities for robust parameter-efficient fine-tuning, highlighting a promising direction for the broader application of LoRA across various tasks and settings.

Probe Then Retrieve and Reason: Distilling Probing and Reasoning Capabilities into Smaller Language Models
Yichun Zhao | Shuheng Zhou | Huijia Zhu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Step-by-step reasoning methods, such as the Chain-of-Thought (CoT), have been demonstrated to be highly effective in harnessing the reasoning capabilities of Large Language Models (LLMs). Recent research efforts have sought to distill LLMs into Small Language Models (SLMs), with a significant focus on transferring the reasoning capabilities of LLMs to SLMs via CoT. However, the outcomes of CoT distillation are inadequate for knowledge-intensive reasoning tasks. This is because generating accurate rationales requires crucial factual knowledge, which SLMs struggle to retain due to their parameter constraints. We propose a retrieval-based CoT distillation framework, named Probe then Retrieve and Reason (PRR), which distills the question probing and reasoning capabilities from LLMs into SLMs. We train two complementary distilled SLMs, a probing model and a reasoning model, in tandem. When presented with a new question, the probing model first identifies the necessary knowledge to answer it, generating queries for retrieval. Subsequently, the reasoning model uses the retrieved knowledge to construct a step-by-step rationale for the answer. In knowledge-intensive reasoning tasks, such as StrategyQA and OpenbookQA, our distillation framework yields superior performance for SLMs compared to conventional methods, including simple CoT distillation and knowledge-augmented distillation using raw questions.

2023

Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
Chong Zhang | Ya Guo | Yi Tu | Huan Chen | Jinyang Tang | Huijia Zhu | Qi Zhang | Tao Gui
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting the BIO entity tags for tokens, following the typical setting of NLP. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs where text is recognized and arranged by OCR systems. This reading order issue hinders the accurate marking of entities by the BIO-tagging scheme, making it impossible for sequence-labeling methods to predict correct named entities. To address the reading order issue, we introduce Token Path Prediction (TPP), a simple prediction head to predict entity mentions as token sequences within documents. As an alternative to token classification, TPP models the document layout as a complete directed graph of tokens, and predicts token paths within the graph as entities. For better evaluation of VrD-NER systems, we also propose two revised benchmark datasets of NER on scanned documents which can reflect real-world scenarios. Experimental results demonstrate the effectiveness of our method, and suggest its potential to be a universal solution to various information extraction tasks on documents.

2022

TransAdv: A Translation-based Adversarial Learning Framework for Zero-Resource Cross-Lingual Named Entity Recognition
Yichun Zhao | Jintao Du | Gongshen Liu | Huijia Zhu
Findings of the Association for Computational Linguistics: EMNLP 2022

Zero-Resource Cross-Lingual Named Entity Recognition aims at training an NER model of the target language using only labeled source language data and unlabeled target language data. Existing methods are mainly divided into three categories: model-transfer-based, data-transfer-based, and knowledge-transfer-based. Each method has its own disadvantages, and combining more than one of them often leads to better performance. However, the performance of data-transfer-based methods is often limited by inevitable noise in the translation process. To handle this problem, we propose a framework named TransAdv to mitigate lexical and syntactic errors of word-by-word translated data, better utilizing the data by multi-level adversarial learning and multi-model knowledge distillation. Extensive experiments are conducted over 6 target languages with English as the source language, and the results show that TransAdv achieves performance competitive with state-of-the-art models.

A Multi-Task Dual-Tree Network for Aspect Sentiment Triplet Extraction
Yichun Zhao | Kui Meng | Gongshen Liu | Jintao Du | Huijia Zhu
Proceedings of the 29th International Conference on Computational Linguistics

Aspect Sentiment Triplet Extraction (ASTE) aims at extracting triplets from a given sentence, where each triplet includes an aspect, its sentiment polarity, and a corresponding opinion explaining the polarity. Existing methods are poor at detecting complicated relations between aspects and opinions as well as classifying multiple sentiment polarities in a sentence. Detecting unclear boundaries of multi-word aspects and opinions is also a challenge. In this paper, we propose a Multi-Task Dual-Tree Network (MTDTN) to address these issues. We employ a constituency tree and a modified dependency tree in two sub-tasks of Aspect Opinion Co-Extraction (AOCE) and ASTE, respectively. To enhance the information interaction between the two sub-tasks, we further design a Transition-Based Inference Strategy (TBIS) that transfers the boundary information from tags of AOCE to ASTE through a transition matrix. Extensive experiments are conducted on four popular datasets, and the results show the effectiveness of our model.

2009

Domain Adaptation with Latent Semantic Association for Named Entity Recognition
Honglei Guo | Huijia Zhu | Zhili Guo | Xiaoxun Zhang | Xian Wu | Zhong Su
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2006

Dependency Parsing Based on Dynamic Local Optimization
Ting Liu | Jinshan Ma | Huijia Zhu | Sheng Li
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)