Huijia Zhu


2023

pdf bib
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
Chong Zhang | Ya Guo | Yi Tu | Huan Chen | Jinyang Tang | Huijia Zhu | Qi Zhang | Tao Gui
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting the BIO entity tags for tokens, following the typical setting of NLP. However, BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs where text are recognized and arranged by OCR systems. Such reading order issue hinders the accurate marking of entities by BIO-tagging scheme, making it impossible for sequence-labeling methods to predict correct named entities. To address the reading order issue, we introduce Token Path Prediction (TPP), a simple prediction head to predict entity mentions as token sequences within documents. Alternative to token classification, TPP models the document layout as a complete directed graph of tokens, and predicts token paths within the graph as entities. For better evaluation of VrD-NER systems, we also propose two revised benchmark datasets of NER on scanned documents which can reflect real-world scenarios. Experiment results demonstrate the effectiveness of our method, and suggest its potential to be a universal solution to various information extraction tasks on documents.

2022

pdf bib
TransAdv: A Translation-based Adversarial Learning Framework for Zero-Resource Cross-Lingual Named Entity Recognition
Yichun Zhao | Jintao Du | Gongshen Liu | Huijia Zhu
Findings of the Association for Computational Linguistics: EMNLP 2022

Zero-Resource Cross-Lingual Named Entity Recognition aims at training an NER model of the target language using only labeled source language data and unlabeled target language data. Existing methods are mainly divided into three categories: model transfer based, data transfer based and knowledge transfer based. Each method has its own disadvantages, and combining more than one of them often leads to better performance. However, the performance of data transfer based methods is often limited by inevitable noise in the translation process. To handle the problem, we propose a framework named TransAdv to mitigate lexical and syntactic errors of word-by-word translated data, better utilizing the data by multi-level adversarial learning and multi-model knowledge distillation. Extensive experiments are conducted over 6 target languages with English as the source language, and the results show that TransAdv achieves competitive performance to the state-of-the-art models.

pdf bib
A Multi-Task Dual-Tree Network for Aspect Sentiment Triplet Extraction
Yichun Zhao | Kui Meng | Gongshen Liu | Jintao Du | Huijia Zhu
Proceedings of the 29th International Conference on Computational Linguistics

Aspect Sentiment Triplet Extraction (ASTE) aims at extracting triplets from a given sentence, where each triplet includes an aspect, its sentiment polarity, and a corresponding opinion explaining the polarity. Existing methods are poor at detecting complicated relations between aspects and opinions as well as classifying multiple sentiment polarities in a sentence. Detecting unclear boundaries of multi-word aspects and opinions is also a challenge. In this paper, we propose a Multi-Task Dual-Tree Network (MTDTN) to address these issues. We employ a constituency tree and a modified dependency tree in two sub-tasks of Aspect Opinion Co-Extraction (AOCE) and ASTE, respectively. To enhance the information interaction between the two sub-tasks, we further design a Transition-Based Inference Strategy (TBIS) that transfers the boundary information from tags of AOCE to ASTE through a transition matrix. Extensive experiments are conducted on four popular datasets, and the results show the effectiveness of our model.

2009

pdf bib
Domain Adaptation with Latent Semantic Association for Named Entity Recognition
Honglei Guo | Huijia Zhu | Zhili Guo | Xiaoxun Zhang | Xian Wu | Zhong Su
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2006

pdf bib
Dependency Parsing Based on Dynamic Local Optimization
Ting Liu | Jinshan Ma | Huijia Zhu | Sheng Li
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)