Huan Chen


2024

Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding
Chong Zhang | Yi Tu | Yixi Zhao | Chenshu Yuan | Huan Chen | Yue Zhang | Mingxu Chai | Ya Guo | Huijia Zhu | Qi Zhang | Tao Gui
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Modeling and leveraging layout reading order in visually-rich documents (VrDs) is critical in document intelligence as it captures the rich structure semantics within documents. Previous works typically formulated layout reading order as a permutation of layout elements, i.e., a sequence containing all the layout elements. However, we argue that this formulation does not adequately convey the complete reading order information in the layout, which may lead to performance decline in downstream tasks. To address this issue, we propose to model the layout reading order as ordering relations over the set of layout elements, which have sufficient expressive capability for the complete reading order information. To enable empirical evaluation of methods for the improved form of reading order prediction (ROP), we establish a comprehensive benchmark dataset including the reading order annotation as relations over layout elements, together with a relation-extraction-based method that outperforms previous models. Moreover, we propose a reading-order-relation-enhancing pipeline to improve model performance on arbitrary VrD tasks by introducing additional reading order relation inputs. We conduct comprehensive experiments to demonstrate that the pipeline generally benefits downstream VrD tasks: (1) by utilizing the reading order relation information, the enhanced downstream models achieve SOTA results on both task settings of the targeted dataset; (2) by utilizing the pseudo reading order information generated by the proposed ROP model, the performance of the enhanced models improves across all three models and eight cross-domain VrD-IE/QA task settings without targeted optimization.

Context-Driven Index Trimming: A Data Quality Perspective to Enhancing Precision of RALMs
Kexin Ma | Ruochun Jin | Wang Haotian | Wang Xi | Huan Chen | Yuhua Tang | Qian Wang
Findings of the Association for Computational Linguistics: EMNLP 2024

Retrieval-Augmented Large Language Models (RALMs) have made significant strides in enhancing the accuracy of generated responses. However, existing research often overlooks the data quality issues within retrieval results, which are frequently caused by inaccurate vector-distance-based retrieval methods. We propose to boost the precision of RALMs’ answers from a data quality perspective through the Context-Driven Index Trimming (CDIT) framework, where Context Matching Dependencies (CMDs) are employed as logical data quality rules to capture and regulate the consistency between retrieved contexts. Based on the semantic comprehension capabilities of Large Language Models (LLMs), CDIT can effectively identify and discard retrieval results that are inconsistent with the query context and further modify indexes in the database, thereby improving answer quality. Experiments demonstrate an average improvement of 3.75% in accuracy on challenging open-domain question-answering tasks. In addition, the flexibility of CDIT is verified through its compatibility with various language models and indexing methods, which offers a promising approach to jointly bolstering RALMs’ data quality and retrieval precision.

Persuading across Diverse Domains: a Dataset and Persuasion Large Language Model
Chuhao Jin | Kening Ren | Lingzhen Kong | Xiting Wang | Ruihua Song | Huan Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Persuasive dialogue requires multi-turn following and planning abilities to achieve the goal of persuading users, which is still challenging even for state-of-the-art large language models (LLMs). Previous works focus on retrieval-based models or generative models in a specific domain due to a lack of data across multiple domains. In this paper, we leverage GPT-4 to create the first multi-domain persuasive dialogue dataset, DailyPersuasion. We then propose a general method named PersuGPT to learn a persuasion model based on LLMs through intent-to-strategy reasoning, which summarizes the intent of the user’s utterance and reasons about the next strategy for the response. Moreover, we design a simulation-based preference optimization, which utilizes a learned user model and our model to simulate the next turns and estimate their rewards more accurately. Experimental results on two datasets indicate that our proposed method outperforms all baselines in terms of the automatic evaluation metric Win-Rate and in human evaluation. The code and data are available at https://persugpt.github.io.

2023

LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding
Yi Tu | Ya Guo | Huan Chen | Jinyang Tang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Visually-rich Document Understanding (VrDU) has attracted much research attention over the past years. Models with transformer-based backbones pre-trained on a large number of document images have led to significant performance gains in this field. The major challenge is how to fuse the different modalities (text, layout, and image) of the documents in a unified model with different pre-training tasks. This paper focuses on improving text-layout interactions and proposes a novel multi-modal pre-training model, LayoutMask. LayoutMask uses local 1D position, instead of global 1D position, as layout input and has two pre-training objectives: (1) Masked Language Modeling: predicting masked tokens with two novel masking strategies; (2) Masked Position Modeling: predicting masked 2D positions to improve layout representation learning. LayoutMask can enhance the interactions between text and layout modalities in a unified model and produce adaptive and robust multi-modal representations for downstream tasks. Experimental results show that our proposed method can achieve state-of-the-art results on a wide variety of VrDU problems, including form understanding, receipt understanding, and document image classification.

Joint Semantic and Strategy Matching for Persuasive Dialogue
Chuhao Jin | Yutao Zhu | Lingzhen Kong | Shijie Li | Xiao Zhang | Ruihua Song | Xu Chen | Huan Chen | Yuchong Sun | Yu Chen | Jun Xu
Findings of the Association for Computational Linguistics: EMNLP 2023

Persuasive dialogue aims to persuade users to achieve some target through conversation. While previous persuasion models have achieved notable successes, they mostly rely on utterance semantic matching and ignore an important aspect: the strategy of the conversation. For example, the agent can choose an emotional-appeal strategy to impress users. Compared with utterance semantics, conversation strategies are high-level concepts, which can be informative and provide complementary information for effective persuasion. In this paper, we propose to build a persuasion model by jointly modeling the conversation semantics and strategies, where we design a BERT-like module and an auto-regressive predictor to match the semantics and strategies, respectively. Experimental results indicate that our proposed approach can significantly improve on the state-of-the-art baseline by 5% on a small dataset and 37% on a large dataset in terms of Recall@1. Detailed analyses show that the auto-regressive predictor contributes most to the final performance.

Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
Chong Zhang | Ya Guo | Yi Tu | Huan Chen | Jinyang Tang | Huijia Zhu | Qi Zhang | Tao Gui
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting the BIO entity tags for tokens, following the typical setting of NLP. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs where text is recognized and arranged by OCR systems. This reading order issue hinders the accurate marking of entities by the BIO-tagging scheme, making it impossible for sequence-labeling methods to predict correct named entities. To address the reading order issue, we introduce Token Path Prediction (TPP), a simple prediction head to predict entity mentions as token sequences within documents. As an alternative to token classification, TPP models the document layout as a complete directed graph of tokens, and predicts token paths within the graph as entities. For better evaluation of VrD-NER systems, we also propose two revised benchmark datasets of NER on scanned documents which can reflect real-world scenarios. Experimental results demonstrate the effectiveness of our method, and suggest its potential to be a universal solution to various information extraction tasks on documents.

2021

An Emotional Comfort Framework for Improving User Satisfaction in E-Commerce Customer Service Chatbots
Shuangyong Song | Chao Wang | Haiqing Chen | Huan Chen
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

E-commerce has grown substantially over the last several years, and chatbots for intelligent customer service are concurrently drawing attention. We presented AliMe Assist, a Chinese intelligent assistant designed for creating an innovative online shopping experience in E-commerce. Based on question answering (QA), AliMe Assist offers assistance service, customer service, and chatting service. According to user study surveys and real online testing, providing emotional comfort for customers’ negative emotions, which account for more than 5% of all customer visits on AliMe, is a key point for providing considerate service. In this paper, we propose a framework to obtain proper answers to customers’ emotional questions. The framework takes an emotion classification model as its core, and final answer selection is based on topic classification and text matching. Our experiments on real online systems show that the framework is very promising.

Geo-BERT Pre-training Model for Query Rewriting in POI Search
Xiao Liu | Juan Hu | Qi Shen | Huan Chen
Findings of the Association for Computational Linguistics: EMNLP 2021

Query Rewriting (QR) is proposed to solve the problem of word mismatch between queries and documents in Web search. Existing approaches usually model QR with an end-to-end sequence-to-sequence (seq2seq) model. The state-of-the-art Transformer-based models can effectively learn textual semantics from user session logs, but they often ignore users’ geographic location information, which is crucial for the Point-of-Interest (POI) search of map services. In this paper, we propose a pre-training model, called Geo-BERT, to integrate semantics and geographic information in the pre-trained representations of POIs. First, we simulate POI distribution in the real world as a graph, in which nodes represent POIs and multiple geographic granularities. Then we use graph representation learning methods to get geographic representations. Finally, we train a BERT-like pre-training model with text and POIs’ graph embeddings to get an integrated representation of both geographic and semantic information, and apply it to QR in POI search. The proposed model achieves excellent accuracy on a wide range of real-world datasets of map services.

2013

Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic
Qi Zhang | Jin Qian | Huan Chen | Jihua Kang | Xuanjing Huang
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Chinese Named Entity Abbreviation Generation Using First-Order Logic
Huan Chen | Qi Zhang | Jin Qian | Xuanjing Huang
Proceedings of the Sixth International Joint Conference on Natural Language Processing