Haotian Zhang


2020

pdf bib
Recurrent Inference in Text Editing
Ning Shi | Ziheng Zeng | Haotian Zhang | Yichen Gong
Findings of the Association for Computational Linguistics: EMNLP 2020

In neural text editing, prevalent sequence-to-sequence based approaches directly map the unedited text either to the edited text or the editing operations, in which the performance is degraded by the limited source text encoding and long, varying decoding steps. To address this problem, we propose a new inference method, Recurrence, that iteratively performs editing actions, significantly narrowing the problem space. In each iteration, encoding the partially edited text, Recurrence decodes the latent representation, generates an action of short, fixed-length, and applies the action to complete a single edit. For a comprehensive comparison, we introduce three types of text editing tasks: Arithmetic Operators Restoration (AOR), Arithmetic Equation Simplification (AES), Arithmetic Equation Correction (AEC). Extensive experiments on these tasks with varying difficulties demonstrate that Recurrence achieves improvements over conventional inference methods.

2019

pdf bib
Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval
Zeynep Akkalyoncu Yilmaz | Wei Yang | Haotian Zhang | Jimmy Lin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper applies BERT to ad hoc document retrieval on news articles, which requires addressing two challenges: relevance judgments in existing test collections are typically provided only at the document level, and documents often exceed the length that BERT was designed to handle. Our solution is to aggregate sentence-level evidence to rank documents. Furthermore, we are able to leverage passage-level relevance judgments fortuitously available in other domains to fine-tune BERT models that are able to capture cross-domain notions of relevance, and can be directly used for ranking news articles. Our simple neural ranking models achieve state-of-the-art effectiveness on three standard test collections.

pdf bib
Applying BERT to Document Retrieval with Birch
Zeynep Akkalyoncu Yilmaz | Shengjin Wang | Wei Yang | Haotian Zhang | Jimmy Lin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

We present Birch, a system that applies BERT to document retrieval via integration with the open-source Anserini information retrieval toolkit to demonstrate end-to-end search over large document collections. Birch implements simple ranking models that achieve state-of-the-art effectiveness on standard TREC newswire and social media test collections. This demonstration focuses on technical challenges in the integration of NLP and IR capabilities, along with the design rationale behind our approach to tightly-coupled integration between Python (to support neural networks) and the Java Virtual Machine (to support document retrieval using the open-source Lucene search library). We demonstrate integration of Birch with an existing search interface as well as interactive notebooks that highlight its capabilities in an easy-to-understand manner.

2015

pdf bib
Lexical Comparison Between Wikipedia and Twitter Corpora by Using Word Embeddings
Luchen Tan | Haotian Zhang | Charles Clarke | Mark Smucker
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)