Haitao Mi


2021

pdf bib
A Dialogue-based Information Extraction System for Medical Insurance Assessment
Shuang Peng | Mengdi Zhou | Minghui Yang | Haitao Mi | Shaosheng Cao | Zujie Wen | Teng Xu | Hongbin Wang | Lei Liu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling
Xiang Hu | Haitao Mi | Zujie Wen | Yafang Wang | Yi Su | Jing Zheng | Gerard de Melo
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Human language understanding operates at multiple levels of granularity (e.g., words, phrases, and sentences) with increasing levels of abstraction that can be hierarchically combined. However, existing deep models with stacked layers do not explicitly model any sort of hierarchical process. In this paper, we propose a recursive Transformer model based on differentiable CKY style binary trees to emulate this composition process, and we extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes. To scale up our approach, we also introduce an efficient pruning and growing algorithm to reduce the time complexity and enable encoding in linear time. Experimental results on language modeling and unsupervised parsing show the effectiveness of our approach.

2016

pdf bib
Vocabulary Manipulation for Neural Machine Translation
Haitao Mi | Zhiguo Wang | Abe Ittycheriah
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Sentence Similarity Learning by Lexical Decomposition and Composition
Zhiguo Wang | Haitao Mi | Abraham Ittycheriah
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Most conventional sentence similarity methods only focus on similar parts of two input sentences, and simply ignore the dissimilar parts, which usually give us some clues and semantic meanings about the sentences. In this work, we propose a model to take into account both the similarities and dissimilarities by decomposing and composing lexical semantics over sentences. The model represents each word as a vector, and calculates a semantic matching vector for each word based on all words in the other sentence. Then, each word vector is decomposed into a similar component and a dissimilar component based on the semantic matching vector. After this, a two-channel CNN model is employed to capture features by composing the similar and dissimilar components. Finally, a similarity score is estimated over the composed feature vectors. Experimental results show that our model gets the state-of-the-art performance on the answer sentence selection task, and achieves a comparable result on the paraphrase identification task.

pdf bib
Coverage Embedding Models for Neural Machine Translation
Haitao Mi | Baskaran Sankaran | Zhiguo Wang | Abe Ittycheriah
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Supervised Attentions for Neural Machine Translation
Haitao Mi | Zhiguo Wang | Abe Ittycheriah
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Semi-supervised Clustering for Short Text via Deep Representation Learning
Zhiguo Wang | Haitao Mi | Abraham Ittycheriah
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

pdf bib
Sense Embedding Learning for Word Sense Induction
Linfeng Song | Zhiguo Wang | Haitao Mi | Daniel Gildea
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

2015

pdf bib
Feature Optimization for Constituent Parsing via Neural Networks
Zhiguo Wang | Haitao Mi | Nianwen Xue
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Shift-Reduce Constituency Parsing with Dynamic Programming and POS Tag Lattice
Haitao Mi | Liang Huang
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf bib
A Structured Language Model for Incremental Tree-to-String Translation
Heng Yu | Haitao Mi | Liang Huang | Qun Liu
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Hierarchical MT Training using Max-Violation Perceptron
Kai Zhao | Liang Huang | Haitao Mi | Abe Ittycheriah
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

pdf bib
Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding
Martin Čmejrek | Haitao Mi | Bowen Zhou
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Max-Violation Perceptron and Forced Decoding for Scalable MT Training
Heng Yu | Liang Huang | Haitao Mi | Kai Zhao
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2011

pdf bib
A novel dependency-to-string model for statistical machine translation
Jun Xie | Haitao Mi | Qun Liu
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Bagging-based System Combination for Domain Adaption
Linfeng Song | Haitao Mi | Yajuan Lü | Qun Liu
Proceedings of Machine Translation Summit XIII: Papers

pdf bib
Rule Markov Models for Fast Tree-to-String Translation
Ashish Vaswani | Haitao Mi | Liang Huang | David Chiang
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
Constituency to Dependency Translation with Forests
Haitao Mi | Qun Liu
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Learning Lexicalized Reordering Models from Reordering Graphs
Jinsong Su | Yang Liu | Yajuan Lv | Haitao Mi | Qun Liu
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
An Efficient Shift-Reduce Decoding Algorithm for Phrased-Based Machine Translation
Yang Feng | Haitao Mi | Yang Liu | Qun Liu
Coling 2010: Posters

pdf bib
Machine Translation with Lattices and Forests
Haitao Mi | Liang Huang | Qun Liu
Coling 2010: Posters

pdf bib
Dependency-Based Bracketing Transduction Grammar for Statistical Machine Translation
Jinsong Su | Yang Liu | Haitao Mi | Hongmei Zhao | Yajuan Lv | Qun Liu
Coling 2010: Posters

pdf bib
The ICT statistical machine translation system for IWSLT 2010
Hao Xiong | Jun Xie | Hui Yu | Kai Liu | Wei Luo | Haitao Mi | Yang Liu | Yajuan Lü | Qun Liu
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

pdf bib
Statistical Translation Model Based On Source Syntax Structure
Qun Liu | Yang Liu | Haitao Mi
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
Efficient Incremental Decoding for Tree-to-String Translation
Liang Huang | Haitao Mi
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

pdf bib
The ICT statistical machine translation system for the IWSLT 2009
Haitao Mi | Yang Li | Tian Xia | Xinyan Xiao | Yang Feng | Jun Xie | Hao Xiong | Zhaopeng Tu | Daqi Zheng | Yanjuan Lu | Qun Liu
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the ICT Statistical Machine Translation systems that used in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2009. For this year’s evaluation, we participated in the Challenge Task (Chinese-English and English-Chinese) and BTEC Task (Chinese-English). And we mainly focus on one new method to improve single system’s translation quality. Specifically, we developed a sentence-similarity based development set selection technique. For each task, we finally submitted the single system who got the maximum BLEU scores on the selected development set. The four single translation systems are based on different techniques: a linguistically syntax-based system, two formally syntax-based systems and a phrase-based system. Typically, we didn’t use any rescoring or system combination techniques in this year’s evaluation.

pdf bib
Joint Decoding with Multiple Translation Models
Yang Liu | Haitao Mi | Yang Feng | Qun Liu
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Sub-Sentence Division for Tree-Based Machine Translation
Hao Xiong | Wenwen Xu | Haitao Mi | Yang Liu | Qun Liu
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf bib
Lattice-based System Combination for Statistical Machine Translation
Yang Feng | Yang Liu | Haitao Mi | Qun Liu | Yajuan Lü
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

pdf bib
Word Lattice Reranking for Chinese Word Segmentation and Part-of-Speech Tagging
Wenbin Jiang | Haitao Mi | Qun Liu
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
Forest-Based Translation
Haitao Mi | Liang Huang | Qun Liu
Proceedings of ACL-08: HLT

pdf bib
Forest-based Translation Rule Extraction
Haitao Mi | Liang Huang
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
Refinements in BTG-based Statistical Machine Translation
Deyi Xiong | Min Zhang | AiTi Aw | Haitao Mi | Qun Liu | Shouxun Lin
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf bib
The ICT system description for IWSLT 2008.
Yang Liu | Zhongjun He | Haitao Mi | Yun Huang | Yang Feng | Wenbin Jiang | Yajuan Lu | Qun Liu
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper presents a description for the ICT systems involved in the IWSLT 2008 evaluation campaign. This year, we participated in Chinese-English and English-Chinese translation directions. Four statistical machine translation systems were used: one linguistically syntax-based, two formally syntax-based, and one phrase-based. The outputs of the four SMT systems were fed to a sentence-level system combiner, which was expected to produce better translations than single systems. We will report the results of the four single systems and the combiner on both the development and test sets.

2007

pdf bib
The ICT statistical machine translation systems for IWSLT 2007
Zhongjun He | Haitao Mi | Yang Liu | Deyi Xiong | Weihua Luo | Yun Huang | Zhixiang Ren | Yajuan Lu | Qun Liu
Proceedings of the Fourth International Workshop on Spoken Language Translation

In this paper, we give an overview of the ICT statistical machine translation systems for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2007. In this year’s evaluation, we participated in the Chinese-English transcript translation task, and developed three systems based on different techniques: a formally syntax-based system Bruin, an extended phrase-based system Confucius and a linguistically syntax-based system Lynx. We will describe the models of these three systems, and compare their performance in detail. We set Bruin as our primary system, which ranks 2 among the 15 primary results according to the official evaluation results.