Wenjie Liu


2021

pdf bib
FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition
Yichong Leng | Xu Tan | Rui Wang | Linchen Zhu | Jin Xu | Wenjie Liu | Linquan Liu | Xiang-Yang Li | Tao Qin | Edward Lin | Tie-Yan Liu
Findings of the Association for Computational Linguistics: EMNLP 2021

Error correction is widely used in automatic speech recognition (ASR) to post-process the generated sentence, and can further reduce the word error rate (WER). Although multiple candidates are generated by an ASR system through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect from multiple candidates to better detect and correct error tokens. In this work, we propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy. FastCorrect 2 adopts non-autoregressive generation for fast inference, which consists of an encoder that processes multiple source sentences and a decoder that generates the target sentence in parallel from the adjusted source sentence, where the adjustment is based on the predicted duration of each source token. However, there are some issues when handling multiple source sentences. First, it is non-trivial to leverage the voting effect from multiple source sentences since they usually vary in length. Thus, we propose a novel alignment algorithm to maximize the degree of token alignment among multiple sentences in terms of token and pronunciation similarity. Second, the decoder can only take one adjusted source sentence as input, while there are multiple source sentences. Thus, we develop a candidate predictor to detect the most suitable candidate for the decoder. Experiments on our inhouse dataset and AISHELL-1 show that FastCorrect 2 can further reduce the WER over the previous correction model with single candidate by 3.2% and 2.6%, demonstrating the effectiveness of leveraging multiple candidates in ASR error correction. FastCorrect 2 achieves better performance than the cascaded re-scoring and correction pipeline and can serve as a unified post-processing module for ASR.

2018

pdf bib
ITNLP-ARC at SemEval-2018 Task 12: Argument Reasoning Comprehension with Attention
Wenjie Liu | Chengjie Sun | Lei Lin | Bingquan Liu
Proceedings of the 12th International Workshop on Semantic Evaluation

Reasoning is a very important topic and has many important applications in the field of natural language processing. Semantic Evaluation (SemEval) 2018 Task 12 “The Argument Reasoning Comprehension” committed to research natural language reasoning. In this task, we proposed a novel argument reasoning comprehension system, ITNLP-ARC, which use Neural Networks technology to solve this problem. In our system, the LSTM model is involved to encode both the premise sentences and the warrant sentences. The attention model is used to merge the two premise sentence vectors. Through comparing the similarity between the attention vector and each of the two warrant vectors, we choose the one with higher similarity as our system’s final answer.

2017

pdf bib
ITNLP-AiKF at SemEval-2017 Task 1: Rich Features Based SVR for Semantic Textual Similarity Computing
Wenjie Liu | Chengjie Sun | Lei Lin | Bingquan Liu
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

Semantic Textual Similarity (STS) devotes to measuring the degree of equivalence in the underlying semantic of the sentence pair. We proposed a new system, ITNLP-AiKF, which applies in the SemEval 2017 Task1 Semantic Textual Similarity track 5 English monolingual pairs. In our system, rich features are involved, including Ontology based, word embedding based, Corpus based, Alignment based and Literal based feature. We leveraged the features to predict sentence pair similarity by a Support Vector Regression (SVR) model. In the result, a Pearson Correlation of 0.8231 is achieved by our system, which is a competitive result in the contest of this track.