Yuhang Guo


2022

The Xiaomi Text-to-Text Simultaneous Speech Translation System for IWSLT 2022
Bao Guo | Mengge Liu | Wen Zhang | Hexuan Chen | Chang Mu | Xiang Li | Jianwei Cui | Bin Wang | Yuhang Guo
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This system paper describes the Xiaomi Translation System for the IWSLT 2022 Simultaneous Speech Translation (noted as SST) shared task. We participate in the English-to-Mandarin Chinese Text-to-Text (noted as T2T) track. Our system is built based on the Transformer model with novel techniques borrowed from our recent research work. For the data filtering, language-model-based and rule-based methods are conducted to filter the data to obtain high-quality bilingual parallel corpora. We also strengthen our system with some dominating techniques related to data augmentation, such as knowledge distillation, tagged back-translation, and iterative back-translation. We also incorporate novel training techniques such as R-drop, deep model, and large batch training which have been shown to be beneficial to the naive Transformer model. In the SST scenario, several variations of wait-k strategies are explored. Furthermore, in terms of robustness, both data-based and model-based ways are used to reduce the sensitivity of our system to Automatic Speech Recognition (ASR) outputs. We finally design some inference algorithms and use the adaptive-ensemble method based on multiple model variants to further improve the performance of the system. Compared with strong baselines, fusing all techniques can improve our system by 2~3 BLEU scores under different latency regimes.

2021

BIT’s system for AutoSimulTrans2021
Mengge Liu | Shuoying Chen | Minqin Li | Zhipeng Wang | Yuhang Guo
Proceedings of the Second Workshop on Automatic Simultaneous Translation

In this paper we introduce our Chinese-English simultaneous translation system participating in AutoSimulTrans2021. In simultaneous translation, translation quality and delay are both important. In order to reduce the translation delay, we cut the streaming-input source sentence into segments and translate the segments before the full sentence is received. In order to obtain high-quality translations, we pre-train a translation model with adequate corpus and fine-tune the model with domain adaptation and sentence length adaptation. The experimental results on the evaluation data show that our system performs better than the baseline system.

2020

BIT’s system for the AutoSimTrans 2020
Minqin Li | Haodong Cheng | Yuanjie Wang | Sijia Zhang | Liting Wu | Yuhang Guo
Proceedings of the First Workshop on Automatic Simultaneous Translation

This paper describes our machine translation systems for the streaming Chinese-to-English translation task of AutoSimTrans 2020. We present a sentence length based method and a sentence boundary detection model based method for the streaming input segmentation. Experimental results of the transcription and the ASR output translation on the development data sets show that the translation system with the detection model based method outperforms the one with the length based method in BLEU score by 1.19 and 0.99 respectively under similar or better latency.

2017

A Parallel Recurrent Neural Network for Language Modeling with POS Tags
Chao Su | Heyan Huang | Shumin Shi | Yuhang Guo | Hao Wu
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

BIT at SemEval-2017 Task 1: Using Semantic Information Space to Evaluate Semantic Textual Similarity
Hao Wu | Heyan Huang | Ping Jian | Yuhang Guo | Chao Su
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper presents three systems for semantic textual similarity (STS) evaluation at SemEval-2017 STS task. One is an unsupervised system and the other two are supervised systems which simply employ the unsupervised one. All our systems mainly depend on the semantic information space (SIS), which is constructed based on the semantic hierarchical taxonomy in WordNet, to compute non-overlapping information content (IC) of sentences. Our team ranked 2nd among 31 participating teams by the primary score of Pearson correlation coefficient (PCC) mean of 7 tracks and achieved the best performance on Track 1 (AR-AR) dataset.

2013

Microblog Entity Linking by Leveraging Extra Posts
Yuhang Guo | Bing Qin | Ting Liu | Sheng Li
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2011

A Graph-based Method for Entity Linking
Yuhang Guo | Wanxiang Che | Ting Liu | Sheng Li
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

HIT-CIR: An Unsupervised WSD System Based on Domain Most Frequent Sense Estimation
Yuhang Guo | Wanxiang Che | Wei He | Ting Liu | Sheng Li
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

Multilingual Dependency-based Syntactic and Semantic Parsing
Wanxiang Che | Zhenghua Li | Yongqiang Li | Yuhang Guo | Bing Qin | Ting Liu
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

2007

HIT-IR-WSD: A WSD System for English Lexical Sample Task
Yuhang Guo | Wanxiang Che | Yuxuan Hu | Wei Zhang | Ting Liu
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)