Yifan He


2021

pdf bib
damo_nlp at MEDIQA 2021: Knowledge-based Preprocessing and Coverage-oriented Reranking for Medical Question Summarization
Yifan He | Mosha Chen | Songfang Huang
Proceedings of the 20th Workshop on Biomedical Language Processing

Medical question summarization is an important but difficult task, where the input is often complex and erroneous while annotated data is expensive to acquire. We report our participation in the MEDIQA 2021 question summarization task in which we are required to address these challenges. We start from pre-trained conditional generative language models, use knowledge bases to help correct input errors, and rerank single system outputs to boost coverage. Experimental results show significant improvement in string-based metrics.

2020

pdf bib
Proceedings of Workshop on Natural Language Processing in E-Commerce
Huasha Zhao | Parikshit Sondhi | Nguyen Bach | Sanjika Hewavitharana | Yifan He | Luo Si | Heng Ji
Proceedings of Workshop on Natural Language Processing in E-Commerce

2016

pdf bib
Entity Linking with a Paraphrase Flavor
Maria Pershina | Yifan He | Ralph Grishman
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The task of Named Entity Linking is to link entity mentions in the document to their correct entries in a knowledge base and to cluster NIL mentions. Ambiguous, misspelled, and incomplete entity mention names are the main challenges in the linking process. We propose a novel approach that combines two state-of-the-art models ― for entity disambiguation and for paraphrase detection ― to overcome these challenges. We consider name variations as paraphrases of the same entity mention and adopt a paraphrase model for this task. Our approach utilizes a graph-based disambiguation model based on Personalized Page Rank, and then refines and clusters its output using the paraphrase similarity between entity mention strings. It achieves a competitive performance of 80.5% in B3+F clustering score on diagnostic TAC EDL 2014 data.

pdf bib
The Interaction between SFP-Ne and SpOAs in Mandarin Chinese–A corpus based approach
Yifan He
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Posters

2015

pdf bib
Idiom Paraphrases: Seventh Heaven vs Cloud Nine
Maria Pershina | Yifan He | Ralph Grishman
Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics

pdf bib
Jointly Embedding Relations and Mentions for Knowledge Population
Miao Fan | Kai Cao | Yifan He | Ralph Grishman
Proceedings of the International Conference Recent Advances in Natural Language Processing

pdf bib
Personalized Page Rank for Named Entity Disambiguation
Maria Pershina | Yifan He | Ralph Grishman
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
ICE: Rapid Information Extraction Customization for NLP Novices
Yifan He | Ralph Grishman
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

2014

pdf bib
Proceedings of the COLING Workshop on Synchronic and Diachronic Approaches to Analyzing Technical Language
Adam Meyers | Yifan He | Ralph Grishman
Proceedings of the COLING Workshop on Synchronic and Diachronic Approaches to Analyzing Technical Language

pdf bib
Jargon-Term Extraction by Chunking
Adam Meyers | Zachary Glass | Angus Grieve-Smith | Yifan He | Shasha Liao | Ralph Grishman
Proceedings of the COLING Workshop on Synchronic and Diachronic Approaches to Analyzing Technical Language

pdf bib
Corpus and Method for Identifying Citations in Non-Academic Text
Yifan He | Adam Meyers
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We attempt to identify citations in non-academic text such as patents. Unlike academic articles which often provide bibliographies and follow consistent citation styles, non-academic text cites scientific research in a more ad-hoc manner. We manually annotate citations in 50 patents, train a CRF classifier to find new citations, and apply a reranker to incorporate non-local information. Our best system achieves 0.83 F-score on 5-fold cross validation.

pdf bib
Annotating Relations in Scientific Articles
Adam Meyers | Giancarlo Lee | Angus Grieve-Smith | Yifan He | Harriet Taber
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Relations (ABBREVIATE, EXEMPLIFY, ORIGINATE, REL_WORK, OPINION) between entities (citations, jargon, people, organizations) are annotated for PubMed scientific articles. We discuss our specifications, pre-processing and evaluation

2013

pdf bib
Towards Fine-grained Citation Function Classification
Xiang Li | Yifan He | Adam Meyers | Ralph Grishman
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2012

pdf bib
An Evaluation of Statistical Post-Editing Systems Applied to RBMT and SMT Systems
Hanna Béchara | Raphaël Rubino | Yifan He | Yanjun Ma | Josef van Genabith
Proceedings of COLING 2012

pdf bib
Combining Multiple Alignments to Improve Machine Translation
Zhaopeng Tu | Yang Liu | Yifan He | Josef van Genabith | Qun Liu | Shouxun Lin
Proceedings of COLING 2012: Posters

pdf bib
Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification
Zhaopeng Tu | Yifan He | Jennifer Foster | Josef van Genabith | Qun Liu | Shouxun Lin
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

pdf bib
Maximum Rank Correlation Training for Statistical Machine Translation
Daqi Zheng | Yifan He | Yang Liu | Qun Liu
Proceedings of Machine Translation Summit XIII: Papers

pdf bib
Rich Linguistic Features for Translation Memory-Inspired Consistent Translation
Yifan He | Yanjun Ma | Andy Way | Josef van Genabith
Proceedings of Machine Translation Summit XIII: Papers

pdf bib
From the Confidence Estimation of Machine Translation to the Integration of MT and Translation Memory
Yanjun Ma | Yifan He | Josef van Genabith
Proceedings of Machine Translation Summit XIII: Tutorial Abstracts

In this tutorial, we cover techniques that facilitate the integration of Machine Translation (MT) and Translation Memory (TM), which can help the adoption of MT technology in localisation industry. The tutorial covers four parts: i) brief introduction of MT and TM systems, ii) MT confidence estimation measures tailored for the TM environment, iii) segment-level MT and MT integration, iv) sub-segment level MT and TM integration, and v) human evaluation of MT and TM integration. We will first briefly describe and compare how translations are generated in MT and TM systems, and suggest possible avenues to combines these two systems. We will also cover current quality / cost estimation measures applied in MT and TM systems, such as the fuzzy-match score in the TM, and the evaluation/confidence metrics used to judge MT outputs. We then move on to introduce the recent developments in the field of MT confidence estimation tailored towards predicting post-editing efforts. We will especially focus on the confidence metrics proposed by Specia et al., which is shown to have high correlation with human preference, as well as post-editing time. For segment-level MT and TM integration, we present translation recommendation and translation re-ranking models, where the integration happens at the 1-best or the N-best level, respectively. Given an input to be translated, MT-TM recommendation compares the output from the MT and the TM systems, and presents the better one to the post-editor. MT-TM re-ranking, on the other hand, combines k-best lists from both systems, and generates a new list according to estimated post-editing effort. We observe high precision of these models in automatic and human evaluations, indicating that they can be integrated into TM environments without the risk of deteriorating the quality of the post-editing candidate. For sub-segment level MT and TM integration, we try to reuse high quality TM chunks to improve the quality of MT systems. We can also predict whether phrase pairs derived from fuzzy matches should be used to constrain the translation of an input segment. Using a series of linguistically- motivated features, our constraints lead both to more consistent translation output, and to improved translation quality, as is measured by automatic evaluation scores. Finally, we present several methodologies that can be used to track post-editing effort, perform human evaluation of MT-TM integration, or help translators to access MT outputs in a TM environment.

pdf bib
Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
Yanjun Ma | Yifan He | Andy Way | Josef van Genabith
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
The DCU Dependency-Based Metric in WMT-MetricsMATR 2010
Yifan He | Jinhua Du | Andy Way | Josef van Genabith
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
Bridging SMT and TM with Translation Recommendation
Yifan He | Yanjun Ma | Josef van Genabith | Andy Way
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting
Marianna Apidianaki | Yifan He
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

pdf bib
Improving the Post-Editing Experience using Translation Recommendation: A User Study
Yifan He | Yanjun Ma | Johann Roturier | Andy Way | Josef van Genabith
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

We report findings from a user study with professional post-editors using a translation recommendation framework (He et al., 2010) to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We analyze the effectiveness of the model as well as the reaction of potential users. Based on the performance statistics and the users’ comments, we find that translation recommendation can reduce the workload of professional post-editors and improve the acceptance of MT in the localization industry.

pdf bib
Integrating N-best SMT Outputs into a TM System
Yifan He | Yanjun Ma | Andy Way | Josef van Genabith
Coling 2010: Posters

2009

pdf bib
Improving the Objective Function in Minimum Error Rate Training
Yifan He | Andy Way
Proceedings of Machine Translation Summit XII: Posters

pdf bib
Learning Labelled Dependencies in Machine Translation Evaluation
Yifan He | Andy Way
Proceedings of the 13th Annual conference of the European Association for Machine Translation

pdf bib
Capturing Lexical Variation in MT Evaluation Using Automatically Built Sense-Cluster Inventories
Marianna Apidianaki | Yifan He | Andy Way
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

pdf bib
MATREX: The DCU MT System for WMT 2009
Jinhua Du | Yifan He | Sergio Penkale | Andy Way
Proceedings of the Fourth Workshop on Statistical Machine Translation