Fangzhong Su


2012

ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora
Mārcis Pinnis | Radu Ion | Dan Ştefănescu | Fangzhong Su | Inguna Skadiņa | Andrejs Vasiļjevs | Bogdan Babych
Proceedings of the ACL 2012 System Demonstrations

Measuring Comparability of Documents in Non-Parallel Corpora for Efficient Extraction of (Semi-)Parallel Translation Equivalents
Fangzhong Su | Bogdan Babych
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)

Development and Application of a Cross-language Document Comparability Metric
Fangzhong Su | Bogdan Babych
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper we present a metric that measures the comparability of documents across different languages. The metric was developed within the FP7 ICT ACCURAT project as a tool for aligning comparable corpora at the document level; the aligned comparable documents are then used for phrase alignment and extraction of translation equivalents, with the aim of extending the phrase tables of statistical MT systems without the need for parallel texts. The metric uses several features, such as lexical information, document structure, keywords and named entities, which are combined in an ensemble manner. We report results on the reliability and effectiveness of the metric, and demonstrate its application to, and impact on, the task of parallel phrase extraction from comparable corpora.

Collecting and Using Comparable Corpora for Statistical Machine Translation
Inguna Skadiņa | Ahmet Aker | Nikos Mastropavlos | Fangzhong Su | Dan Tufis | Mateja Verlic | Andrejs Vasiļjevs | Bogdan Babych | Paul Clough | Robert Gaizauskas | Nikos Glaros | Monica Lestari Paramita | Mārcis Pinnis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The lack of sufficient parallel data for many languages and domains is currently one of the major obstacles to further advancement of automated translation. The ACCURAT project addresses this issue by researching methods for improving machine translation systems with comparable corpora. In this paper we present tools and techniques developed in the ACCURAT project that allow the additional data needed for statistical machine translation to be extracted from comparable corpora. We present methods and tools for acquiring comparable corpora from the Web and other sources, for evaluating the comparability of the collected corpora, for multi-level alignment of comparable corpora, and for extracting lexical and terminological data for machine translation. Finally, we present initial evaluation results on the utility of the collected corpora in domain-adapted machine translation and real-life applications.

2010

Word Sense Subjectivity for Cross-lingual Lexical Substitution
Fangzhong Su | Katja Markert
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

Subjectivity Recognition on Word Senses via Semi-supervised Mincuts
Fangzhong Su | Katja Markert
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

Eliciting Subjectivity and Polarity Judgements on Word Senses
Fangzhong Su | Katja Markert
Coling 2008: Proceedings of the workshop on Human Judgements in Computational Linguistics

From Words to Senses: A Case Study of Subjectivity Recognition
Fangzhong Su | Katja Markert
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)