Martin Kay

2014

Does a Computational Linguist have to be a Linguist?
Martin Kay
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2012

The new machine translation: getting blood from a stone
Martin Kay
Proceedings of the Third International Workshop on Free/Open-Source Rule-Based Machine Translation

Proceedings of COLING 2012
Martin Kay | Christian Boitet
Proceedings of COLING 2012

Proceedings of COLING 2012: Posters
Martin Kay | Christian Boitet
Proceedings of COLING 2012: Posters

Proceedings of COLING 2012: Demonstration Papers
Martin Kay | Christian Boitet
Proceedings of COLING 2012: Demonstration Papers

Suffix Trees as Language Models
Casey Redd Kennington | Martin Kay | Annemarie Friedrich
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Suffix trees are data structures that can be used to index a corpus. In this paper, we explore how some properties of suffix trees naturally provide the functionality of an n-gram language model with variable n. We explain these properties, which we leverage in our Suffix Tree Language Model (STLM) implementation, and show how a suffix tree implicitly contains the data needed for n-gram language modeling. We also discuss the kinds of smoothing techniques appropriate to such a model. We then show that our suffix-tree language model implementation is competitive with the state-of-the-art language model SRILM (Stolcke, 2002) in statistical machine translation experiments.
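The abstract's central point, that a single corpus index can answer count queries for n-grams of any length, can be illustrated with a minimal sketch. It assumes a suffix array (suffix start positions sorted by the text that follows them) as a simpler stand-in for a suffix tree, and uses stupid backoff (Brants et al., 2007) as a placeholder smoothing scheme. Neither choice is taken from the paper, and the class and method names are hypothetical.

```python
class SuffixArrayLM:
    """Hypothetical toy model: variable-length n-gram counts from one suffix array."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        # Suffix array: start positions sorted by the token sequence that follows them.
        # (Quadratic memory while sorting; fine for a toy corpus.)
        self.suffixes = sorted(range(len(self.tokens)), key=lambda i: self.tokens[i:])

    def _prefix(self, i, n):
        # First n tokens of the suffix starting at i (shorter near the corpus end).
        return tuple(self.tokens[i:i + n])

    def count(self, ngram):
        """Number of occurrences of ngram, for any length, via two binary searches."""
        ngram = tuple(ngram)
        n = len(ngram)
        if n == 0:
            return len(self.tokens)
        # Lower bound: first suffix whose n-token prefix is >= ngram.
        lo, hi = 0, len(self.suffixes)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._prefix(self.suffixes[mid], n) < ngram:
                lo = mid + 1
            else:
                hi = mid
        start = lo
        # Upper bound: first suffix whose n-token prefix is > ngram.
        hi = len(self.suffixes)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._prefix(self.suffixes[mid], n) <= ngram:
                lo = mid + 1
            else:
                hi = mid
        return lo - start

    def score(self, history, word, alpha=0.4):
        """Stupid-backoff score (not a normalized probability), longest context first."""
        history = tuple(history)
        penalty = 1.0
        for k in range(len(history), -1, -1):
            context = history[len(history) - k:]
            joint = self.count(context + (word,))
            if joint > 0:
                return penalty * joint / self.count(context)
            penalty *= alpha  # back off to a shorter context
        return 0.0


lm = SuffixArrayLM("the cat sat on the mat the cat ran".split())
print(lm.count(("the", "cat")))      # 2
print(lm.score(("on", "the"), "mat"))  # 1.0
```

Because every n-gram of the corpus is a prefix of some suffix, the same sorted index serves unigrams, bigrams, and arbitrarily long contexts; no fixed maximum order has to be chosen in advance.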

2009

Intersecting Multilingual Data for Faster and Better Statistical Translations
Yu Chen | Martin Kay | Andreas Eisele
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

Improving Statistical Machine Translation Efficiency by Triangulation
Yu Chen | Andreas Eisele | Martin Kay
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In current phrase-based Statistical Machine Translation systems, more training data is generally better than less. However, a larger data set eventually produces a larger model, which enlarges the search space for the decoder and consequently requires more time and resources to translate. This paper describes an attempt to reduce the model size by filtering out the less probable entries, using additional training data in an intermediate third language to test their correlation. The central idea behind the approach is triangulation, the process of incorporating multilingual knowledge in a single system, which makes use of parallel corpora available in more than two languages. We conducted experiments on the Europarl corpus to evaluate our approach. The model size can be reduced by up to 70% while translation quality is preserved.
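The abstract does not spell out the correlation test, so the following is only a minimal sketch of one simple triangulation criterion, assumed for illustration: a source-target phrase pair is kept when some pivot-language phrase links the two through separate source-pivot and pivot-target tables. The function name, the table layout, and the toy French/German/English entries are all hypothetical.

```python
def triangulation_filter(src_tgt, src_piv, piv_tgt):
    """Hypothetical filter: keep (source, target) pairs supported via a pivot phrase.

    src_tgt: dict mapping (source, target) -> translation probability
    src_piv: dict mapping source phrase -> set of pivot-language phrases
    piv_tgt: dict mapping pivot phrase -> set of target phrases
    """
    kept = {}
    reachable_cache = {}  # source phrase -> targets reachable through any pivot
    for (src, tgt), prob in src_tgt.items():
        if src not in reachable_cache:
            targets = set()
            for piv in src_piv.get(src, ()):
                targets.update(piv_tgt.get(piv, ()))
            reachable_cache[src] = targets
        if tgt in reachable_cache[src]:
            kept[(src, tgt)] = prob
    return kept


# Toy tables: French source, German pivot, English target (invented entries).
src_tgt = {("maison", "house"): 0.6, ("maison", "home"): 0.3, ("maison", "the"): 0.1}
src_piv = {"maison": {"Haus", "Heim"}}
piv_tgt = {"Haus": {"house"}, "Heim": {"home", "house"}}

kept = triangulation_filter(src_tgt, src_piv, piv_tgt)
print(sorted(kept))  # [('maison', 'home'), ('maison', 'house')]
```

The spurious pair ("maison", "the") has no pivot path and is dropped, shrinking the table without touching the well-supported entries. A real system would likely weigh pivot evidence probabilistically rather than use this all-or-nothing test.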

If chart parsing is taken to include the process of reading out solutions one by one, then it has exponential complexity. The stratagem of separating read-out from chart construction can also be applied to other kinds of parser, in particular to left-corner parsers that use early composition. When a limit is placed on the size of the stack in such a parser, it becomes finite-state equivalent. However, it is not practical to profit directly from this observation because of the large state sets that arise even in otherwise ordinary situations. It may be possible to overcome these problems by means of a guide constructed from a weakened version of the initial grammar.
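The first claim, that building the chart is cheap while reading out its solutions one by one is not, can be illustrated with a minimal CKY-style sketch (not the left-corner parser the abstract discusses). It assumes a grammar in Chomsky normal form; each chart cell stores, per nonterminal, how many trees it packs. Filling the chart takes cubic time in sentence length, yet for the highly ambiguous toy grammar S -> S S | a, the number of parses it represents grows as the Catalan numbers. The function name and grammar encoding are hypothetical.

```python
from collections import defaultdict


def count_parses(words, binary_rules, lexical_rules, start="S"):
    """Count parses from a packed CKY chart without enumerating any tree.

    binary_rules: list of (lhs, left, right) for rules lhs -> left right
    lexical_rules: dict mapping a word to the nonterminals that rewrite to it
    """
    n = len(words)
    # chart[i][j][A] = number of distinct trees rooted in A spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for lhs in lexical_rules.get(w, ()):
            chart[i][i + 1][lhs] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point
                for lhs, left, right in binary_rules:
                    # Packed combination: multiply counts instead of building trees.
                    c = chart[i][k].get(left, 0) * chart[k][j].get(right, 0)
                    if c:
                        chart[i][j][lhs] += c
    return chart[0][n].get(start, 0)


rules = [("S", "S", "S")]
lexicon = {"a": ["S"]}
for length in (4, 10, 20):
    print(length, count_parses(["a"] * length, rules, lexicon))
# 4 5
# 10 4862
# 20 1767263190
```

Twenty words already yield over 1.7 billion trees, all held in a chart of roughly 200 cells; any procedure that reads them out one at a time inherits that exponential count, which is why read-out is best kept separate from chart construction.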

1999

Chart translation
Martin Kay
Proceedings of Machine Translation Summit VII

For efficiency reasons, Machine Translation systems are generally designed to eliminate ambiguities as early as possible even if delaying the decision would make a more informed choice possible. This paper takes the contrary view, arguing that essentially all choices should be deferred so that large numbers of competing translations will be produced in typical cases. Representing all the data structures in a suitable packed form, much as alternative structures are represented in a chart parser, makes this practicable.