Christer Samuelsson

2012

HAL: Challenging Three Key Aspects of IBM-style Statistical Machine Translation
Christer Samuelsson
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

The IBM schemes use weighted cooccurrence counts to iteratively improve translation and alignment probability estimates. We argue that: 1) these cooccurrence counts should be combined differently to capture word correlation; 2) alignment probabilities adopt predictable distributions; and 3) consequently, no iteration is needed. This applies equally well to word-based and phrase-based approaches. The resulting scheme, dubbed HAL, outperforms the IBM scheme in experiments.

An algorithm is presented for tagging input word graphs and producing output tag graphs that are to be subjected to further syntactic processing. It is based on an extension of the basic HMM equations for tagging an input word string that allows it to handle word-graph input, where each arc has been assigned a probability. The scenario is that of some word-graph source, e.g., an acoustic speech recognizer, producing the arcs of a word graph, and the tagger will in turn produce output arcs, labelled with tags and assigned probabilities. The processing is done entirely left-to-right, and the output tag graph is constructed using a minimum of lookahead, facilitating real-time processing.

Comparing a Linguistic and a Stochastic Tagger
Christer Samuelsson | Atro Voutilainen
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1996

Handling Sparse Data by Successive Abstraction
Christer Samuelsson
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics

Relating Turing’s Formula and Zipf’s Law
Christer Samuelsson
Fourth Workshop on Very Large Corpora

1995

Tagging the Teleman Corpus
Thorsten Brants | Christer Samuelsson
Proceedings of the 10th Nordic Conference of Computational Linguistics (NODALIDA 1995)

A Novel Framework for Reductionistic Statistical Parsing
Christer Samuelsson
Proceedings of the Fourth International Workshop on Parsing Technologies

A reductionistic statistical framework for part-of-speech tagging and surface syntactic parsing is presented that has the same expressive power as the highly successful Constraint Grammar approach (see [Karlsson et al. 1995]). The structure of the Constraint Grammar rules allows them to be viewed as conditional probabilities that can be used to update the lexical tag probabilities, after which low-probability tags are repeatedly removed. Experiments using strictly conventional information sources on the Susanne and Teleman corpora indicate that the system performs as well as a traditional HMM-based part-of-speech tagger, yielding state-of-the-art results. The scheme also enables using the same information sources as the Constraint Grammar approach, and the hope is that it can improve on the performance of both statistical taggers and surface-syntactic analyzers.