2019
Normalizing Non-canonical Turkish Texts Using Machine Translation Approaches
Talha Çolakoğlu, Umut Sulubacak, Ahmet Cüneyd Tantuğ
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
With the growth of the social web, user-generated text data has reached unprecedented sizes. Non-canonical text normalization provides a way to exploit this as a practical source of training data for language processing systems. The state of the art in Turkish text normalization is composed of a token-level pipeline of modules, heavily dependent on external linguistic resources and manually defined rules. Instead, we propose a fully automated, context-aware machine translation approach with fewer stages of processing. Experiments with various implementations of our approach show that we are able to surpass the current best-performing system by a large margin.
2008
BLEU+: a Tool for Fine-Grained BLEU Computation
A. Cüneyd Tantuğ, Kemal Oflazer, Ilknur Durgar El-Kahlout
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
We present a tool, BLEU+, which implements various extensions to BLEU computation to allow for a better understanding of translation performance, especially for morphologically complex languages. When comparing tokens in the candidate and reference sentences, BLEU+ takes into account both closeness in morphological structure and closeness of the root words in the WordNet hierarchy. In addition to gauging performance at a finer level of granularity, BLEU+ also allows the computation of various upper-bound oracle scores: comparing all tokens by their roots only gives an upper bound for the case where all errors due to morphological structure are fixed, while comparing tokens in an error-tolerant way that allows minor morpheme edit operations gives a (more realistic) upper bound for the case where tokens differing only by morpheme insertions, deletions, and substitutions are fixed. We use BLEU+ in the fine-grained evaluation of the output of our English-to-Turkish statistical MT system.
2007
Machine Translation between Turkic Languages
Ahmet Cüneyd Tantuğ, Eşref Adali, Kemal Oflazer
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume: Proceedings of the Demo and Poster Sessions
A MT system from Turkmen to Turkish employing finite state and statistical methods
Ahmet Cüneyd Tantuğ, Eşref Adali, Kemal Oflazer
Proceedings of Machine Translation Summit XI: Papers