Baskaran Sankaran

Also published as: Sankaran Baskaran


2016

Zero-Resource Translation with Multi-Lingual Neural Machine Translation
Orhan Firat | Baskaran Sankaran | Yaser Al-onaizan | Fatos T. Yarman Vural | Kyunghyun Cho
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Coverage Embedding Models for Neural Machine Translation
Haitao Mi | Baskaran Sankaran | Zhiguo Wang | Abe Ittycheriah
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2014

Bayesian iterative-cascade framework for hierarchical phrase-based translation
Baskaran Sankaran | Anoop Sarkar
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Researchers Track

The typical training of a hierarchical phrase-based machine translation system involves a pipeline of multiple steps, where mistakes in the early steps are propagated without any scope for rectifying them. Additionally, the alignments are trained independently of the end goal, without being informed by it, and hence are not optimized for translation. We introduce a novel Bayesian iterative-cascade framework for training a Hiero-style model that learns the alignments together with the synchronous translation grammar in an iterative setting. Our framework addresses the above-mentioned issues and provides an elegant and principled alternative to the existing training pipeline. In validation experiments involving two language pairs, our proposed iterative-cascade framework shows consistent gains over the traditional training pipeline for hierarchical translation.

2013

Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering
Maryam Siahbani | Baskaran Sankaran | Anoop Sarkar
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Multi-Metric Optimization Using Ensemble Tuning
Baskaran Sankaran | Anoop Sarkar | Kevin Duh
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Scalable Variational Inference for Extracting Hierarchical Phrase-based Translation Rules
Baskaran Sankaran | Gholamreza Haffari | Anoop Sarkar
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Incremental Segmentation and Decoding Strategies for Simultaneous Translation
Mahsa Yarmohammadi | Vivek Kumar Rangarajan Sridhar | Srinivas Bangalore | Baskaran Sankaran
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

Improved Reordering for Shallow-n Grammar based Hierarchical Phrase-based Translation
Baskaran Sankaran | Anoop Sarkar
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Mixing Multiple Translation Models in Statistical Machine Translation
Majid Razmara | George Foster | Baskaran Sankaran | Anoop Sarkar
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Kriya - The SFU System for Translation Task at WMT-12
Majid Razmara | Baskaran Sankaran | Ann Clifton | Anoop Sarkar
Proceedings of the Seventh Workshop on Statistical Machine Translation

Compact Rule Extraction for Hierarchical Phrase-based Translation
Baskaran Sankaran | Gholamreza Haffari | Anoop Sarkar
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper introduces two novel approaches for extracting compact grammars for hierarchical phrase-based translation. The first is a combinatorial optimization approach and the second is a Bayesian model over Hiero grammars using Variational Bayes for inference. In contrast to the conventional Hiero (Chiang, 2007) rule extraction algorithm, our methods extract compact models, reducing model size by 17.8% to 57.6% without impacting translation quality across several language pairs. The Bayesian model is particularly effective for resource-poor languages, with evidence from Korean-English translation. To our knowledge, this is the first alternative to Hiero-style rule extraction that finds a more compact synchronous grammar without hurting translation performance.

2011

Bayesian Extraction of Minimal SCFG Rules for Hierarchical Phrase-based Translation
Baskaran Sankaran | Gholamreza Haffari | Anoop Sarkar
Proceedings of the Sixth Workshop on Statistical Machine Translation

2010

Incremental Decoding for Phrase-Based Statistical Machine Translation
Baskaran Sankaran | Ajeet Grewal | Anoop Sarkar
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

2008

Designing a Common POS-Tagset Framework for Indian Languages
Sankaran Baskaran | Kalika Bali | Tanmoy Bhattacharya | Pushpak Bhattacharyya | Girish Nath Jha | Rajendran S | Saravanan K | Sobha L | Subbarao K V.
Proceedings of the 6th Workshop on Asian Language Resources

A Common Parts-of-Speech Tagset Framework for Indian Languages
Baskaran Sankaran | Kalika Bali | Monojit Choudhury | Tanmoy Bhattacharya | Pushpak Bhattacharyya | Girish Nath Jha | S. Rajendran | K. Saravanan | L. Sobha | K.V. Subbarao
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present a universal Parts-of-Speech (POS) tagset framework covering most of the Indian languages (ILs) following the hierarchical and decomposable tagset schema. In spite of a significant number of speakers, there is no workable POS tagset or tagger for most ILs, even though these serve as fundamental building blocks for NLP research. Existing IL POS tagsets are often designed for a specific language; the few that have been designed for multiple languages cover only shallow linguistic features, ignoring linguistic richness and idiosyncrasies. The new framework proposed here addresses these deficiencies in an efficient and principled manner. We follow a hierarchical schema similar to that of EAGLES, which makes the framework flexible enough to capture the rich features of a language or language family while still capturing the shared linguistic structures in a methodical way. The proposed common framework further facilitates the sharing and reusability of scarce resources in these languages and ensures cross-linguistic compatibility.