Siddharth Jain

Also published as: Siddhanth Jain


2021

A Large-scale Evaluation of Neural Machine Transliteration for Indic Languages
Anoop Kunchukuttan | Siddharth Jain | Rahul Kejriwal
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We take up the task of large-scale evaluation of neural machine transliteration between English and Indic languages, with a focus on multilingual transliteration to utilize orthographic similarity between Indian languages. We create a corpus of 600K word pairs mined from parallel translation corpora and monolingual corpora, which is the largest transliteration corpus for Indian languages mined from public sources. We perform a detailed analysis of multilingual transliteration and propose an improved multilingual training recipe for Indic languages. We analyze various factors affecting transliteration quality, such as language family, transliteration direction, and word origin.

2020

Contact Relatedness can help improve multilingual NMT: Microsoft STCI-MT @ WMT20
Vikrant Goyal | Anoop Kunchukuttan | Rahul Kejriwal | Siddharth Jain | Amit Bhagwat
Proceedings of the Fifth Conference on Machine Translation

We describe our submission for the English→Tamil and Tamil→English news translation shared task. In this submission, we explore whether a low-resource language (Tamil) can benefit from a high-resource language (Hindi) with which it shares contact relatedness. We show that utilizing contact relatedness via multilingual NMT can significantly improve translation quality for English-Tamil translation.

2016

Identifying Sensible Participants in Online Discussions
Siddharth Jain
Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media

2014

A Corpus of Participant Roles in Contentious Discussions
Siddharth Jain | Archna Bhatia | Angelique Rein | Eduard Hovy
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The expansion of social roles is now a fact, owing to users' ability to interact, discuss, exchange ideas and opinions, and form social networks through social media. Users in online social environments play a variety of social roles. The concept of “social role” has long been used in social science to describe the intersection of behavioural, meaningful, and structural attributes that emerge regularly in particular settings. In this paper, we present a new corpus for social roles in online contentious discussions. We explore various behavioural attributes such as stubbornness, sensibility, influence, and ignorance to create a model of social roles that distinguishes among the various roles participants assume in such a setup. We annotate discussions drawn from two different sets of corpora in order to ensure that our model of social roles and their signals hold up in general. We discuss the various criteria for deciding values for each behavioural attribute that defines the roles.

2013

Joint Bootstrapping of Corpus Annotations and Entity Types
Hrushikesh Mohapatra | Siddhanth Jain | Soumen Chakrabarti
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing