In today's communication-driven society, hardly any natural language seems to have been left untouched by code-mixing. For various communicative purposes, a language borrows linguistic codes from other languages. This gives rise to a mixed language that is neither entirely the host language nor the foreign language. Such a mixed language poses a new challenge for machine translation: the “foreign” elements in the source language must be identified and processed accordingly. These elements may not appear in their original form and may be morphologically transformed according to the host language. Further, in a complex sentence, one clause or utterance may be in the host language while another is in the foreign language. Code-mixing of Hindi and English, with Hindi as the host language, is a common phenomenon in day-to-day language use in Indian metropolises. It is so common that people have started to regard the result as a distinct variety and to call it Hinglish. In this paper, we present a mechanism for machine translation of Hinglish into pure (standard) Hindi and pure English forms.
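As an illustration only, the identification of foreign elements that carry host-language morphology can be sketched as follows. This is a minimal sketch and not the mechanism presented in the paper: the tiny English lexicon, the romanized Hindi suffix list, and the names `tag_token` and `tag_sentence` are all hypothetical, and a realistic system would rely on full lexicons and a morphological analyser.

```python
# Hypothetical toy resources; a real system would use full lexicons
# and a proper morphological analyser.
ENGLISH_LEXICON = {"computer", "file", "meeting", "train", "late"}
HINDI_SUFFIXES = ("on", "en", "e")  # assumed romanized inflectional endings


def tag_token(token):
    """Tag a romanized token as English, English stem + Hindi suffix, or Hindi."""
    word = token.lower()
    if word in ENGLISH_LEXICON:
        return ("EN", word)
    for suffix in HINDI_SUFFIXES:
        stem = word[:-len(suffix)]
        # An English stem carrying a Hindi ending counts as a
        # morphologically transformed foreign element.
        if word.endswith(suffix) and stem in ENGLISH_LEXICON:
            return ("EN+HI", stem)
    # Anything else is assumed to belong to the host language.
    return ("HI", word)


def tag_sentence(sentence):
    """Tag every whitespace-separated token of a romanized sentence."""
    return [tag_token(token) for token in sentence.split()]
```

For example, `tag_sentence("meri train late hai")` marks `train` and `late` as English and the remaining tokens as Hindi, while `tag_token("computeron")` recovers the English stem `computer` from a form carrying an assumed Hindi ending.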
Translation divergence is an important research topic in machine translation (MT). An exhaustive study of divergence issues in MT is necessary for their proper classification and resolution. In the MT literature, scholars have examined these issues and proposed ways to classify and resolve them (Dorr 1993, 1994). However, the topic needs further exploration to identify the different sources of translation divergence in different language pairs. In this paper, we discuss translation patterns between Hindi and English for different types of constructions, with a view to identifying potential areas of translation divergence. We take Dorr’s (1993, 1994) classification of translation divergence as the basis for examining the different areas of translation divergence in Hindi and English. The primary goal of the paper is to point out types of translation divergence in Hindi-English MT that have not been discussed in the existing literature.
ki is an indeclinable element (particle) in Hindi that serves multiple roles, each with its own mapping patterns in English. In one of its uses, ki functions as a clause complementizer and is usually mapped to “that” in declarative clauses and to various wh-words (such as “what”, “why”, “where”, and “how”) in interrogative clauses. These mappings depend on the syntactic-semantic type of the clause. In its non-complementizer uses, ki serves various other functions, such as coordinating conjunction, purpose and reason clause conjunction, and yes-no question particle. Identifying the different uses of ki and determining its mapping patterns is a difficult task in Hindi-English machine translation, and a detailed linguistic analysis is needed to disambiguate the contexts in which ki occurs. In this paper, we examine the multiple uses and mapping patterns of ki in Hindi and propose strategies for their identification and disambiguation for Hindi-English MT.
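The mapping patterns described above can be illustrated with a small rule-based sketch. It is a simplified illustration under stated assumptions, not the strategy proposed in the paper: the romanized wh-word list, the "or not" reading of a clause-final ki nahin, the default to "that", and the function name `map_ki` are all hypothetical, and purpose and reason uses of ki are not handled.

```python
# Hypothetical romanized Hindi wh-words and rough English equivalents.
WH_WORDS = {
    "kyaa": "what",
    "kyon": "why",
    "kahaan": "where",
    "kaise": "how",
    "kab": "when",
    "kaun": "who",
}


def map_ki(tokens):
    """Return a rough English mapping for the first 'ki' in a romanized token list.

    Returns None when the sentence contains no 'ki'.
    """
    if "ki" not in tokens:
        return None
    after = tokens[tokens.index("ki") + 1:]
    # Assumed cue: clause-final 'ki nahin' is read as the alternative 'or not'.
    if after == ["nahin"]:
        return "or not"
    # Interrogative embedded clause: map to the wh-word found after ki.
    for token in after:
        if token in WH_WORDS:
            return WH_WORDS[token]
    # Default assumption: declarative complement clause.
    return "that"
```

Under these assumptions, `map_ki("us ne kahaa ki vah aayegaa".split())` yields `"that"`, `map_ki("mujhe nahin pataa ki vah kahaan gayaa".split())` yields `"where"`, and `map_ki("vah aayegaa ki nahin".split())` yields `"or not"`. Surface cues of this kind are easily misled, which is why a detailed syntactic-semantic analysis of the clause is needed in practice.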