Why Find the Right One?

The present paper investigates the impact of anaphoric one words in English on Neural Machine Translation (NMT), using English and Hindi as the source and target languages. As expected, the experimental results show that the state-of-the-art Google English-Hindi NMT system performs considerably worse on sentences containing anaphoric ones than on sentences containing regular, non-anaphoric ones. More importantly, we note that among the anaphoric ones, the noun class (one-anaphora) is clearly much harder for NMT than the determinative class. This reaffirms the linguistic disparity between the two phenomena argued for in recent theoretical syntactic literature, despite their obvious surface similarities.


Introduction
English has three distinct lexemes spelled as one: the regular third-person indefinite pronoun, as in (1); the indefinite cardinal numeral (a determinative), as in (2); and the regular common count noun, as in (3).
1. One must obey the laws of the state at all times.
2. Could you pass me one glass of water here?
3. It is important that we take care of our loved ones.
Their orthographic base forms are identical, so no visible difference can be observed. However, they can be clearly distinguished by their morphological, syntactic, and semantic functions in the language. Note that the examples in (1), (2) and (3) are non-anaphoric uses of one. The anaphoric class of one has two subtypes: the first belongs to the determinative category, as in (4), and the second is a noun, as in (5).
4. I bought three red glasses, but she bought only one.
5. After looking at all the glasses, I decided to buy this small one.
As expected, the determinative anaphoric one behaves like a determiner, whereas the one-anaphor behaves like a noun. Note that the plural form of the determinative one in (4) is some, whereas that of the one-anaphor in (5) is ones. The two also differ in the kind of antecedent they take. The constituent whose repetition the determinative anaphor avoids is the whole NP, a glass. In the case of one-anaphora, by contrast, it is the noun head, optionally with one or more of its modifiers (e.g., red glass), but never the whole NP (Payne et al., 2013).
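To make these distributional cues concrete, here is a minimal Python sketch of a toy rule-based tagger for examples (1) to (5). It is not Gardiner's (2003) heuristics or any published system: the word lists, labels and rules are hypothetical and hand-picked to cover only these five sentences, and a realistic tagger would rely on a part-of-speech tagger and parser rather than tiny lexicons. It also cannot separate one-anaphora from the plain common noun, since both are nouns and telling them apart requires the wider context.

```python
import re

# Hypothetical toy lexicons, chosen only to cover examples (1)-(5).
DETERMINERS = {"the", "this", "that", "these", "those", "a", "an", "each", "every"}
ADJECTIVES = {"small", "big", "red", "new", "old", "loved", "other", "same"}
MODALS = {"must", "should", "can", "could", "may", "might", "will", "would"}
NOUNS = {"glass", "glasses", "water", "book", "name"}


def classify_one(tokens, i):
    """Guess which use of one/ones tokens[i] is, from its immediate neighbours."""
    word = tokens[i].lower()
    prev = tokens[i - 1].lower() if i > 0 else None
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
    if word == "one" and nxt in NOUNS:
        return "determinative (cardinal)"   # one glass of water
    if word == "ones" or prev in DETERMINERS or prev in ADJECTIVES:
        return "noun"                        # this small one, our loved ones
    if i == 0 and nxt in MODALS:
        return "pronoun"                     # One must obey ...
    return "determinative (anaphoric)"       # she bought only one


def classify_sentence(sentence):
    """Label every occurrence of one/ones in a sentence."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    return [classify_one(tokens, i) for i, t in enumerate(tokens)
            if t.lower() in ("one", "ones")]
```

For example, classify_sentence("I bought three red glasses, but she bought only one.") returns ["determinative (anaphoric)"]: a bare one with no following noun and no preceding determiner or adjective is read as a determinative whose nominal head has been left out.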
Like other cohesive devices such as pronouns and ellipsis, anaphoric ones make language less redundant and more engaging (Menzel, 2017; Mitkov, 1999; Halliday and Hasan, 1976). Resolving the information encoded in such structures is not hard for humans, who can disambiguate meaning using linguistic or extralinguistic context, commonsense knowledge, and logical reasoning (Chen, 2016). For a machine, however, this is far from straightforward. In fact, anaphoric ones can present a special challenge for Machine Translation (MT): their meaning does not come from the word's most frequent use as a cardinal number but depends on the context, and is therefore not overtly available in the surface syntax for text processing.
To the best of our knowledge, the earliest computational approach to one-anaphora is that of Gardiner (2003), who presents several linguistically motivated heuristics to distinguish one-anaphora from other, non-anaphoric uses of one in English; Ng (2005) later uses Gardiner's heuristics as features to train a simple Machine Learning (ML) model. Another seminal work on anaphoric one is Recasens et al. (2016), who treat it as one of several sense anaphoric relations in English. The authors create the sAnaNotes corpus by annotating one third of the OntoNotes corpus for sense anaphora. They use a Support Vector Machine (SVM) classifier (the LIBLINEAR implementation; Fan et al., 2008) with 31 lexical and syntactic features to distinguish the anaphoric from the non-anaphoric class. Trained and tested on sAnaNotes, their system achieves an F1 score of 61.80% on the detection of all anaphoric relations, including one-anaphora. The detection and resolution of the determinative one anaphor, on the other hand, has been carried out as part of computational research on noun ellipsis (Khullar et al., 2019, 2020b).

Recent research shows that discourse devices such as pronominal anaphora, ellipsis, deixis, and lexical cohesion create inconsistencies in MT output (Voita et al., 2019; Mitkov, 2004). Unlike these devices, however, the role of anaphoric ones in NLP tasks such as MT has not been studied. In the present paper, we conduct a data-driven study of the extent and nature of this impact, using English and Hindi as the source and target languages.

Experiment

Curating Test Sets
We prepare three test sets: the first contains sentences with determinative anaphoric ones, the second contains one-anaphora, and the third contains regular, non-anaphoric one words. For the first test set, we randomly choose 750 sentences from the NoEl corpus (Khullar et al., 2020b), the curated dataset prepared by Khullar et al. (2019), and the sAnaNotes corpus (Recasens et al., 2016); for the second, we take 750 sentences from Khullar et al. (2020a) and Recasens et al. (2016); and for the third, we pick 750 sentences from the Cornell Movie-Dialogs Corpus (Danescu-Niculescu-Mizil and Lee, 2011) and the British National Corpus (2001), manually checked to contain non-anaphoric ones. To enable automatic evaluation, we also have these 2,250 sentences translated manually by a professional translator who is bilingual in English and Hindi. We obtain up to three reference translations for each sentence, which are then verified by a native Hindi speaker.
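The paper does not specify how the random draw was performed. As an illustration only, the Python sketch below assumes uniform sampling without replacement from the deduplicated union of the candidate sentences (already filtered for the relevant use of one), with a fixed seed for reproducibility. The function name, the seed value and the pooling strategy are all assumptions; stratifying by source corpus would be an equally reasonable choice.

```python
import random


def sample_test_set(pools, size=750, seed=13):
    """Hypothetical helper: draw `size` distinct sentences from several corpora.

    pools maps a corpus name to its list of candidate sentences. Sampling is
    assumed to be uniform over the deduplicated union of all pools.
    """
    seen, pooled = set(), []
    for name in sorted(pools):          # sorted for a deterministic order
        for sent in pools[name]:
            if sent not in seen:        # drop sentences shared across corpora
                seen.add(sent)
                pooled.append((name, sent))
    if len(pooled) < size:
        raise ValueError(f"only {len(pooled)} candidates, need {size}")
    rng = random.Random(seed)           # fixed seed: the sample is reproducible
    return rng.sample(pooled, size)     # list of (corpus, sentence) pairs
```

Keeping the corpus name next to each sentence makes it easy to report afterwards how many sentences each source contributed.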

Obtaining Translations
To translate the English sentences into Hindi, we use Google NMT (GNMT). The system comprises a deep LSTM network with 8 encoder and 8 decoder layers, with attention and residual connections (Wu et al., 2016). It suits our experiment well, as its performance is on par with current state-of-the-art NMT systems and it is freely available for translation between English and Hindi. We run the system on the three test sets and save the translations for analysis.

Evaluation
For automatic evaluation, we compute BLEU (Bilingual Evaluation Understudy) scores (Papineni et al., 2002): 39.72 for the first test set, 38.21 for the second, and 41.46 for the third. We also carry out a manual evaluation, in which four evaluators rate the translations of all sentences in the three test sets for fluency (syntactic correctness) and adequacy (translation accuracy). Both metrics are rated on a 4-point Likert scale (see Table 1). The scores assigned by the different raters are summed and averaged over all sentences. We use Fleiss' kappa (Fleiss, 1971) to measure inter-annotator agreement among the evaluators, obtaining 0.83 for fluency and 0.77 for adequacy, which confirms the reliability of the evaluation.
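For readers who want to reproduce this kind of evaluation, the following self-contained Python sketch implements unsmoothed corpus-level BLEU with multiple references (clipped n-gram precision up to 4-grams and a brevity penalty based on the closest reference length) and Fleiss' kappa for items rated by the same number of raters. It is not the scoring script used in this study. Tokenization here is plain whitespace splitting, which is an assumption; standard tools apply their own tokenization, so scores computed this way will not exactly match those reported above.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the order-n n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU on a 0-100 scale.

    references[i] holds the reference translations of hypotheses[i].
    Tokenization is whitespace splitting (an assumption of this sketch).
    """
    matched, total = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        h, rs = hyp.split(), [r.split() for r in refs]
        hyp_len += len(h)
        # Effective reference length: closest to the hypothesis, ties go to the shorter.
        ref_len += min((len(r) for r in rs), key=lambda k: (abs(k - len(h)), k))
        for n in range(1, max_n + 1):
            max_ref = Counter()
            for r in rs:
                max_ref |= ngram_counts(r, n)  # element-wise max over references
            # Clipped matches: an n-gram counts at most as often as in one reference.
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in ngram_counts(h, n).items())
            total[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # some n-gram precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


def fleiss_kappa(ratings, categories=(1, 2, 3, 4)):
    """Fleiss' kappa; ratings[i] lists every rater's score for item i.

    Every item must be rated by the same number (at least two) of raters.
    """
    counts = [[row.count(c) for c in categories] for row in ratings]
    n_items, n_raters = len(counts), len(ratings[0])
    # Mean observed agreement across items.
    p_bar = sum((sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
                for row in counts) / n_items
    # Chance agreement from the overall proportion of each category.
    shares = [sum(row[j] for row in counts) / (n_items * n_raters)
              for j in range(len(categories))]
    p_e = sum(s * s for s in shares)
    return (p_bar - p_e) / (1 - p_e)
```

In the setting described above, each entry of ratings would hold the four evaluators' scores for one sentence on the 4-point scale, for example [4, 4, 3, 4], computed separately for fluency and for adequacy.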

Results and Discussion
As can be seen, BLEU is lower for sentences containing anaphoric ones than for those containing non-anaphoric ones. However, this may not indicate a trend, as the test sets are small and the differences are modest (1.74 and 3.25 BLEU points below the third test set). In the manual evaluation, 389 sentences from the first test set, 601 from the second and 202 from the third receive a rating of 1 or 2 for adequacy (see Table 2). This shows that a majority of the sentences containing anaphoric one words are either poorly translated or have major translation errors, although they are still grammatically acceptable. About 90% of the sentences containing non-anaphoric instances of one are translated rather well by the system. The errors observed in this set are mostly due to incorrect translation of named entities and incorrect gender marking in subject-verb agreement. We do not encounter any errors caused by incorrect translation of the word one in the target language. Compared with the sentences containing non-anaphoric one words, the sentences containing anaphoric one words are translated much more poorly. Within the latter, the highest number of wrong translations occurs for sentences with one-anaphora. The errors in these incorrect translations fall into three types. In the first type, the anaphoric one is translated as a non-anaphoric expression, specifically as the cardinal numeral, in the target language. For example, in Figure 1, the one-anaphor in the English sentence, which means name as seen from the preceding context, is translated as the cardinal numeral one in the target language. Out of 750 sentences, a total of 232 exhibit this error. One possible reason is that one occurs in English most commonly as a cardinal number (Gardiner, 2003).
Hence, in case of ambiguity, the MT system is more likely to treat one as a cardinal number. In the second type of error, the anaphoric one is translated as a pronoun in the target language. Such errors are rare, with only 25 in our entire test set; see Figure 2 for an example. Finally, in the third type of error, the one-anaphor is completely disregarded by the translation system, and the translated sentence contains no lexeme equivalent to the anaphor. Note that these errors result in poor translation adequacy, but most of the translated sentences are more or less grammatically acceptable according to the rules of the target language, as seen in Figure 1 and Figure 2. In some cases, however, they can also become totally absurd in meaning, as can be seen in Figure 2.
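The counts reported in this section can be turned into proportions with a few lines of arithmetic. The snippet below is only a worked calculation on the figures given in the text, assuming each test set contains the 750 sentences described earlier.

```python
SET_SIZE = 750  # sentences per test set

# Sentences rated 1 or 2 for adequacy, as reported in the text (Table 2).
poor_adequacy = {
    "determinative anaphoric one": 389,
    "one-anaphora": 601,
    "non-anaphoric one": 202,
}
# One-anaphora mistranslated as the cardinal numeral (first error type).
numeral_errors = 232


def percent(count, total=SET_SIZE):
    """Share of a test set, in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)


for name, count in poor_adequacy.items():
    print(f"{name}: {percent(count)}% rated 1 or 2 for adequacy")
print(f"cardinal-numeral errors: {percent(numeral_errors)}%")
```

Both anaphoric sets exceed half of their sentences (51.9% and 80.1%), against 26.9% for the non-anaphoric set, and the cardinal-numeral error alone affects 30.9% of the one-anaphora sentences.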
Compared with one-anaphora, wrong translations of determinative anaphoric ones are slightly less severe. Hindi is morphologically richer than English, and we observe that the errors in these translations come from copying wrong agreement morphology onto the verb in the absence of the noun whose repetition the determinative anaphoric one avoids (see Figure 3 for an example). This also implies that although such sentences get a lower rating for fluency, they rate higher for translation adequacy.
Traditional syntactic literature, going back to Baker (1978), has long grouped one-anaphora and determinative anaphoric one words together, often using them interchangeably in discussion and analysis. Only recently (Payne et al., 2013) have the morphological, syntactic and semantic differences between the two anaphoric forms been discussed extensively. Note that although Kayne (2015) aims to give all instances of the word one a homogeneous internal structure, comprising a classifier merged with an indefinite article, he too identifies subtypes within this class through a variety of examples and points out how they behave differently from one another. Our simple experiment highlights the differences between these two forms, supporting their distinct linguistic analyses and advocating separate treatment for them in future Computational Linguistics and NLP research.
Figure 3: Translation of an English sentence containing a determinative anaphoric one into Hindi. Although the translation is otherwise fine, the wrong agreement morphology on the verb makes it grammatically incorrect.

Finally, in the sentences that are correctly translated, we observe that most one-anaphora and determinative anaphoric ones are translated exactly as their antecedent. This means that the anaphoric expression per se is lost in the target language. For instance, the corresponding expression for one-anaphora in Hindi is vaala (singular, masculine), yet only 69 of the 750 translated sentences actually contain this lexeme. Unsurprisingly, 66 of these sentences are rated 4 in the evaluation.
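Counting how many outputs contain vaala can be partly automated. The Python sketch below is an illustration under stated assumptions, not the procedure used here: it checks whitespace-separated tokens (after stripping punctuation, including the Devanagari danda) against an assumed set of forms, वाला (vaala), वाली (vaalii) and वाले (vaale). It misses forms written attached to the preceding word and also matches non-anaphoric uses such as जाने वाला ('about to go'), so a match is only a candidate for manual checking.

```python
import unicodedata

# Assumed inflected forms of vaala: masculine singular, feminine,
# and masculine plural/oblique.
VAALA_FORMS = {"वाला", "वाली", "वाले"}
# Punctuation stripped from token edges, including the Devanagari danda (।).
PUNCT = "।॥,.?!;:\"'()"


def contains_vaala(hindi_sentence):
    """True if any whitespace-separated token is one of VAALA_FORMS."""
    text = unicodedata.normalize("NFC", hindi_sentence)
    return any(tok.strip(PUNCT) in VAALA_FORMS for tok in text.split())


def count_vaala(translations):
    """Return (number of outputs containing vaala, number of outputs)."""
    return sum(contains_vaala(t) for t in translations), len(translations)
```

Unicode normalization is applied first so that visually identical strings built from different code-point sequences compare equal.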
It is debatable, however, whether a translation that contains an anaphoric expression similar to the source is of better quality than one that simply replaces the anaphor with a copy of its antecedent. While both convey nearly the same meaning and are grammatically acceptable, the former were rated higher in our experiment. It could then be argued that the latter add redundant information, which may not be desirable in most cases.

Conclusion
In the present paper, we performed a simple experiment to investigate the impact of anaphoric and non-anaphoric one words on Neural Machine Translation, using English and Hindi as the source and target languages. Manual evaluation revealed that anaphoric instances of the word one are much harder to translate than non-anaphoric ones. We also conclude that within the anaphoric class, one-anaphora are harder to translate than determinative anaphors, which reaffirms the linguistic disparity between the two phenomena shown in recent syntactic research. The long-term goal of such a study is to improve the translation quality of discourse structures such as anaphoric ones.