2023
Development of Urdu-English Religious Domain Parallel Corpus
Sadaf Abdul Rauf | Noor e Hira
Proceedings of the Second Workshop on Corpus Generation and Corpus Augmentation for Machine Translation
Despite the abundance of monolingual corpora accessible online, there remains a scarcity of domain-specific parallel corpora. This scarcity poses a challenge for the development of robust translation systems tailored to such specialized domains. Addressing this gap, we have developed a parallel religious-domain corpus for Urdu-English. This corpus consists of 18,426 parallel sentences from Sunan Daud, carefully curated to capture the unique linguistic and contextual aspects of religious texts. The developed corpus is then used to train Urdu-English religious-domain Neural Machine Translation (NMT) systems; the best system scored 27.9 BLEU points.
2020
Improving Document-Level Neural Machine Translation with Domain Adaptation
Sami Ul Haq | Sadaf Abdul Rauf | Arslan Shoukat | Noor-e- Hira
Proceedings of the Fourth Workshop on Neural Generation and Translation
Recent studies have shown that the translation quality of NMT systems can be improved by providing document-level contextual information. In general, sentence-based NMT models are extended to capture contextual information from large-scale document-level corpora, which are difficult to acquire. Domain adaptation, on the other hand, promises to adapt components of already developed systems by exploiting limited in-domain data. This paper presents FJWU’s system submission at WNGT; we specifically participated in the document-level MT task for German-English translation. Our system is based on a context-aware Transformer model developed on top of the original NMT architecture by integrating contextual information using attention networks. Our experimental results show that providing previous sentences as context significantly improves the BLEU score compared to a strong NMT baseline. We also studied the impact of domain adaptation on document-level translation and were able to improve results by adapting the systems to the testing domain.
FJWU participation for the WMT20 Biomedical Translation Task
Sumbal Naz | Sadaf Abdul Rauf | Noor-e- Hira | Sami Ul Haq
Proceedings of the Fifth Conference on Machine Translation
This paper reports system descriptions for the FJWU-NRPU team's participation in the WMT20 Biomedical shared translation task. We focused our submission on exploring the effects of adding in-domain corpora extracted from various out-of-domain sources. Systems were built for French to English using in-domain corpora through fine-tuning and selective data training. We further explored BERT-based models, with a specific focus on the effect of domain-adaptive subword units.
On the Exploration of English to Urdu Machine Translation
Sadaf Abdul Rauf | Syeda Abida | Noor-e- Hira | Syeda Zahra | Dania Parvez | Javeria Bashir | Qurat-ul-ain Majid
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Machine Translation is the inevitable technology to reduce communication barriers in today’s world. It has made substantial progress in recent years and is widely used in commercial as well as non-profit sectors. This is the case only for European and other high-resource languages, however. For the English-Urdu language pair, the technology is in its infancy due to the scarcity of resources. The present research is an important milestone in English-Urdu machine translation, as we present results for four major domains, namely Biomedical, Religious, Technological and General, using Statistical and Neural Machine Translation. We performed a series of experiments to optimize the performance of each system and to study the impact of data sources on the systems. Finally, we established a comparison of the data sources and the effect of language model size on statistical machine translation performance.
2019
Exploring Transfer Learning and Domain Data Selection for the Biomedical Translation
Noor-e- Hira | Sadaf Abdul Rauf | Kiran Kiani | Ammara Zafar | Raheel Nawaz
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Transfer learning and selective data training are two of the many approaches being extensively investigated to improve the quality of Neural Machine Translation systems. This paper presents a series of experiments applying transfer learning and selective data training for participation in the Bio-medical shared task of WMT19. We used Information Retrieval to select related sentences from out-of-domain data and added them as training data through transfer learning. We also report the effect of tokenization on translation model performance.