DENTRA: Denoising and Translation Pre-training for Multilingual Machine Translation
Samta Kamboj | Sunil Kumar Sahu | Neha Sengupta
Proceedings of the Seventh Conference on Machine Translation (WMT)
In this paper, we describe our submission to the WMT-2022: Large-Scale Machine Translation Evaluation for African Languages under the Constrained Translation track. We introduce DENTRA, a novel pre-training strategy for a multilingual sequence-to-sequence transformer model. DENTRA pre-training combines denoising and translation objectives to incorporate both monolingual and bitext corpora in 24 African, English, and French languages. To evaluate the quality of DENTRA, we fine-tuned it with two multilingual machine translation configurations, one-to-many and many-to-one. In both pre-training and fine-tuning, we employ only the datasets provided by the organizers. We compare DENTRA against a strong baseline, M2M-100, in different African multilingual machine translation scenarios and show gains in 3 out of 4 subtasks.
Autoencoding Keyword Correlation Graph for Document Clustering
Billy Chiu | Sunil Kumar Sahu | Derek Thomas | Neha Sengupta | Mohammady Mahdy
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Document clustering requires a deep understanding of the complex structure of long-text; in particular, the intra-sentential (local) and inter-sentential features (global). Existing representation learning models do not fully capture these features. To address this, we present a novel graph-based representation for document clustering that builds a graph autoencoder (GAE) on a Keyword Correlation Graph. The graph is constructed with topical keywords as nodes and multiple local and global features as edges. A GAE is employed to aggregate the two sets of features by learning a latent representation which can jointly reconstruct them. Clustering is then performed on the learned representations, using vector dimensions as features for inducing document classes. Extensive experiments on two datasets show that the features learned by our approach can achieve better clustering performance than other existing features, including term frequency-inverse document frequency and average embedding.