We propose CHRT (Control HiddenRepresentation Transformation) – a con-trolled language generation framework thatsteers large language models to generatetext pertaining to certain attributes (such astoxicity). CHRT gains attribute control bymodifying the hidden representation of thebase model through learned transformations. We employ a contrastive-learning frameworkto learn these transformations that can becombined to gain multi-attribute control. Theeffectiveness of CHRT is experimentallyshown by comparing it with seven baselinesover three attributes. CHRT outperforms all thebaselines in the task of detoxification, positivesentiment steering, and text simplificationwhile minimizing the loss in linguistic qualities. Further, our approach has the lowest inferencelatency of only 0.01 seconds more than thebase model, making it the most suitable forhigh-performance production environments. We open-source our code and release two noveldatasets to further propel controlled languagegeneration research
While impressive performance has been achieved on the task of Answer Sentence Selection (AS2) for English, the same does not hold for languages that lack large labeled datasets. In this work, we propose Cross-Lingual Knowledge Distillation (CLKD) from a strong English AS2 teacher as a method to train AS2 models for low-resource languages in the tasks without the need of labeled data for the target language. To evaluate our method, we introduce 1) Xtr-WikiQA, a translation-based WikiQA dataset for 9 additional languages, and 2) TyDi-AS2, a multilingual AS2 dataset with over 70K questions spanning 8 typologically diverse languages. We conduct extensive experiments on Xtr-WikiQA and TyDi-AS2 with multiple teachers, diverse monolingual and multilingual pretrained language models (PLMs) as students, and both monolingual and multilingual training. The results demonstrate that CLKD either outperforms or rivals even supervised fine-tuning with the same amount of labeled data and a combination of machine translation and the teacher model. Our method can potentially enable stronger AS2 models for low-resource languages, while TyDi-AS2 can serve as the largest multilingual AS2 dataset for further studies in the research community.
Training mixed-domain translation models is a complex task that demands tailored architec- tures and costly data preparation techniques. In this work, we leverage federated learning (FL) in order to tackle the problem. Our investiga- tion demonstrates that with slight modifications in the training process, neural machine trans- lation (NMT) engines can be easily adapted when an FL-based aggregation is applied to fuse different domains. Experimental results also show that engines built via FL are able to perform on par with state-of-the-art baselines that rely on centralized training techniques. We evaluate our hypothesis in the presence of five datasets with different sizes, from different domains, to translate from German into English and discuss how FL and NMT can mutually benefit from each other. In addition to provid- ing benchmarking results on the union of FL and NMT, we also propose a novel technique to dynamically control the communication band- width by selecting impactful parameters during FL updates. This is a significant achievement considering the large size of NMT engines that need to be exchanged between FL parties.