Savitha Ramasamy


pdf bib
HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts
Truong Giang Do | Le Khiem | Quang Pham | TrungTin Nguyen | Thanh-Nam Doan | Binh Nguyen | Chenghao Liu | Savitha Ramasamy | Xiaoli Li | Steven Hoi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

By routing input tokens to only a few split experts, Sparse Mixture-of-Experts has enabled efficient training of large language models. Recent findings suggest that fixing the routers can achieve competitive performance by alleviating the collapsing problem, where all experts eventually learn similar representations. However, this strategy has two key limitations: (i) the policy derived from random routers might be sub-optimal, and (ii) it requires extensive resources during training and evaluation, leading to limited efficiency gains. This work introduces HyperRouter, which dynamically generates the router’s parameters through a fixed hypernetwork and trainable embeddings to achieve a balance between training the routers and freezing them to learn an improved routing policy. Extensive experiments across a wide range of tasks demonstrate the superior performance and efficiency gains of HyperRouter compared to existing routing methods. Our implementation is publicly available at


pdf bib
Refinement Matters: Textual Description Needs to be Refined for Zero-shot Learning
Chandan Gautam | Sethupathy Parameswaran | Vinay Verma | Suresh Sundaram | Savitha Ramasamy
Findings of the Association for Computational Linguistics: EMNLP 2022

Zero-Shot Learning (ZSL) has shown great promise at the intersection of vision and language, and generative methods for ZSL are predominant owing to their efficiency. Moreover, textual description or attribute plays a critical role in transferring knowledge from the seen to unseen classes in ZSL. Such generative approaches for ZSL are very costly to train and require the class description of the unseen classes during training. In this work, we propose a non-generative gating-based attribute refinement network for ZSL, which achieves similar accuracies to generative methods of ZSL, at a much lower computational cost. The refined attributes are mapped into the visual domain through an attribute embedder, and the whole network is guided by the circle loss and the well-known softmax cross-entropy loss to obtain a robust class embedding. We refer to our approach as Circle loss guided gating-based Attribute-Refinement Network (CARNet). We perform extensive experiments on the five benchmark datasets over the various challenging scenarios viz., Generalized ZSL (GZSL), Continual GZSL (CGZSL), and conventional ZSL. We observe that the CARNet significantly outperforms recent non-generative ZSL methods and most generative ZSL methods in all three settings by a significant margin. Our extensive ablation study disentangles the performance of various components and justifies their importance. The source code is available at


pdf bib
Fast Prototyping a Dialogue Comprehension System for Nurse-Patient Conversations on Symptom Monitoring
Zhengyuan Liu | Hazel Lim | Nur Farah Ain Suhaimi | Shao Chuen Tong | Sharon Ong | Angela Ng | Sheldon Lee | Michael R. Macdonald | Savitha Ramasamy | Pavitra Krishnaswamy | Wai Leng Chow | Nancy F. Chen
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

Data for human-human spoken dialogues for research and development are currently very limited in quantity, variety, and sources; such data are even scarcer in healthcare. In this work, we investigate fast prototyping of a dialogue comprehension system by leveraging on minimal nurse-to-patient conversations. We propose a framework inspired by nurse-initiated clinical symptom monitoring conversations to construct a simulated human-human dialogue dataset, embodying linguistic characteristics of spoken interactions like thinking aloud, self-contradiction, and topic drift. We then adopt an established bidirectional attention pointer network on this simulated dataset, achieving more than 80% F1 score on a held-out test set from real-world nurse-to-patient conversations. The ability to automatically comprehend conversations in the healthcare domain by exploiting only limited data has implications for improving clinical workflows through red flag symptom detection and triaging capabilities. We demonstrate the feasibility for efficient and effective extraction, retrieval and comprehension of symptom checking information discussed in multi-turn human-human spoken conversations.