Vishwa Shah


2024

pdf bib
AdaPT: A Set of Guidelines for Hyperbolic Multimodal Multilingual NLP
Ramit Sawhney | Shrey Pandit | Vishwa Shah | Megh Thakkar | Shafiq Joty
Findings of the Association for Computational Linguistics: NAACL 2024

The Euclidean space is the familiar space for training neural models and performing arithmetic operations.However, many data types inherently possess complex geometries, and model training methods involve operating over their latent representations, which cannot be effectively captured in the Euclidean space.The hyperbolic space provides a more generalized representative geometry to model the hierarchical complexities of the tree-like structure of natural language.We propose AdaPT a set of guidelines for initialization, parametrization, and training of neural networks, which adapts to the dataset and can be used with different manifolds. AdaPT can be generalized over any existing neural network training methodology and leads to more stable training without a substantial increase in training time.We apply AdaPT guidelines over two state-of-the-art deep learning approaches and empirically demonstrate its effectiveness through experiments on three tasks over 12 languages across speech and text.Through extensive qualitative analysis, we put forward the applicability of AdaPT as a set of guidelines optimally utilizing the manifold geometry, which can be extended to various downstream tasks across languages and modalities.

pdf bib
Calc-CMU at SemEval-2024 Task 7: Pre-Calc - Learning to Use the Calculator Improves Numeracy in Language Models
Vishruth Veerendranath | Vishwa Shah | Kshitish Ghate
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Quantitative and numerical comprehension in language is an important task in many fields like education and finance, but still remains a challenging task for language models. While tool and calculator usage has shown to be helpful to improve mathematical reasoning in large pretrained decoder-only language models, this remains unexplored for smaller language models with encoders. In this paper, we propose Pre-Calc, a simple pre-finetuning objective of learning to use the calculator for both encoder-only and encoder-decoder architectures, formulated as a discriminative and generative task respectively. We pre-train BERT and RoBERTa for discriminative calculator use and Flan-T5 for generative calculator use on the MAWPS, SVAMP, and AsDiv-A datasets, which improves performance on downstream tasks that require numerical understanding. Our code and data are available at https://github.com/calc-cmu/pre-calc.