Tejaswini Deoskar


2021

Clustering Monolingual Vocabularies to Improve Cross-Lingual Generalization
Riccardo Bassani | Anders Søgaard | Tejaswini Deoskar
Proceedings of the 1st Workshop on Multilingual Representation Learning

Multilingual language models exhibit better performance for some languages than for others (Singh et al., 2019), and many languages do not seem to benefit from multilingual sharing at all, presumably as a result of poor multilingual segmentation (Pyysalo et al., 2020). This work explores the idea of learning multilingual language models based on clustering of monolingual segments. We show significant improvements over standard multilingual segmentation and training across nine languages on a question answering task, both in a small model regime and for a model of the size of BERT-base.

2020

Geo-Aware Image Caption Generation
Sofia Nikiforova | Tejaswini Deoskar | Denis Paperno | Yoad Winter
Proceedings of the 28th International Conference on Computational Linguistics

Standard image caption generation systems produce generic descriptions of images and do not utilize any contextual information or world knowledge. In particular, they are unable to generate captions that contain references to the geographic context of an image, for example, the location where a photograph is taken or relevant geographic objects around an image location. In this paper, we develop a geo-aware image caption generation system, which incorporates geographic contextual information into a standard image captioning pipeline. We propose a way to build an image-specific representation of the geographic context and adapt the caption generation network to produce appropriate geographic names in the image descriptions. We evaluate our system on a novel captioning dataset that contains contextualized captions and geographic metadata and achieve substantial improvements in BLEU, ROUGE, METEOR and CIDEr scores. We also introduce a new metric to assess generated geographic references directly and empirically demonstrate our system’s ability to produce captions with relevant and factually accurate geographic referencing.

2019

Constructive Type-Logical Supertagging With Self-Attention Networks
Konstantinos Kogkalidis | Michael Moortgat | Tejaswini Deoskar
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

We propose a novel application of self-attention networks to grammar induction. We present an attention-based supertagger for a refined type-logical grammar, trained on constructing types inductively. In addition to achieving a high overall type accuracy, our model is able to learn the syntax of the grammar’s type system along with its denotational semantics. This lifts the closed world assumption commonly made by lexicalized grammar supertaggers, greatly enhancing the model's generalization potential. This is evidenced both by its adequate accuracy over sparse word types and by its ability to correctly construct complex types never seen during training, which, to the best of our knowledge, had not previously been accomplished.

2016

Shift-Reduce CCG Parsing using Neural Network Models
Bharat Ram Ambati | Tejaswini Deoskar | Mark Steedman
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

An Incremental Algorithm for Transition-based CCG Parsing
Bharat Ram Ambati | Tejaswini Deoskar | Mark Johnson | Mark Steedman
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

Generalizing a Strongly Lexicalized Parser using Unlabeled Data
Tejaswini Deoskar | Christos Christodoulopoulos | Alexandra Birch | Mark Steedman
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

Improving Dependency Parsers using Combinatory Categorial Grammar
Bharat Ram Ambati | Tejaswini Deoskar | Mark Steedman
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

2013

Using CCG categories to improve Hindi dependency parsing
Bharat Ram Ambati | Tejaswini Deoskar | Mark Steedman
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

Learning Structural Dependencies of Words in the Zipfian Tail
Tejaswini Deoskar | Markos Mylonakis | Khalil Sima’an
Proceedings of the 12th International Conference on Parsing Technologies

Simple Semi-Supervised Learning for Prepositional Phrase Attachment
Gregory F. Coppola | Alexandra Birch | Tejaswini Deoskar | Mark Steedman
Proceedings of the 12th International Conference on Parsing Technologies

2010

Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing
Hal Daumé III | Tejaswini Deoskar | David McClosky | Barbara Plank | Jörg Tiedemann
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing

2009

Smoothing fine-grained PCFG lexicons
Tejaswini Deoskar | Mats Rooth | Khalil Sima’an
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

2008

Re-estimation of Lexical Parameters for Treebank PCFGs
Tejaswini Deoskar
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Induction of Treebank-Aligned Lexical Resources
Tejaswini Deoskar | Mats Rooth
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We describe the induction of lexical resources from unannotated corpora that are aligned with treebank grammars, providing a systematic correspondence between features in the lexical resource and a treebank syntactic resource. We first describe a methodology based on parsing technology for augmenting a treebank database with linguistic features. A PCFG containing these features is created from the augmented treebank. We then use a procedure based on the inside-outside algorithm to learn lexical resources aligned with the treebank PCFG from large unannotated corpora. The method has been applied in creating a feature-annotated English treebank based on the Penn Treebank. The unsupervised estimation procedure gives a substantial error reduction (up to 31.6%) on the task of learning the subcategorization preference of novel verbs that are not present in the annotated training sample.