Rajesh Bhatt

UMass Amherst



2024

GEE! Grammar Error Explanation with Large Language Models
Yixiao Song | Kalpesh Krishna | Rajesh Bhatt | Kevin Gimpel | Mohit Iyyer
Findings of the Association for Computational Linguistics: NAACL 2024

Existing grammatical error correction tools do not provide natural language explanations of the errors that they correct in user-written text. However, such explanations are essential for helping users learn the language by gaining a deeper understanding of its grammatical rules (DeKeyser, 2003; Ellis et al., 2006). To address this gap, we propose the task of grammar error explanation, where a system needs to provide one-sentence explanations for each grammatical error in a pair of erroneous and corrected sentences. The task is not easily solved by prompting LLMs: we find that, using one-shot prompting, GPT-4 only explains 40.6% of the errors and does not even attempt to explain 39.8% of the errors. Since LLMs struggle to identify grammar errors, we develop a two-step pipeline that leverages fine-tuned and prompted large language models to perform structured atomic token edit extraction, followed by prompting GPT-4 to explain each edit. We evaluate our pipeline on German, Chinese, and English grammar error correction data. Our atomic edit extraction achieves an F1 of 0.93 on German, 0.91 on Chinese, and 0.891 on English. Human evaluation of generated explanations reveals that 93.9% of German errors, 96.4% of Chinese errors, and 92.20% of English errors are correctly detected and explained. To encourage further research, we open-source our data and code.
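
The abstract reports F1 scores for atomic edit extraction but does not say how predicted edits are matched against gold edits. As a rough illustration only, the sketch below assumes exact-match scoring over sets of edits; the `Edit` tuple layout and the `edit_f1` function are hypothetical and are not taken from the paper's released code.

```python
from typing import Set, Tuple

# Hypothetical edit representation: (operation, original token, corrected token).
Edit = Tuple[str, str, str]


def edit_f1(predicted: Set[Edit], gold: Set[Edit]) -> float:
    """Exact-match F1 between predicted and gold edit sets (assumed scheme)."""
    if not predicted and not gold:
        return 1.0  # nothing to extract and nothing extracted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data: one of two predicted edits matches one of two gold edits.
gold_edits = {("replace", "is", "are"), ("insert", "", "the")}
predicted_edits = {("replace", "is", "are"), ("delete", "a", "")}
score = edit_f1(predicted_edits, gold_edits)
```

With this toy data, precision and recall are both 0.5, so the F1 is 0.5. A scheme that gives partial credit for near-matching edits would produce different numbers.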

2022

SLING: Sino Linguistic Evaluation of Large Language Models
Yixiao Song | Kalpesh Krishna | Rajesh Bhatt | Mohit Iyyer
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

To understand what kinds of linguistic knowledge are encoded by pretrained Chinese language models (LMs), we introduce the benchmark of Sino LINGuistics (SLING), which consists of 38K minimal sentence pairs in Mandarin Chinese grouped into 9 high-level linguistic phenomena. Each pair demonstrates the acceptability contrast of a specific syntactic or semantic phenomenon (e.g., The keys are lost vs. The keys is lost), and an LM should assign lower perplexity to the acceptable sentence. In contrast to the CLiMP dataset (Xiang et al., 2021), which also contains Chinese minimal pairs and was created by translating the vocabulary of the English BLiMP dataset, the minimal pairs in SLING are derived primarily by applying syntactic and lexical transformations to naturally-occurring, linguist-annotated sentences from the Chinese Treebank 9.0, thus addressing severe issues in CLiMP’s data generation process. We test 18 publicly available pretrained monolingual (e.g., BERT-base-zh, CPM) and multi-lingual (e.g., mT5, XLM) language models on SLING. Our experiments show that the average accuracy for LMs is far below human performance (69.7% vs. 97.1%), while BERT-base-zh achieves the highest accuracy (84.8%) of all tested LMs, even much larger ones. Additionally, we find that most LMs have a strong gender and number (singular/plural) bias, and they perform better on local phenomena than hierarchical ones.
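
The minimal-pair criterion in the abstract (the LM should assign lower perplexity to the acceptable sentence) can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_scorer` is a hypothetical stand-in for a real language model, its log-probabilities are made up, and the example pair is the English one quoted in the abstract rather than an item from SLING.

```python
import math
from typing import Callable, List, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def minimal_pair_accuracy(
    pairs: Sequence[Tuple[str, str]],
    score: Callable[[str], List[float]],
) -> float:
    """Fraction of (acceptable, unacceptable) pairs where the acceptable
    sentence receives strictly lower perplexity."""
    correct = sum(
        1
        for good, bad in pairs
        if perplexity(score(good)) < perplexity(score(bad))
    )
    return correct / len(pairs)


def toy_scorer(sentence: str) -> List[float]:
    """Hypothetical scorer: every token gets log-prob -1.0, except 'is',
    which gets -5.0. A real setup would query a pretrained LM instead."""
    return [-5.0 if token == "is" else -1.0 for token in sentence.split()]


pairs = [("The keys are lost", "The keys is lost")]
accuracy = minimal_pair_accuracy(pairs, toy_scorer)
```

Here the acceptable sentence has a mean log-probability of -1.0 and the unacceptable one -2.0, so the pair counts as correct. Because the two sentences in a minimal pair usually have the same or nearly the same length, comparing perplexities is close to comparing total sentence log-probabilities.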

2013

Towards a Psycholinguistically Motivated Dependency Grammar for Hindi
Samar Husain | Rajesh Bhatt | Shravan Vasishth
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

2012

Creating a Tree Adjoining Grammar from a Multilayer Treebank
Rajesh Bhatt | Owen Rambow | Fei Xia
Proceedings of the 11th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+11)

2011

Linguistic Phenomena, Analyses, and Representations: Understanding Conversion between Treebanks
Rajesh Bhatt | Owen Rambow | Fei Xia
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

Empty Categories in a Hindi Treebank
Archna Bhatia | Rajesh Bhatt | Bhuvana Narasimhan | Martha Palmer | Owen Rambow | Dipti Misra Sharma | Michael Tepper | Ashwini Vaidya | Fei Xia
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We are in the process of creating a multi-representational and multi-layered treebank for Hindi/Urdu (Palmer et al., 2009), which has three main layers: dependency structure, predicate-argument structure (PropBank), and phrase structure. This paper discusses an important issue in treebank design which is often neglected: the use of empty categories (ECs). All three levels of representation make use of ECs. We make a high-level distinction between two types of ECs, trace and silent, on the basis of whether they are postulated to mark displacement or not. Each type is further refined into several subtypes based on the underlying linguistic phenomena which the ECs are introduced to handle. This paper discusses the stages at which we add ECs to the Hindi/Urdu treebank and why. We investigate methodically the different types of ECs and their role in our syntactic and semantic representations. We also examine our decisions whether or not to coindex each type of ECs with other elements in the representation.

2009

A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu
Rajesh Bhatt | Bhuvana Narasimhan | Martha Palmer | Owen Rambow | Dipti Sharma | Fei Xia
Proceedings of the Third Linguistic Annotation Workshop (LAW III)