Samar Husain


2022

Assessing Corpus Evidence for Formal and Psycholinguistic Constraints on Nonprojectivity
Himanshu Yadav | Samar Husain | Richard Futrell
Computational Linguistics, Volume 48, Issue 2 - June 2022

Formal constraints on crossing dependencies have played a large role in research on the formal complexity of natural language grammars and parsing. Here we ask whether the apparent evidence for constraints on crossing dependencies in treebanks might arise because of independent constraints on trees, such as low arity and dependency length minimization. We address this question using two sets of experiments. In Experiment 1, we compare the distribution of formal properties of crossing dependencies, such as gap degree, between real trees and baseline trees matched for rate of crossing dependencies and various other properties. In Experiment 2, we model whether two dependencies cross, given certain psycholinguistic properties of the dependencies. We find surprisingly weak evidence for constraints originating from the mild context-sensitivity literature (gap degree and well-nestedness) beyond what can be explained by constraints on rate of crossing dependencies, topological properties of the trees, and dependency length. However, measures that have emerged from the parsing literature (e.g., edge degree, end-point crossings, and heads’ depth difference) differ strongly between real and random trees. Modeling results show that cognitive metrics relating to information locality and working-memory limitations affect whether two dependencies cross or not, but they do not fully explain the distribution of crossing dependencies in natural languages. Together these results suggest that crossing constraints are better characterized by processing pressures than by mildly context-sensitive constraints.
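
Several of the formal measures above have simple procedural definitions over dependency trees. As a rough illustration (a minimal Python sketch, not the authors' implementation), the following computes the pairs of crossing arcs and the gap degree of a tree encoded as a list of head indices; the example sentence is hypothetical.

# A minimal sketch (not the authors' code): crossing arc pairs and gap degree
# for a dependency tree given as a list of head indices, where heads[i] is the
# head position of word i+1 (1-based) and 0 marks the root.
from itertools import combinations

def arcs(heads):
    """Dependency arcs as (left, right) position pairs."""
    return [(min(d, h), max(d, h)) for d, h in enumerate(heads, start=1) if h != 0]

def crossing_pairs(heads):
    """Pairs of arcs whose spans interleave, i.e. crossing dependencies."""
    return [(a, b) for a, b in combinations(arcs(heads), 2)
            if a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]]

def gap_degree(heads):
    """Maximum number of discontinuities in any node's projection."""
    children = {i: [] for i in range(len(heads) + 1)}
    for d, h in enumerate(heads, start=1):
        children[h].append(d)

    def projection(node):
        positions = {node}
        for child in children[node]:
            positions |= projection(child)
        return positions

    max_gaps = 0
    for node in range(1, len(heads) + 1):
        span = sorted(projection(node))
        gaps = sum(1 for a, b in zip(span, span[1:]) if b - a > 1)
        max_gaps = max(max_gaps, gaps)
    return max_gaps

# Hypothetical 4-word non-projective tree: word 1 is the root, word 2 depends
# on word 4, and words 3 and 4 depend on word 1.
heads = [0, 4, 1, 1]
print(crossing_pairs(heads))   # [((2, 4), (1, 3))]
print(gap_degree(heads))       # 1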

2021

Clause Final Verb Prediction in Hindi: Evidence for Noisy Channel Model of Communication
Kartik Sharma | Niyati Bafna | Samar Husain
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Verbal prediction has been shown to be critical during online comprehension of Subject-Object-Verb (SOV) languages. In this work, we present three computational models that predict clause-final verbs in Hindi given their prior arguments. The models differ in how they use the prior context during prediction: the context is either noisy or noise-free. Model predictions are compared with sentence completion data obtained from native Hindi speakers. Results show that the models that assume noisy context outperform the noise-free model. In particular, a lossy-context model that assumes the prior context is affected by predictability and recency best captures the distribution of predicted verb classes and error sources. The success of the predictability-recency lossy-context model is consistent with the noisy channel hypothesis for sentence comprehension and supports the idea that the reconstruction of the context during prediction is driven by prior linguistic exposure. These results also shed light on the nature of the noise that affects the reconstruction process. Overall, the results pose a challenge to the adaptability hypothesis, which assumes that robust verbal prediction relies on a noise-free preverbal context.
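
One common way to write down a lossy-context, noisy-channel predictor of this kind (shown purely as an illustration of the general idea; the paper's specific models may differ) is to let the comprehender predict the verb v from a noisy memory trace \tilde{c} of the true preverbal context c by marginalizing over possible reconstructions:

    P(v \mid \tilde{c}) = \sum_{c} P(v \mid c)\, P(c \mid \tilde{c}),
    \qquad
    P(c \mid \tilde{c}) \propto P(\tilde{c} \mid c)\, P(c)

Here the prior P(c) reflects prior linguistic exposure, and the noise model P(\tilde{c} \mid c) is where effects such as predictability and recency can enter.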

2020

What Determines the Order of Verbal Dependents in Hindi? Effects of Efficiency in Comprehension and Production
Kartik Sharma | Richard Futrell | Samar Husain
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Word order flexibility is one of the distinctive features of SOV languages. In this work, we investigate whether the order and relative distance of preverbal dependents in Hindi, an SOV language, are affected by factors motivated by efficiency considerations during comprehension/production. We examine the influence of Head–Dependent Mutual Information (HDMI), similarity-based interference, accessibility, and case-marking. Results show that preverbal dependents remain close to the verbal head when the HDMI between the verb and its dependent is high. This demonstrates the influence of locality constraints on dependency distance and word order in an SOV language. Additionally, dependency distances were found to be longer when the dependent was animate, when it was case-marked, and when it was semantically similar to other preverbal dependents. Together, the results highlight the crosslinguistic generalizability of these factors and provide evidence for a functionally motivated account of word order in SOV languages such as Hindi.
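
As an illustration of the kind of association measure involved, the following minimal sketch estimates head-dependent pointwise mutual information from (head, dependent) lemma pairs; both the estimator and the example pairs are assumptions for illustration, not the paper's exact HDMI computation.

# Head-dependent pointwise mutual information from co-occurrence counts
# (a rough illustration, not the authors' estimator).
import math
from collections import Counter

def hd_pmi(pairs):
    """pairs: iterable of (head_lemma, dependent_lemma) tuples from a treebank."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    head_counts = Counter(h for h, _ in pairs)
    dep_counts = Counter(d for _, d in pairs)
    return {
        (h, d): math.log2((c / n) / ((head_counts[h] / n) * (dep_counts[d] / n)))
        for (h, d), c in joint.items()
    }

# Hypothetical (verb, preverbal dependent) lemma pairs:
pairs = [("khaa", "roti"), ("khaa", "roti"), ("khaa", "paani"),
         ("pii", "paani"), ("pii", "paani"), ("pii", "roti")]
print(hd_pmi(pairs)[("khaa", "roti")])   # higher values = stronger association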

2019

Are formal restrictions on crossing dependencies epiphenomenal?
Himanshu Yadav | Samar Husain | Richard Futrell
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

Can Greenbergian universals be induced from language networks?
Kartik Sharma | Kaivalya Swami | Aditya Shete | Samar Husain
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

2017

Understanding Constraints on Non-Projectivity Using Novel Measures
Himanshu Yadav | Ashwini Vaidya | Samar Husain
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

2016

Quantifying sentence complexity based on eye-tracking measures
Abhinav Deep Singh | Poojan Mehta | Samar Husain | Rajkumar Rajakrishnan
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

Eye-tracking reading times have been shown to reflect cognitive processes underlying sentence comprehension. However, the use of reading times in NLP applications is an underexplored area of research. In this initial work, we build an automatic system to assess sentence complexity using automatically predicted eye-tracking reading time measures and demonstrate the efficacy of these reading times for a well-known NLP task, namely readability assessment. We use a machine learning model and a set of features known to be significant predictors of reading times to learn per-word reading times from a corpus of English text annotated with the reading times of human readers. Subsequently, we use the model to predict reading times for novel text in the context of the aforementioned task. A model based only on reading times gave competitive results compared to systems that use extensive syntactic features to compute linguistic complexity. To the best of our knowledge, our work is the first study to show that automatically predicted reading times can successfully model the difficulty of a text and can be deployed in practical text processing applications.
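
A minimal sketch of the overall pipeline under simplifying assumptions: a toy feature set (word length and negative log frequency), a ridge regression learner, and invented training values; the paper's actual features, learner, and data are not reproduced here.

# Learn per-word reading times from toy features, then score a sentence's
# complexity as the mean predicted reading time (illustrative values only).
from sklearn.linear_model import Ridge

train_features = [[4, 7.2], [9, 11.5], [3, 5.1], [11, 12.0]]   # [length, -log frequency]
train_times = [180.0, 310.0, 150.0, 340.0]                      # ms, from human readers

model = Ridge().fit(train_features, train_times)

def sentence_complexity(word_features):
    """Mean predicted reading time over the words of a sentence."""
    predicted = model.predict(word_features)
    return sum(predicted) / len(predicted)

print(sentence_complexity([[5, 8.0], [12, 13.1]]))   # higher = harder to read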

2015

Non-projectivity and processing constraints: Insights from Hindi
Samar Husain | Shravan Vasishth
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

2013

Towards a Psycholinguistically Motivated Dependency Grammar for Hindi
Samar Husain | Rajesh Bhatt | Shravan Vasishth
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

2012

Intra-Chunk Dependency Annotation: Expanding Hindi Inter-Chunk Annotated Treebank
Prudhvi Kosaraju | Bharat Ram Ambati | Samar Husain | Dipti Misra Sharma | Rajeev Sangal
Proceedings of the Sixth Linguistic Annotation Workshop

2011

Clausal parsing helps data-driven dependency parsing: Experiments with Hindi
Samar Husain | Phani Gadde | Joakim Nivre | Rajeev Sangal
Proceedings of 5th International Joint Conference on Natural Language Processing

Empty Categories in Hindi Dependency Treebank: Analysis and Recovery
Chaitanya GSK | Samar Husain | Prashanth Mannem
Proceedings of the 5th Linguistic Annotation Workshop

Error Detection for Treebank Validation
Bharat Ram Ambati | Rahul Agarwal | Mridul Gupta | Samar Husain | Dipti Misra Sharma
Proceedings of the 9th Workshop on Asian Language Resources

Linguistically Rich Graph Based Data Driven Parsing For Hindi
Samar Husain | Raghu Pujitha Gade | Rajeev Sangal
Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages

2010

A High Recall Error Identification Tool for Hindi Treebank Validation
Bharat Ram Ambati | Mridul Gupta | Samar Husain | Dipti Misra Sharma
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes the development of a hybrid tool for the semi-automated validation of treebank annotation at various levels. The tool detects errors at the part-of-speech, chunk, and dependency levels of a Hindi treebank that is currently under development, and aims to identify as many errors as possible at these levels to achieve consistency in annotation. Consistency in treebank annotation is essential for making the data as error-free as possible and for providing quality assurance. The tool is thus aimed at ensuring consistency and at making manual validation cost-effective. We discuss a rule-based approach and a hybrid approach (statistical methods combined with rule-based methods) by which a high-recall system can be developed and used to identify errors in the treebank. We report some results of using the tool on a sample of data extracted from the Hindi treebank. We also argue that the tool can prove useful in improving the annotation guidelines, which would, in turn, improve the quality of annotation in subsequent iterations.
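
As a toy illustration of one high-recall check such a tool might combine with hand-written rules (the actual rules and statistics are specific to the Hindi annotation scheme and are not reproduced here), rare part-of-speech and dependency-label combinations can be flagged for a human validator:

# Flag nodes whose (POS tag, dependency label) combination is rare in the data;
# the tuple format and threshold are assumptions for this sketch.
from collections import Counter

def suspicious_nodes(treebank, min_count=2):
    """treebank: list of sentences; each node is a (form, pos_tag, dep_label) tuple."""
    pos_label = Counter((pos, label) for sent in treebank for _, pos, label in sent)
    flagged = []
    for idx, sent in enumerate(treebank):
        for form, pos, label in sent:
            if pos_label[(pos, label)] < min_count:
                flagged.append((idx, form, pos, label))   # candidate error, recall over precision
    return flagged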

Partial Parsing as a Method to Expedite Dependency Annotation of a Hindi Treebank
Mridul Gupta | Vineet Yadav | Samar Husain | Dipti Misra Sharma
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The paper describes an approach to expedite the manual annotation of a Hindi dependency treebank that is currently under development. We propose a way to improve consistency among a set of manual annotators. Furthermore, we show that our setup can also prove useful for evaluating when an inexperienced annotator is ready to start participating in the production of the treebank. We test our approach on sample data sets obtained from ongoing work on the creation of this treebank and report results supporting our proposal from a semi-automated dependency annotation experiment. We measure inter-annotator agreement using Cohen’s Kappa, and we compare the total time taken to annotate sample data sets using a completely manual approach as opposed to a semi-automated one. The results show that the semi-automated approach, when carried out with experienced and trained human annotators, improves the overall quality of treebank annotation and also speeds up the process.
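
For reference, Cohen’s Kappa corrects raw agreement for the agreement expected by chance; below is a minimal sketch for two annotators labelling the same items (the dependency labels are hypothetical).

# Cohen's Kappa for two annotators (illustrative data only).
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

annotator_1 = ["k1", "k2", "k1", "k7", "k2"]
annotator_2 = ["k1", "k2", "k2", "k7", "k2"]
print(cohens_kappa(annotator_1, annotator_2))   # 1.0 = perfect agreement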

Grammar Extraction from Treebanks for Hindi and Telugu
Prasanth Kolachina | Sudheer Kolachina | Anil Kumar Singh | Samar Husain | Viswanath Naidu | Rajeev Sangal | Akshar Bharati
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Grammars play an important role in many Natural Language Processing (NLP) applications. The traditional approach of creating grammars manually, besides being labor-intensive, has several limitations. With the availability of large-scale syntactically annotated treebanks, it is now possible to automatically extract an approximate grammar of a language, in any of the existing formalisms, from a corresponding treebank. In this paper, we present a basic approach to extracting grammars from dependency treebanks of two Indian languages, Hindi and Telugu. The process of grammar extraction requires a generalization mechanism. Towards this end, we explore an approach that relies on generalizing argument structure over verbs based on their syntactic similarity. Such generalization counters the effect of data sparseness in the treebanks. A grammar extracted with this system can not only expand existing knowledge bases for NLP tasks such as parsing, but also aid in the creation of grammars for languages where none exist. Further, we show that the grammar extraction process can help in identifying annotation errors and thus aid in the task of treebank validation.
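
A rough sketch of the first step such a system might take, collecting per-verb argument frames from a dependency treebank (the token format, POS convention, and choice of what counts as an argument are simplifying assumptions, not the paper's implementation):

# Collect, for each verb, the set of dependency-label frames it occurs with.
from collections import defaultdict

def extract_frames(treebank):
    """treebank: list of sentences; each token is (form, pos, head, label),
    with 1-based head indices and 0 for the root."""
    frames = defaultdict(set)
    for sent in treebank:
        for i, (form, pos, head, label) in enumerate(sent, start=1):
            if pos.startswith("V"):                      # assumed verb-tag convention
                frame = tuple(sorted(lbl for _, _, h, lbl in sent if h == i))
                frames[form].add(frame)
    return frames

Verbs that share frame sets can then be grouped, which is one simple way to generalize argument structure over syntactically similar verbs and counter data sparseness.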

Two Methods to Incorporate ‘Local Morphosyntactic’ Features in Hindi Dependency Parsing
Bharat Ram Ambati | Samar Husain | Sambhav Jain | Dipti Misra Sharma | Rajeev Sangal
Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages

On the Role of Morphosyntactic Features in Hindi Dependency Parsing
Bharat Ram Ambati | Samar Husain | Joakim Nivre | Rajeev Sangal
Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages

Improving Data Driven Dependency Parsing using Clausal Information
Phani Gadde | Karan Jindal | Samar Husain | Dipti Misra Sharma | Rajeev Sangal
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

Constraint Based Hybrid Approach to Parsing Indian Languages
Akshar Bharati | Samar Husain | Meher Vijay | Kalyan Deepak | Dipti Misra Sharma | Rajeev Sangal
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

Two stage constraint based hybrid approach to free word order language dependency parsing
Akshar Bharati | Samar Husain | Dipti Misra | Rajeev Sangal
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

Effect of Minimal Semantics on Dependency Parsing
Bharat Ram Ambati | Pujitha Gade | Chaitanya GSK | Samar Husain
Proceedings of the Student Research Workshop

2008

Dependency Annotation Scheme for Indian Languages
Rafiya Begum | Samar Husain | Arun Dhwaj | Dipti Misra Sharma | Lakshmi Bai | Rajeev Sangal
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

Towards an Annotated Corpus of Discourse Relations in Hindi
Rashmi Prasad | Samar Husain | Dipti Sharma | Aravind Joshi
Proceedings of the 6th Workshop on Asian Language Resources

Developing Verb Frames for Hindi
Rafiya Begum | Samar Husain | Lakshmi Bai | Dipti Misra Sharma
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper introduces ongoing work on developing verb frames for Hindi. Verb frames capture syntactic commonalities of semantically related verbs. The main objective of this work is to create a linguistic resource that will prove indispensable for various NLP applications. We also hope that this resource will help us better understand Hindi verbs. We motivate the basic verb argument structure using the relations introduced by Panini. We describe the methodology used in preparing these frames and the criteria followed for classifying Hindi verbs.
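
Purely as a hypothetical illustration of what a verb-frame entry might encode (the field names, values, and format are assumptions, not the resource's actual representation), a frame can record, for each Paninian relation a verb licenses, whether it is mandatory and which markers it typically takes:

# Hypothetical frame for a transitive verb such as 'khaa' (to eat); illustration only.
KHAA_FRAME = {
    "k1": {"necessity": "mandatory", "marker": "0/ne"},   # agent
    "k2": {"necessity": "mandatory", "marker": "0/ko"},   # patient/theme
}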

2007

Simple Preposition Correspondence: A Problem in English to Indian Language Machine Translation
Samar Husain | Dipti Misra Sharma | Manohar Reddy
Proceedings of the Fourth ACL-SIGSEM Workshop on Prepositions

2005

Comparison, Selection and Use of Sentence Alignment Algorithms for New Language Pairs
Anil Kumar Singh | Samar Husain
Proceedings of the ACL Workshop on Building and Using Parallel Texts