Bharat Ram Ambati

A treebank is an important resource for developing many NLP-based tools. Errors in the treebank may lead to errors in the tools that use it, so it is essential to ensure the quality of a treebank before it is deployed for other purposes. Automatic (or semi-automatic) detection of errors in the treebank can reduce the manual work required to find and remove them. Usually, the errors found automatically are then corrected manually by the annotators. Little work has been reported so far on error correction tools that help annotators correct errors efficiently. In this paper, we present such an error correction tool, an extension of the error detection method described earlier (Ambati et al., 2010; Ambati et al., 2011; Agarwal et al., 2012).

Word Sketches for Turkish
Bharat Ram Ambati | Siva Reddy | Adam Kilgarriff
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Word sketches are one-page, automatic, corpus-based summaries of a word's grammatical and collocational behaviour. In this paper we present word sketches for Turkish. Until now, word sketches have been generated using purpose-built finite-state grammars; here, we use an existing dependency parser instead. We describe the process of collecting a 42-million-word corpus, parsing it, and generating word sketches from it. We evaluate these word sketches against word sketches produced by a language-independent sketch grammar on an external evaluation task called topic coherence, using the Turkish WordNet to derive an evaluation set of coherent topics.

2011

Exploring self training for Hindi dependency parsing
Rahul Goutam | Bharat Ram Ambati
Proceedings of 5th International Joint Conference on Natural Language Processing

Error Detection for Treebank Validation
Bharat Ram Ambati | Rahul Agarwal | Mridul Gupta | Samar Husain | Dipti Misra Sharma
Proceedings of the 9th Workshop on Asian Language Resources

2010

Two Methods to Incorporate 'Local Morphosyntactic' Features in Hindi Dependency Parsing
Bharat Ram Ambati | Samar Husain | Sambhav Jain | Dipti Misra Sharma | Rajeev Sangal
Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages

An Integrated Digital Tool for Accessing Language Resources
Anil Kumar Singh | Bharat Ram Ambati
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Language resources can be classified into several categories. Being able to query and operate on all (or most) of these categories using a single digital tool would be very helpful to a large number of researchers working on languages. We describe such a tool in this paper. It differs from other such tools in that it allows querying and transformation of different kinds of resources (such as corpora, lexicons and language models) within the same framework. Search options can be specified according to the kind of resource being queried. A matched resource can be selected and opened for editing in the specialized interface associated with it. The tool also allows the extracted or modified data to be saved separately, in addition to the usual facilities such as displaying the results in KeyWord-In-Context (KWIC) format. We also present the notation used for querying and transformation, which is comparable to, but different from, the Corpus Query Language (CQL).

Importance of Linguistic Constraints in Statistical Dependency Parsing
Bharat Ram Ambati
Proceedings of the ACL 2010 Student Research Workshop

A High Recall Error Identification Tool for Hindi Treebank Validation
Bharat Ram Ambati | Mridul Gupta | Samar Husain | Dipti Misra Sharma
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes the development of a hybrid tool for the semi-automated validation of treebank annotation at various levels. The tool is developed for error detection at the part-of-speech, chunk and dependency levels of a Hindi treebank that is currently under development. It aims to identify as many errors as possible at these levels in order to achieve consistency in the annotation task. Consistency in treebank annotation is essential for making the data as error-free as possible and for providing quality assurance. The tool is aimed at ensuring consistency and making manual validation cost-effective. We discuss a rule-based approach and a hybrid approach (statistical methods combined with rule-based methods) by which a high-recall system can be developed and used to identify errors in the treebank. We report some results of using the tool on a sample of data extracted from the Hindi treebank. We also argue that the tool can help improve the annotation guidelines, which would in turn improve the quality of annotation in subsequent iterations.

On the Role of Morphosyntactic Features in Hindi Dependency Parsing
Bharat Ram Ambati | Samar Husain | Joakim Nivre | Rajeev Sangal
Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages