Sarmad Hussain


2014

The CLE Urdu POS Tagset
Saba Urooj | Sarmad Hussain | Asad Mustafa | Rahila Parveen | Farah Adeeba | Tafseer Ahmed Khan | Miriam Butt | Annette Hautli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The paper presents the design schema and details of a new Urdu POS tagset. The tagset was designed in response to challenges encountered in working with existing tagsets for Urdu. It uses tags that judiciously incorporate information about special morpho-syntactic categories found in Urdu. With respect to the overall naming schema and the basic divisions, the tagset draws on the Penn Treebank and a Common Tagset for Indian Languages. The resulting CLE Urdu POS Tagset consists of 12 major categories with subdivisions, for a total of 32 tags. The tagset has been used to tag 100k words of the CLE Urdu Digest Corpus, giving a tagging accuracy of 96.8%.

2013

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations
Miriam Butt | Sarmad Hussain

2012

Proceedings of the 10th Workshop on Asian Language Resources
Ruvan Weerasinghe | Sarmad Hussain | Virach Sornlertlamvanich | Rachel Edita O. Roxas

2011

Proceedings of the 9th Workshop on Asian Language Resources
Rachel Edita O. Roxas | Sarmad Hussain | Key-Sun Choi

Experiences in Building Urdu WordNet
Farah Adeeba | Sarmad Hussain
Proceedings of the 9th Workshop on Asian Language Resources

2010

Transliterating Urdu for a Broad-Coverage Urdu/Hindi LFG Grammar
Muhammad Kamran Malik | Tafseer Ahmed | Sebastian Sulger | Tina Bögel | Atif Gulzar | Ghulam Raza | Sarmad Hussain | Miriam Butt
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present a system for transliterating the Arabic-based script of Urdu to a Roman transliteration scheme. The system is integrated into a larger system consisting of a morphology module, implemented via finite state technologies, and a computational LFG grammar of Urdu that was developed with the grammar development platform XLE (Crouch et al. 2008). Our long-term goal is to handle Hindi alongside Urdu; the two languages are very similar with respect to syntax and lexicon, and hence one grammar can be used to cover both. However, their scripts differ: Hindi is written in Devanagari, while Urdu uses an Arabic-based script. By abstracting away to a common Roman transliteration scheme in the respective transliterators, our system can handle both languages in parallel. In this paper, we discuss the pipeline architecture of the Urdu-Roman transliterator, mention several linguistic and orthographic issues, and present the integration of the transliterator into the LFG parsing system.

Proceedings of the Eighth Workshop on Asian Language Resources
Sarmad Hussain | Virach Sornlertlamvanich | Hammam Riza

Word Segmentation for Urdu OCR System
Misbah Akram | Sarmad Hussain
Proceedings of the Eighth Workshop on Asian Language Resources

Dzongkha Word Segmentation
Sithar Norbu | Pema Choejey | Tenzin Dendup | Sarmad Hussain | Ahmed Muaz
Proceedings of the Eighth Workshop on Asian Language Resources

A hybrid approach to Urdu verb phrase chunking
Wajid Ali | Sarmad Hussain
Proceedings of the Eighth Workshop on Asian Language Resources

Urdu Word Segmentation
Nadir Durrani | Sarmad Hussain
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

Analysis and Development of Urdu POS Tagged Corpus
Ahmed Muaz | Aasim Ali | Sarmad Hussain
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

Assas-band, an Affix-Exception-List Based Urdu Stemmer
Qurat-ul-Ain Akram | Asma Naseer | Sarmad Hussain
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

2008

Resources for Urdu Language Processing
Sarmad Hussain
Proceedings of the 6th Workshop on Asian Language Resources

2004

Letter-to-Sound Conversion for Urdu Text-to-Speech System
Sarmad Hussain
Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages

Urdu Localization Project
Sarmad Hussain
Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages