Inferring Subcat Frames of Verbs in Urdu

Ghulam Raza


Abstract
This paper describes an approach for inferring the syntactic frames of verbs in Urdu from an untagged corpus. Urdu, like many other South Asian languages, is a free word order and case-rich language. Separable lexical units, called case clitics, mark constituents for case in phrases and clauses. In Urdu there is not always a one-to-one correspondence between case clitic form and case, or between case and grammatical function. Case clitics therefore cannot serve as direct clues for extracting the syntactic frames of verbs, and a two-step approach has been implemented instead. In the first step, all case clitic combinations for a verb are extracted, and the unreliable ones are filtered out by applying inferential statistics. In the second step, information on the occurrences of case clitic forms is processed to infer all possible syntactic frames of the verb. This information covers the forms both within the combinations as a whole and individually.
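The abstract does not say which inferential statistic is used in the filtering step. The sketch below is therefore a minimal illustration that assumes a binomial hypothesis test, a technique commonly used for filtering in subcategorization acquisition. It is not presented as the paper's method. Several details are hypothetical: the names `binomial_tail` and `reliable_combinations`, the error probability `p_error`, the significance level `alpha`, and the input format (one tuple of romanized clitics such as `ne`, `ko`, `se` per clause in which the verb occurs). A combination is kept only when observing it at least `c` times in `n` clauses would be unlikely if it arose from noise alone.

```python
from collections import Counter
from math import comb


def binomial_tail(m: int, n: int, p: float) -> float:
    """Return P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def reliable_combinations(observations, p_error=0.05, alpha=0.05):
    """Keep case clitic combinations whose counts are unlikely to be noise.

    Assumed (hypothetical) input: a list with one tuple of case clitics
    per clause in which the verb occurs, e.g. ("ne", "ko").
    p_error and alpha are illustrative values, not taken from the paper.
    """
    n = len(observations)
    # Sort each tuple so that clitic order does not matter (free word order).
    counts = Counter(tuple(sorted(obs)) for obs in observations)
    # Keep a combination only if seeing it c or more times by chance is improbable.
    return {
        combo: c
        for combo, c in counts.items()
        if binomial_tail(c, n, p_error) < alpha
    }


if __name__ == "__main__":
    # Toy data for one verb: 20 clauses.
    obs = [("ne", "ko")] * 12 + [("ne",)] * 7 + [("ne", "se")]
    print(reliable_combinations(obs))
```

With the toy data, the single `("ne", "se")` occurrence is filtered out, because one chance occurrence in 20 clauses at `p_error = 0.05` has probability of about 0.64. The two frequent combinations are kept. Sorting each tuple treats a combination as an unordered multiset of clitics, which fits Urdu's free word order. The second step described above, inferring frames from these counts, is not modelled here.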
Anthology ID:
L10-1369
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/536_Paper.pdf
Cite (ACL):
Ghulam Raza. 2010. Inferring Subcat Frames of Verbs in Urdu. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Inferring Subcat Frames of Verbs in Urdu (Raza, LREC 2010)