Universal Dependencies for Persian

Mojgan Seraji, Filip Ginter, Joakim Nivre


Abstract
The Persian Universal Dependency Treebank (Persian UD) is a recent effort to treebank Persian within Universal Dependencies (UD), an ongoing project that designs unified, cross-linguistically valid grammatical representations covering part-of-speech tags, morphological features, and dependency relations. The Persian UD is a conversion of the Uppsala Persian Dependency Treebank (UPDT) to the Universal Dependencies framework and consists of nearly 6,000 sentences and 152,871 word tokens, with an average sentence length of 25 words. Beyond the syntactic annotation guidelines, the two treebanks also differ in tokenization: all words containing unsegmented clitics (pronominal and copula clitics), which carry complex labels in the UPDT, have been separated from their clitics and appear with distinct labels in the Persian UD. The UPDT's original syntactic annotation scheme is based on Stanford Typed Dependencies. In this paper, we present the approaches taken in the development of the Persian UD.
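The tokenization difference described in the abstract can be illustrated with a minimal sketch. It assumes the treebank is read in the CoNLL-U format used by UD, and that a word with a separated clitic appears as a multiword-token range line (such as `1-2`) followed by its parts; whether the Persian UD encodes every clitic this way is an assumption here, not something the abstract states. The function name `count_sentences_and_words` and the sample (a transliterated "ketabam", "my book", with all annotation columns left as `_`) are hypothetical and not taken from the treebank.

```python
from typing import Iterable, Tuple


def count_sentences_and_words(lines: Iterable[str]) -> Tuple[int, int]:
    """Count sentences and syntactic words in CoNLL-U formatted lines.

    Range lines (ID like "1-2") hold the unsegmented surface token and
    are skipped, so a host word and its clitic count as two words.
    Empty nodes (ID like "1.1") are skipped as well.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment, e.g. "# text = ...".
            continue
        token_id = line.split("\t", 1)[0]
        if "-" in token_id or "." in token_id:
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        words += 1
    return sentences, words


# Hypothetical sample: one surface token split into a host and a clitic.
SAMPLE = "\n".join([
    "# text = ketabam",
    "1-2\tketabam\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tketab\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tam\t_\t_\t_\t_\t_\t_\t_\t_",
    "",
])

n_sent, n_words = count_sentences_and_words(SAMPLE.splitlines())
print(n_sent, n_words, n_words / n_sent)
```

Applied to a full treebank file, the same ratio of words to sentences gives the average sentence length; the abstract's figures (152,871 tokens over nearly 6,000 sentences) are consistent with the reported average of about 25.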
Anthology ID:
L16-1374
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2361–2365
URL:
https://aclanthology.org/L16-1374
Cite (ACL):
Mojgan Seraji, Filip Ginter, and Joakim Nivre. 2016. Universal Dependencies for Persian. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2361–2365, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Universal Dependencies for Persian (Seraji et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1374.pdf