Annotating COMPARA, a Grammar-aware Parallel Corpus

Diana Santos, Susana Inácio


Abstract
In this paper we describe the annotation of COMPARA, currently the largest post-edited parallel corpora which include Portuguese. We describe the motivation, the results so far, and the way the corpus is being annotated. We also provide the first grounded results about syntactical ambiguity in Portuguese. Finally, we discuss some interesting problems in this connection.
Anthology ID:
L06-1175
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/309_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
Diana Santos and Susana Inácio. 2006. Annotating COMPARA, a Grammar-aware Parallel Corpus. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Annotating COMPARA, a Grammar-aware Parallel Corpus (Santos & Inácio, LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/309_pdf.pdf