Comparing Constituency and Dependency Representations for SMT Phrase-Extraction

Mary Hearne; Sylwia Ozdowska; John Tinsley

Abstract

We consider the value of replacing and/or combining string-based methods with syntax-based methods for phrase-based statistical machine translation (PB-SMT), and we also consider the relative merits of using constituency-annotated vs. dependency-annotated training data. We automatically derive two subtree-aligned treebanks, dependency-based and constituency-based, from a parallel English–French corpus and extract syntactically motivated word- and phrase-pairs. We automatically measure PB-SMT quality. The results show that combining string-based and syntax-based word- and phrase-pairs can improve translation quality irrespective of the type of syntactic annotation. Furthermore, using dependency annotation yields greater translation quality than constituency annotation for PB-SMT.

Anthology ID: 2008.jeptalnrecital-court.14
Volume: Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts
Month: June
Year: 2008
Address: Avignon, France
Editors: Frédéric Béchet, Jean-Francois Bonastre
Venue: JEP/TALN/RECITAL
Publisher: ATALA
Pages: 131–140
URL: https://aclanthology.org/2008.jeptalnrecital-court.14/
Cite (ACL): Mary Hearne, Sylwia Ozdowska, and John Tinsley. 2008. Comparing Constituency and Dependency Representations for SMT Phrase-Extraction. In Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, pages 131–140, Avignon, France. ATALA.
Cite (Informal): Comparing Constituency and Dependency Representations for SMT Phrase-Extraction (Hearne et al., JEP/TALN/RECITAL 2008)
PDF: https://aclanthology.org/2008.jeptalnrecital-court.14.pdf
