SRL for low resource languages isn’t needed for semantic SMT

Meriem Beloucif, Dekai Wu


Abstract
Previous attempts at injecting semantic frame biases into SMT training for low resource languages failed because either (a) no semantic parser is available for the low resource input language, or (b) the output English language semantic parses excise relevant parts of the alignment space too aggressively. We present the first semantic SMT model to succeed in significantly improving translation quality across many low resource input languages for which no automatic SRL is available, consistently and across all common MT metrics. The results we report are the best by far to date for this type of approach; our analyses suggest that, in general, simpler approaches to including semantics in SMT training may be more feasible than generally assumed, even for low resource languages where semantic parsers remain scarce. While recent proposals to use the crosslingual evaluation metric XMEANT during inversion transduction grammar (ITG) induction are inapplicable to low resource languages that lack semantic parsers, we break the bottleneck via a vastly improved method of biasing ITG induction toward learning more semantically correct alignments, using the monolingual semantic evaluation metric MEANT. Unlike XMEANT, MEANT requires only a readily available English (output language) semantic parser. The advances we report here exploit the novel realization that MEANT is an excellent way to semantically bias expectation-maximization induction even for low resource languages. We test our systems on challenging languages including Amharic, Uyghur, Tigrinya and Oromo. Results show that our model biases learning toward more semantically correct alignments, leading to better translation quality than both standard ITG-based and GIZA++-based SMT training models on different datasets.
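To make the core idea concrete, below is a minimal, hypothetical Python sketch (not the authors' code) of how a MEANT-style frame-matching score, computed from English output-side semantic parses only, could reweight the posterior over candidate alignments during EM-based alignment induction. The function names, toy data, and the simple F-measure scoring are illustrative assumptions; real MEANT weights semantic roles and scores role fillers by lexical similarity, and the paper's actual training uses ITG induction rather than the flat reweighting shown here.

    # Hypothetical illustration only -- not the paper's implementation.

    def meant_like_score(hyp_frame, ref_frame):
        """Toy frame-matching score in the spirit of MEANT: an F-measure over
        (role, filler) pairs shared between two semantic frames. Real MEANT
        weights roles and uses lexical similarity between role fillers."""
        hyp, ref = set(hyp_frame), set(ref_frame)
        if not hyp or not ref:
            return 0.0
        overlap = len(hyp & ref)
        precision, recall = overlap / len(hyp), overlap / len(ref)
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    def semantically_biased_posteriors(posteriors, frames, ref_frame, floor=1e-3):
        """Rescale each candidate alignment's EM posterior by its semantic
        score and renormalize, so the M-step sees expected counts that favor
        semantically correct alignments. A small floor keeps every candidate
        reachable. (A sketch of the biasing idea, not ITG inside-outside.)"""
        scored = {a: p * (floor + meant_like_score(frames[a], ref_frame))
                  for a, p in posteriors.items()}
        z = sum(scored.values()) or 1.0
        return {a: s / z for a, s in scored.items()}

    # Toy usage: alignment "a1" yields an English-side parse whose frame
    # matches the reference frame, so it absorbs nearly all posterior mass.
    frames = {
        "a1": {("AGENT", "police"), ("ACTION", "arrest")},
        "a2": {("AGENT", "arrest"), ("ACTION", "police")},
    }
    ref = {("AGENT", "police"), ("ACTION", "arrest")}
    print(semantically_biased_posteriors({"a1": 0.5, "a2": 0.5}, frames, ref))

Note that the score above is computed from English (output-language) frames alone, so nothing in this sketch requires an SRL system for the low resource input language, which is the point of the paper's monolingual MEANT biasing.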
Anthology ID:
2018.eamt-main.6
Volume:
Proceedings of the 21st Annual Conference of the European Association for Machine Translation
Month:
May
Year:
2018
Address:
Alicante, Spain
Editors:
Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Miquel Esplà-Gomis, Maja Popović, Celia Rico, André Martins, Joachim Van den Bogaert, Mikel L. Forcada
Venue:
EAMT
Pages:
79–88
URL:
https://aclanthology.org/2018.eamt-main.6
Cite (ACL):
Meriem Beloucif and Dekai Wu. 2018. SRL for low resource languages isn’t needed for semantic SMT. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation, pages 79–88, Alicante, Spain.
Cite (Informal):
SRL for low resource languages isn’t needed for semantic SMT (Beloucif & Wu, EAMT 2018)
PDF:
https://aclanthology.org/2018.eamt-main.6.pdf