On the reliability and inter-annotator agreement of human semantic MT evaluation via HMEANT

Chi-kiu Lo, Dekai Wu


Abstract
We present analyses showing that HMEANT is a reliable, accurate and fine-grained semantic-frame-based human MT evaluation metric with high inter-annotator agreement (IAA) and correlation with human adequacy judgments, despite requiring only minimal training of about 15 minutes for lay annotators. Previous work shows that the IAA on the semantic role labeling (SRL) subtask within HMEANT is over 70%. In this paper we focus on (1) the IAA on the semantic role alignment task and (2) the overall IAA of HMEANT. Our results show that the IAA on the alignment task of HMEANT is over 90% when humans align SRL output from the same SRL annotator, which shows that the instructions on the alignment task are sufficiently precise. However, the overall IAA, where humans align SRL output from different SRL annotators, falls to only 61% because disagreements in the two annotation tasks compound through the pipeline. We show that aligning the semantic roles with an automatic algorithm, instead of manually, not only helps maintain the overall IAA of HMEANT at 70%, but also provides a finer-grained assessment of the phrasal similarity of the semantic role fillers. This suggests that HMEANT equipped with automatic alignment is reliable and accurate for humans to evaluate MT adequacy while achieving higher correlation with human adequacy judgments than HTER.
Anthology ID:
L14-1136
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
602–607
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1198_Paper.pdf
Cite (ACL):
Chi-kiu Lo and Dekai Wu. 2014. On the reliability and inter-annotator agreement of human semantic MT evaluation via HMEANT. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 602–607, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
On the reliability and inter-annotator agreement of human semantic MT evaluation via HMEANT (Lo & Wu, LREC 2014)