Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems

Editors: Steffen Eger, Yang Gao, Maxime Peyrard, Wei Zhao, Eduard Hovy
Publisher: Association for Computational Linguistics
Place: Online
Date: 2020-11
Genre: conference publication
Anthology ID: eval4nlp-2020-evaluation
DOI: 10.18653/v1/2020.eval4nlp-1.0
URL: https://aclanthology.org/2020.eval4nlp-1.0/

Papers (all published 2020-11 in the volume above, genre: conference publication)

Truth or Error? Towards systematic analysis of factual errors in abstractive summaries
  Authors: Klaus-Michael Lux, Maya Sappelli, Martha Larson
  Pages: 1-10
  Anthology ID: lux-etal-2020-truth
  DOI: 10.18653/v1/2020.eval4nlp-1.1
  URL: https://aclanthology.org/2020.eval4nlp-1.1/

Fill in the BLANC: Human-free quality estimation of document summaries
  Authors: Oleg Vasilyev, Vedant Dharnidharka, John Bohannon
  Pages: 11-20
  Anthology ID: vasilyev-etal-2020-fill
  DOI: 10.18653/v1/2020.eval4nlp-1.2
  URL: https://aclanthology.org/2020.eval4nlp-1.2/

Item Response Theory for Efficient Human Evaluation of Chatbots
  Authors: João Sedoc, Lyle Ungar
  Pages: 21-33
  Anthology ID: sedoc-ungar-2020-item
  DOI: 10.18653/v1/2020.eval4nlp-1.3
  URL: https://aclanthology.org/2020.eval4nlp-1.3/

ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT
  Authors: Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung
  Pages: 34-39
  Anthology ID: lee-etal-2020-vilbertscore
  DOI: 10.18653/v1/2020.eval4nlp-1.4
  URL: https://aclanthology.org/2020.eval4nlp-1.4/

BLEU Neighbors: A Reference-less Approach to Automatic Evaluation
  Authors: Kawin Ethayarajh, Dorsa Sadigh
  Pages: 40-50
  Anthology ID: ethayarajh-sadigh-2020-bleu
  DOI: 10.18653/v1/2020.eval4nlp-1.5
  URL: https://aclanthology.org/2020.eval4nlp-1.5/

Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance
  Authors: Xi Chen, Nan Ding, Tomer Levinboim, Radu Soricut
  Pages: 51-59
  Anthology ID: chen-etal-2020-improving-text
  DOI: 10.18653/v1/2020.eval4nlp-1.6
  URL: https://aclanthology.org/2020.eval4nlp-1.6/

On the Evaluation of Machine Translation n-best Lists
  Authors: Jacob Bremerman, Huda Khayrallah, Douglas Oard, Matt Post
  Pages: 60-68
  Anthology ID: bremerman-etal-2020-evaluation
  DOI: 10.18653/v1/2020.eval4nlp-1.7
  URL: https://aclanthology.org/2020.eval4nlp-1.7/

Artemis: A Novel Annotation Methodology for Indicative Single Document Summarization
  Authors: Rahul Jha, Keping Bi, Yang Li, Mahdi Pakdaman, Asli Celikyilmaz, Ivan Zhiboedov, Kieran McDonald
  Pages: 69-78
  Anthology ID: jha-etal-2020-artemis
  DOI: 10.18653/v1/2020.eval4nlp-1.8
  URL: https://aclanthology.org/2020.eval4nlp-1.8/

Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models
  Authors: Reda Yacouby, Dustin Axman
  Pages: 79-91
  Anthology ID: yacouby-axman-2020-probabilistic
  DOI: 10.18653/v1/2020.eval4nlp-1.9
  URL: https://aclanthology.org/2020.eval4nlp-1.9/

A survey on Recognizing Textual Entailment as an NLP Evaluation
  Authors: Adam Poliak
  Pages: 92-109
  Anthology ID: poliak-2020-survey
  DOI: 10.18653/v1/2020.eval4nlp-1.10
  URL: https://aclanthology.org/2020.eval4nlp-1.10/

Grammaticality and Language Modelling
  Authors: Jingcheng Niu, Gerald Penn
  Pages: 110-119
  Anthology ID: niu-penn-2020-grammaticality
  DOI: 10.18653/v1/2020.eval4nlp-1.11
  URL: https://aclanthology.org/2020.eval4nlp-1.11/

One of these words is not like the other: a reproduction of outlier identification using non-contextual word representations
  Authors: Jesper Brink Andersen, Mikkel Bak Bertelsen, Mikkel Hørby Schou, Manuel R Ciosici, Ira Assent
  Pages: 120-130
  Anthology ID: brink-andersen-etal-2020-one
  DOI: 10.18653/v1/2020.eval4nlp-1.12
  URL: https://aclanthology.org/2020.eval4nlp-1.12/

Are Some Words Worth More than Others?
  Authors: Shiran Dudy, Steven Bedrick
  Pages: 131-142
  Anthology ID: dudy-bedrick-2020-words
  DOI: 10.18653/v1/2020.eval4nlp-1.13
  URL: https://aclanthology.org/2020.eval4nlp-1.13/

On Aligning OpenIE Extractions with Knowledge Bases: A Case Study
  Authors: Kiril Gashteovski, Rainer Gemulla, Bhushan Kotnis, Sven Hertling, Christian Meilicke
  Pages: 143-154
  Anthology ID: gashteovski-etal-2020-aligning
  DOI: 10.18653/v1/2020.eval4nlp-1.14
  URL: https://aclanthology.org/2020.eval4nlp-1.14/

ClusterDataSplit: Exploring Challenging Clustering-Based Data Splits for Model Performance Evaluation
  Authors: Hanna Wecker, Annemarie Friedrich, Heike Adel
  Pages: 155-163
  Anthology ID: wecker-etal-2020-clusterdatasplit
  DOI: 10.18653/v1/2020.eval4nlp-1.15
  URL: https://aclanthology.org/2020.eval4nlp-1.15/

Best Practices for Crowd-based Evaluation of German Summarization: Comparing Crowd, Expert and Automatic Evaluation
  Authors: Neslihan Iskender, Tim Polzehl, Sebastian Möller
  Pages: 164-175
  Anthology ID: iskender-etal-2020-best
  DOI: 10.18653/v1/2020.eval4nlp-1.16
  URL: https://aclanthology.org/2020.eval4nlp-1.16/

Evaluating Word Embeddings on Low-Resource Languages
  Authors: Nathan Stringham, Mike Izbicki
  Pages: 176-186
  Anthology ID: stringham-izbicki-2020-evaluating
  DOI: 10.18653/v1/2020.eval4nlp-1.17
  URL: https://aclanthology.org/2020.eval4nlp-1.17/