Hiroaki Funayama


TohokuNLP at SemEval-2023 Task 5: Clickbait Spoiling via Simple Seq2Seq Generation and Ensembling
Hiroto Kurita | Ikumi Ito | Hiroaki Funayama | Shota Sasaki | Shoji Moriya | Ye Mengyu | Kazuma Kokuta | Ryujin Hatakeyama | Shusaku Sone | Kentaro Inui
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes our system submitted to SemEval-2023 Task 5: Clickbait Spoiling. We work on spoiler generation (subtask 2) and develop a system comprising two parts: 1) simple seq2seq spoiler generation and 2) post-hoc model ensembling. Using this simple method, we address the challenge of generating multipart spoilers. On the test set, our submitted system outperformed the baseline by a large margin (approximately 10 BLEU points) for mixed types of spoilers. We also found that our system successfully handled multipart spoilers, confirming the effectiveness of our approach.


Preventing Critical Scoring Errors in Short Answer Scoring with Confidence Estimation
Hiroaki Funayama | Shota Sasaki | Yuichiroh Matsubayashi | Tomoya Mizumoto | Jun Suzuki | Masato Mita | Kentaro Inui
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Many recent Short Answer Scoring (SAS) systems have employed Quadratic Weighted Kappa (QWK) as their evaluation measure. However, we hypothesize that QWK is unsatisfactory for evaluating SAS systems when we consider their effectiveness in actual usage. We introduce a new task formulation of SAS that matches the actual usage. In our formulation, SAS systems should extract as many scoring predictions as possible that are not critical scoring errors (CSEs). We conduct experiments in our new task formulation and demonstrate that a typical SAS system can predict scores with zero CSEs for up to approximately 50% of the test data by filtering out low-reliability predictions on the basis of a certain confidence estimation. This result directly indicates the possibility of reducing the scoring cost of human raters by half, which is preferable for the evaluation of SAS systems.
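
The confidence-based filtering described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `max_zero_cse_coverage`, the `tolerance` parameter, and the rule that a CSE is a prediction whose absolute error exceeds `tolerance` are all assumptions made for illustration; the abstract does not define a CSE or the confidence estimator. The sketch ranks predictions by confidence and reports the largest top-confidence fraction of the data that contains no CSE.

```python
def max_zero_cse_coverage(preds, golds, confs, tolerance=1):
    """Return the largest fraction of items that can be kept, taking the
    most confident predictions first, without including any critical
    scoring error (CSE).

    Assumption for illustration: a prediction is a CSE when its absolute
    error against the gold score is greater than `tolerance`.
    """
    if not preds:
        return 0.0

    # Rank item indices from most to least confident.
    order = sorted(range(len(preds)), key=lambda i: confs[i], reverse=True)

    kept = 0
    for i in order:
        # Stop at the first CSE: any lower confidence threshold would
        # let this erroneous prediction through.
        if abs(preds[i] - golds[i]) > tolerance:
            break
        kept += 1

    return kept / len(preds)


# Toy data: four answers with predicted scores, gold scores, confidences.
coverage = max_zero_cse_coverage(
    preds=[3, 2, 0, 4],
    golds=[3, 3, 3, 4],
    confs=[0.9, 0.8, 0.3, 0.7],
    tolerance=1,
)
print(coverage)  # 0.75
```

In this toy data the only CSE is also the least confident prediction, so three of the four answers could be scored automatically and one would go to a human rater.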