Automating Idea Unit Segmentation and Alignment for Assessing Reading Comprehension via Summary Protocol Analysis

Marcello Gecchele, Hiroaki Yamada, Takenobu Tokunaga, Yasuyo Sawaki, Mika Ishizuka


Abstract
In this paper, we approach summary evaluation from an applied linguistics (AL) point of view. We provide computational tools to AL researchers to simplify the process of Idea Unit (IU) segmentation. The IU is a segmentation unit that can identify chunks of information. These chunks can be compared across documents to measure the content overlap between a summary and its source text. We propose a full revision of the annotation guidelines to allow machine implementation. The new guideline also improves the inter-annotator agreement, rising from 0.547 to 0.785 (Cohen’s Kappa). We release L2WS 2021, a IU gold standard corpus composed of 40 manually annotated student summaries. We propose IUExtract; i.e. the first automatic segmentation algorithm based on the IU. The algorithm was tested over the L2WS 2021 corpus. Our results are promising, achieving a precision of 0.789 and a recall of 0.844. We tested an existing approach to IU alignment via word embeddings with the state of the art model SBERT. The recorded precision for the top 1 aligned pair of IUs was 0.375. We deemed this result insufficient for effective automatic alignment. We propose “SAT”, an online tool to facilitate the collection of alignment gold standards for future training.
Anthology ID:
2022.lrec-1.498
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Venue:
LREC
SIG:
Publisher:
European Language Resources Association
Note:
Pages:
4663–4673
Language:
URL:
https://aclanthology.org/2022.lrec-1.498
DOI:
Bibkey:
Cite (ACL):
Marcello Gecchele, Hiroaki Yamada, Takenobu Tokunaga, Yasuyo Sawaki, and Mika Ishizuka. 2022. Automating Idea Unit Segmentation and Alignment for Assessing Reading Comprehension via Summary Protocol Analysis. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4663–4673, Marseille, France. European Language Resources Association.
Cite (Informal):
Automating Idea Unit Segmentation and Alignment for Assessing Reading Comprehension via Summary Protocol Analysis (Gecchele et al., LREC 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.lrec-1.498.pdf