In the context of data labeling, NLP researchers are increasingly interested in having humans select rationales, a subset of input tokens relevant to the chosen label. We conducted a 332-participant online user study to understand how humans select rationales, especially how different instructions and user interface affordances impact the rationales chosen. Participants labeled ten movie reviews as positive or negative, selecting words and phrases supporting their label as rationales. We varied the instructions given, the rationale-selection task, and the user interface. Participants often selected about 12% of input tokens as rationales, but selected fewer if unable to drag over multiple tokens at once. Whereas participants were near unanimous in their data labels, they were far less consistent in their rationales. The user interface affordances and task greatly impacted the types of rationales chosen. We also observed large variance across participants.
We explore a novel approach to reading compliance, leveraging large language models to select inline challenges that discourage skipping during reading. This lightweight ‘testing’ is accomplished through automatically identified context clozes where the reader must supply a missing word that would be hard to guess if earlier material was skipped. Clozes are selected by scoring each word by the contrast between its likelihood with and without prior sentences as context, preferring to leave gaps where this contrast is high. We report results of an initial human-participant test that indicates this method can find clozes that have this property.