How to account for mispellings: Quantifying the benefit of character representations in neural content scoring models

Brian Riordan, Michael Flor, Robert Pugh


Abstract
Character-based representations in neural models have been claimed to be a tool to overcome spelling variation in word token-based input. We examine this claim in neural models for content scoring. We formulate precise hypotheses about the possible effects of adding character representations to word-based models and test these hypotheses on large-scale real world content scoring datasets. We find that, while character representations may provide small performance gains in general, their effectiveness in accounting for spelling variation may be limited. We show that spelling correction can provide larger gains than character representations, and that spelling correction improves the performance of models with character representations. With these insights, we report a new state of the art on the ASAP-SAS content scoring dataset.
Anthology ID:
W19-4411
Volume:
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
Month:
August
Year:
2019
Address:
Florence, Italy
Venues:
ACL | BEA | WS
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
116–126
URL:
https://aclanthology.org/W19-4411
DOI:
10.18653/v1/W19-4411
Cite (ACL):
Brian Riordan, Michael Flor, and Robert Pugh. 2019. How to account for mispellings: Quantifying the benefit of character representations in neural content scoring models. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 116–126, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
How to account for mispellings: Quantifying the benefit of character representations in neural content scoring models (Riordan et al., 2019)
PDF:
https://aclanthology.org/W19-4411.pdf