Using Transfer Learning to Assist Exploratory Corpus Annotation

Paul Felt, Eric Ringger, Kevin Seppi, Kristian Heal


Abstract
We describe an under-studied problem in language resource management: that of providing automatic assistance to annotators working in exploratory settings. When no satisfactory tagset already exists, such as in under-resourced or undocumented languages, it must be developed iteratively while annotating data. This process naturally gives rise to a sequence of datasets, each annotated differently. We argue that this problem is best regarded as a transfer learning problem with multiple source tasks. Using part-of-speech tagging data with simulated exploratory tagsets, we demonstrate that even simple transfer learning techniques can significantly improve the quality of pre-annotations in an exploratory annotation.
Anthology ID:
L14-1168
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
140–145
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/147_Paper.pdf
Cite (ACL):
Paul Felt, Eric Ringger, Kevin Seppi, and Kristian Heal. 2014. Using Transfer Learning to Assist Exploratory Corpus Annotation. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 140–145, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Using Transfer Learning to Assist Exploratory Corpus Annotation (Felt et al., LREC 2014)