Domain Adaptation for Named Entity Recognition Using CRFs

Tian Tian, Marco Dinarelli, Isabelle Tellier, Pedro Dias Cardoso


Abstract
In this paper, we explain how we created a labelled English corpus for a Named Entity Recognition (NER) task from multi-source, multi-domain data for an industrial partner. We illustrate the specificities of this corpus with examples and describe baseline experiments. We then present domain adaptation results on this corpus using a labelled Twitter corpus (Ritter et al., 2011). We test a semi-supervised method from Garcia-Fernandez et al. (2014) combined with a supervised domain adaptation approach proposed by Raymond and Fayolle (2010) in machine learning experiments with Conditional Random Fields (CRFs). We use the same technique to improve NER results on the Twitter corpus (Ritter et al., 2011). Our contributions thus consist of the creation of an industrial corpus and improvements in NER performance.
Anthology ID:
L16-1089
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
561–565
URL:
https://aclanthology.org/L16-1089
Cite (ACL):
Tian Tian, Marco Dinarelli, Isabelle Tellier, and Pedro Dias Cardoso. 2016. Domain Adaptation for Named Entity Recognition Using CRFs. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 561–565, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Domain Adaptation for Named Entity Recognition Using CRFs (Tian et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1089.pdf