Comparing Named-Entity Recognizers in a Targeted Domain: Handcrafted Rules vs Machine Learning
Ioannis Partalas | Cédric Lopez | Frédérique Segond
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Posters)
Comparing Named-Entity Recognizers in a Targeted Domain : Handcrafted Rules vs. Machine Learning Named-Entity Recognition concerns the classification of textual objects in a predefined set of categories such as persons, organizations, and localizations. While Named-Entity Recognition is well studied since 20 years, the application to specialized domains still poses challenges for current systems. We developed a rule-based system and two machine learning approaches to tackle the same task : recognition of product names, brand names, etc., in the domain of Cosmetics, for French. Our systems can thus be compared under ideal conditions. In this paper, we introduce both systems and we compare them.
We presented in this work our participation in the 2nd Named Entity Recognition for Twitter shared task. The task has been cast as a sequence labeling one and we employed a learning to search approach in order to tackle it. We also leveraged LOD for extracting rich contextual features for the named-entities. Our submission achieved F-scores of 46.16 and 60.24 for the classification and the segmentation tasks and ranked 2nd and 3rd respectively. The post-analysis showed that LOD features improved substantially the performance of our system as they counter-balance the lack of context in tweets. The shared task gave us the opportunity to test the performance of NER systems in short and noisy textual data. The results of the participated systems shows that the task is far to be considered as a solved one and methods with stellar performance in normal texts need to be revised.