Berkay Topçu
2021
TR-SEQ: Named Entity Recognition Dataset for Turkish Search Engine Queries
Berkay Topçu, İlknur Durgar El-Kahlout
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Recognizing named entities in short search engine queries is difficult because queries carry much less context than full sentences. Standard named entity recognition (NER) systems, which are trained on long, grammatically correct sentences, perform poorly on such queries. In this study, we describe the creation of a cleaned and labeled dataset of real Turkish search engine queries (TR-SEQ) and introduce an extended label set that meets the needs of a search engine. We train a NER system on the collected data by applying BERT, a state-of-the-art deep learning method, and report its high performance on search engine queries. We also compare our results with state-of-the-art Turkish NER systems.
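To make concrete what recognizing named entities in a short query produces, here is a minimal Python sketch that decodes per-token tags into entity spans. It assumes a BIO tagging scheme, which the abstract does not specify. The `bio_to_spans` helper, the example query and the PRODUCT and LOCATION labels are all hypothetical and are not taken from the TR-SEQ extended label set.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) spans.

    Assumption: tags follow the BIO scheme ("B-TYPE", "I-TYPE", "O").
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts. An I- tag whose type does not continue
            # the current entity is treated as the start of a new one.
            flush()
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # "O": the token is outside any entity.
            flush()
            current_tokens = []
            current_type = None
    flush()
    return spans


# Hypothetical query ("fiyatı" means "its price") with illustrative labels.
tokens = "samsung galaxy s21 fiyatı ankara".split()
tags = ["B-PRODUCT", "I-PRODUCT", "I-PRODUCT", "O", "B-LOCATION"]
print(bio_to_spans(tokens, tags))
# [('samsung galaxy s21', 'PRODUCT'), ('ankara', 'LOCATION')]
```

A BERT-based token classifier typically emits one tag per token or subword. A decoding step along these lines would turn those tags into the spans a search engine can act on, for example the product being searched for and a location filter.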