Named Entity Recognition to Detect Criminal Texts on the Web
Paweł Skórzewski | Mikołaj Pieniowski | Grazyna Demenko
Proceedings of the Thirteenth Language Resources and Evaluation Conference
This paper presents a toolkit that applies named-entity extraction techniques to identify information related to criminal activity in texts from the Polish Internet. The methodological and technical assumptions were based on the requirements of the application's users from the Border Guard. Because of the specific needs of these users and the specific nature of web texts, we developed original methods for searching for relevant texts, creating domain lexicons, annotating the collected text resources, and combining rule-based and machine-learning techniques to extract the information the user needs. The performance of our tools has been evaluated on 6240 manually annotated text fragments collected from Internet sources. Evaluation results and user feedback show that our approach is feasible and has potential value for real-life applications in the daily work of border guards. Lexical lookup combined with hand-crafted rules and regular expressions, supported by text statistics, can yield a decent specialized entity recognition system when the large datasets required to train a good neural network are unavailable.
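The combination of lexical lookup and regular-expression rules described above can be illustrated with a minimal sketch. It is not the authors' toolkit: the lexicon entries, entity labels (DRUG, DOCUMENT, PHONE, PRICE), and patterns are hypothetical, and the statistical support and machine-learning components mentioned in the abstract are omitted. Overlapping candidates are resolved by keeping the earliest, then longest, match.

```python
import re
from typing import List, Tuple

# Hypothetical domain lexicon: lowercase surface form -> entity label (illustrative only).
LEXICON = {
    "mefedron": "DRUG",
    "amfetamina": "DRUG",
    "paszport": "DOCUMENT",
}

# Hypothetical hand-crafted regex rules: label -> compiled pattern.
RULES = {
    "PHONE": re.compile(r"\+48\s?\d{3}[\s-]?\d{3}[\s-]?\d{3}"),
    "PRICE": re.compile(r"\d+(?:[.,]\d+)?\s?(?:zł|PLN)"),
}

Entity = Tuple[int, int, str, str]  # (start, end, label, matched text)


def lexicon_lookup(text: str) -> List[Entity]:
    """Tag every word token that appears in the domain lexicon."""
    entities = []
    for m in re.finditer(r"\w+", text):  # \w is Unicode-aware, so Polish letters match
        label = LEXICON.get(m.group().lower())
        if label:
            entities.append((m.start(), m.end(), label, m.group()))
    return entities


def rule_matches(text: str) -> List[Entity]:
    """Apply each regex rule and collect all of its matches."""
    entities = []
    for label, pattern in RULES.items():
        for m in pattern.finditer(text):
            entities.append((m.start(), m.end(), label, m.group()))
    return entities


def extract_entities(text: str) -> List[Entity]:
    """Merge both sources; on overlap keep the earliest, then longest, candidate."""
    candidates = sorted(
        lexicon_lookup(text) + rule_matches(text),
        key=lambda e: (e[0], -(e[1] - e[0])),
    )
    result: List[Entity] = []
    last_end = -1
    for ent in candidates:
        if ent[0] >= last_end:  # skip candidates overlapping an accepted entity
            result.append(ent)
            last_end = ent[1]
    return result
```

For the invented input "Sprzedam mefedron, 50 zł za gram, tel. +48 600 700 800", this sketch tags "mefedron" as DRUG, "50 zł" as PRICE, and "+48 600 700 800" as PHONE. Keeping the lexicon and the rules as separate sources makes it easy to extend either one without retraining anything, which matches the low-data setting the abstract describes.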
EUDAMU at SemEval-2017 Task 11: Action Ranking and Type Matching for End-User Development
Marek Kubis | Paweł Skórzewski | Tomasz Ziętkiewicz
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
The paper describes a system for end-user development using natural language. Our approach uses a ranking model to identify the actions to be executed, followed by reference and parameter matching models that select the parameter values to be set for the given commands. We discuss the evaluation results and possible improvements for future work.
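The two-stage pipeline above (rank candidate actions, then fill their parameters) can be sketched as follows. This is only an illustration under stated assumptions, not the EUDAMU system: Jaccard token overlap stands in for the paper's ranking model, simple type-specific regexes stand in for its parameter matching models, and the `Action` class, action names, and parameter types are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Action:
    """Hypothetical action inventory entry: name, description, typed parameters."""
    name: str
    description: str
    params: Dict[str, str] = field(default_factory=dict)  # param name -> type


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_actions(command: str, actions: List[Action]) -> List[Action]:
    """Stand-in ranker: sort actions by Jaccard overlap with the command."""
    cmd = tokens(command)

    def score(action: Action) -> float:
        desc = tokens(action.description)
        union = cmd | desc
        return len(cmd & desc) / len(union) if union else 0.0

    return sorted(actions, key=score, reverse=True)


# Hypothetical type patterns; "number" excludes digits that belong to a time like 07:30.
TYPE_PATTERNS = {
    "time": re.compile(r"\b\d{1,2}:\d{2}\b"),
    "number": re.compile(r"(?<![\d:])\d+(?![\d:])"),
}


def match_parameters(command: str, action: Action) -> Dict[str, str]:
    """Stand-in type matcher: fill each parameter with the first span of its type."""
    values: Dict[str, str] = {}
    for name, ptype in action.params.items():
        pattern = TYPE_PATTERNS.get(ptype)
        if pattern:
            m = pattern.search(command)
            if m:
                values[name] = m.group()
    return values
```

For example, with the invented actions `Action("set_alarm", "set an alarm at a given time", {"time": "time"})` and `Action("send_message", "send a text message to a contact")`, the command "set an alarm for 07:30" ranks `set_alarm` first, and `match_parameters` returns `{"time": "07:30"}`. Separating ranking from parameter filling lets each stage be replaced by a learned model independently.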