Joaquim Silva
2024
Authorship Attribution with Rejection Capability in Challenging Contexts of Limited Datasets
Pedro Oliveira, Joaquim Silva
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1
2014
Context Sense Clustering for Translation
João Casteleiro, Gabriel Lopes, Joaquim Silva
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation
2010
Towards Automatic Building of Document Keywords
Joaquim Silva, Gabriel Lopes
Coling 2010: Posters
2004
Extracting Named Entities. A Statistical Approach
Joaquim Silva, Zornitsa Kozareva, Veska Noncheva, Gabriel Lopes
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Posters
Named entities and, more generally, Multiword Lexical Units (MWUs) are important for various applications. However, language-independent methods for automatically extracting MWUs do not produce clean data. In this paper, we therefore propose a method for selecting possible named entities from automatically extracted MWUs; a language-independent, unsupervised, statistics-based approach is then applied to these candidates in order to cluster them by type. The statistical features used by our clustering process are described and motivated. The Model-Based Clustering Analysis (MBCA) software enabled us to obtain different clusters for the proposed named entities. The method was applied to Bulgarian and English. For some clusters, precision is very high; others still need further refinement. Based on the obtained clusters, it is also possible to classify new possible named entities.
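The abstract does not list the statistical features or the MBCA settings, so the following is only a generic illustration of model-based clustering, not the paper's method. It assumes each candidate named entity has already been mapped to a numeric feature vector. It fits a Gaussian mixture by expectation-maximization (EM) and assigns each candidate, including new ones, to its most probable component. The helper names `fit_gmm` and `predict`, the diagonal covariances, the fixed number of clusters `k`, and the farthest-point initialization are all hypothetical choices made for this sketch. Model selection over `k` or covariance structure is omitted.

```python
import math

def _log_gauss_diag(x, mean, var):
    # Log-density of x under a Gaussian with diagonal covariance `var`.
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def _farthest_point_init(points, k):
    # Deterministic init (an assumption): first point, then repeatedly the point
    # farthest (squared Euclidean distance) from every mean chosen so far.
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda p: min(
            sum((a - b) ** 2 for a, b in zip(p, m)) for m in means))
        means.append(list(far))
    return means

def fit_gmm(points, k, iters=50, min_var=1e-6):
    """Hypothetical helper: fit a k-component diagonal Gaussian mixture by EM."""
    n, d = len(points), len(points[0])
    means = _farthest_point_init(points, k)
    # Start every component with the overall per-dimension variance.
    mu = [sum(p[j] for p in points) / n for j in range(d)]
    var0 = [max(sum((p[j] - mu[j]) ** 2 for p in points) / n, min_var) for j in range(d)]
    variances = [list(var0) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, normalized with log-sum-exp for stability.
        resp = []
        for x in points:
            logs = [math.log(weights[c]) + _log_gauss_diag(x, means[c], variances[c])
                    for c in range(k)]
            top = max(logs)
            exps = [math.exp(l - top) for l in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = max(nc, 1e-12) / n
            if nc < 1e-12:
                continue  # numerically empty component: keep its old parameters
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, points)) / nc
                        for j in range(d)]
            variances[c] = [max(sum(r[c] * (x[j] - means[c][j]) ** 2
                                    for r, x in zip(resp, points)) / nc, min_var)
                            for j in range(d)]
    return weights, means, variances

def predict(x, weights, means, variances):
    # Assign a (possibly new) candidate to its most probable component.
    scores = [math.log(w) + _log_gauss_diag(x, m, v)
              for w, m, v in zip(weights, means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)

# Usage with made-up 2-D feature vectors (not data from the paper).
cands = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2),
         (5.0, 5.1), (4.9, 4.8), (5.2, 5.0), (5.1, 4.9)]
model = fit_gmm(cands, k=2)
labels = [predict(p, *model) for p in cands]
```

The `predict` function mirrors the abstract's last point: once the mixture is fitted, an unseen candidate can be assigned to an existing cluster without refitting.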