Aristomenis Thanopoulos

2008

Eksairesis: A Domain-Adaptable System for Ontology Building from Unstructured Text
Katia Lida Kermanidis | Aristomenis Thanopoulos | Manolis Maragoudakis | Nikos Fakotakis
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes Eksairesis, a system for automatically learning economic domain knowledge from Modern Greek text. The knowledge takes the form of economic terms and the semantic relations that hold between them. The entire process is based on minimal language-dependent tools, no external linguistic resources, and only free, unstructured text. The methodology is therefore easily portable to other domains and languages. The text is pre-processed with basic morphological annotation, and semantic entities (named and other) are identified using supervised learning techniques. Statistical filtering, i.e. corpora comparison, is used to extract domain terms, and supervised learning is again employed to detect the semantic relations between pairs of terms. Advanced classification schemata, namely ensemble learning and one-sided sampling, are tested as a way of dealing with the noise in the data, which is unavoidable given the low level of pre-processing and the lack of sophisticated resources. An average f-score of 68.5% over all classes is achieved when learning semantic relations. Given the minimal resources used and the highly automated nature of the process, classification performance is very promising compared to results reported in previous work.
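The corpora-comparison step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistic the paper uses, so the smoothed relative-frequency ratio below is an assumption chosen for simplicity, and the function names, `min_count`, and `threshold` are hypothetical.

```python
from collections import Counter


def domain_term_scores(domain_tokens, general_tokens, min_count=2):
    """Score each domain word by how much more frequent it is in the
    domain corpus than in a general corpus.

    Assumption: add-one smoothed relative-frequency ratio; the paper's
    actual filtering statistic is not given in the abstract.
    """
    dom = Counter(domain_tokens)
    gen = Counter(general_tokens)
    dom_total = sum(dom.values())
    gen_total = sum(gen.values())
    vocab_size = len(set(dom) | set(gen))

    scores = {}
    for word, count in dom.items():
        if count < min_count:
            continue  # too rare in the domain corpus to judge
        p_dom = (count + 1) / (dom_total + vocab_size)
        p_gen = (gen[word] + 1) / (gen_total + vocab_size)
        scores[word] = p_dom / p_gen
    return scores


def extract_terms(domain_tokens, general_tokens, threshold=2.0, min_count=2):
    """Return candidate domain terms, highest ratio first."""
    scores = domain_term_scores(domain_tokens, general_tokens, min_count)
    candidates = [w for w, s in scores.items() if s >= threshold]
    return sorted(candidates, key=lambda w: -scores[w])


if __name__ == "__main__":
    domain = "bank rate bank rate the the stock stock the market".split()
    general = "the the the the cat dog the sat mat the".split()
    print(extract_terms(domain, general))
```

Words that are common in both corpora, such as function words, receive a ratio near or below 1 and are filtered out, while words concentrated in the domain corpus pass the threshold.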

2003

Text Tokenization for Knowledge-free Automatic Extraction of Lexical Similarities
Aristomenis Thanopoulos | Nikos Fakotakis | George Kokkinakis
Actes de la 10ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Previous studies on the automatic extraction of lexical similarities have treated the word as the semantic unit of text. However, the theory of contextual lexical semantics implies that larger segments of text, namely non-compositional multiwords, are more appropriate for this role. We experimentally tested the applicability of this notion by applying automatic collocation extraction to identify and merge such multiwords prior to the similarity estimation process. Employing an automatic WordNet-based comparative evaluation scheme along with a manual evaluation procedure, we ascertain an improvement in the extracted similarity relations.
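The idea of merging multiwords before similarity estimation can be sketched as follows. The abstract does not name the collocation measure, so pointwise mutual information (PMI) over adjacent word pairs is an assumption here, and the function names, thresholds, and the underscore separator are hypothetical.

```python
import math
from collections import Counter


def find_collocations(tokens, min_count=2, min_pmi=1.0):
    """Return adjacent word pairs whose PMI is at least min_pmi.

    Assumption: PMI stands in for whatever association measure the
    paper actually uses.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(n_uni - 1, 1)

    found = set()
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue  # too rare to estimate association reliably
        p_pair = count / n_bi
        p_indep = (unigrams[first] / n_uni) * (unigrams[second] / n_uni)
        if math.log2(p_pair / p_indep) >= min_pmi:
            found.add((first, second))
    return found


def merge_collocations(tokens, collocations, sep="_"):
    """Greedily merge detected pairs, left to right, into single tokens."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in collocations:
            merged.append(tokens[i] + sep + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


if __name__ == "__main__":
    text = "new york is big and new york is old but the city is new".split()
    pairs = find_collocations(text)
    print(merge_collocations(text, pairs))
```

After merging, a downstream similarity estimator would see each merged pair as one token, so its context statistics are collected for the multiword as a unit rather than for its parts.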