Kalliopi Zervanou

Also published as: Kalliopi A. Zervanou


2019

Very short texts, such as tweets and invoices, present challenges in classification. Although term occurrences are strong indicators of content, the sparsity of very short texts makes it difficult to capture important semantic relationships. A solution calls for a method that not only considers term occurrence but also handles sparseness well. In this work, we introduce such an approach, Term Based Semantic Clusters (TBSeC), which employs terms to create distinctive semantic concept clusters. These clusters are ranked using a semantic similarity function, which in turn defines a semantic feature space that can be used for text classification. Our method is evaluated on an invoice classification task, where it performs competitively compared to well-known content representation methods.
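The pipeline sketched in the abstract (concept clusters, a similarity function over them, a resulting feature space) might look roughly as follows. This is purely an illustrative sketch: the cluster contents are invented, and plain Jaccard overlap stands in for the paper's semantic similarity function.

```python
def jaccard(a, b):
    """Jaccard overlap between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_features(doc_terms, clusters):
    """Represent a document as a vector of similarities to each
    concept cluster: one feature dimension per cluster."""
    return [jaccard(doc_terms, cluster) for cluster in clusters]

# Hypothetical concept clusters, e.g. mined from invoice vocabulary.
clusters = [
    {"invoice", "payment", "due", "amount"},         # billing concepts
    {"shipping", "delivery", "address", "courier"},  # logistics concepts
]

doc = {"payment", "amount", "received"}
vec = cluster_features(doc, clusters)
# The resulting vector aligns the document with the billing cluster.
```

The vector `vec` could then feed any standard classifier, replacing a sparse bag-of-words representation with a dense, cluster-based one.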


2014

In this work, we investigate the role of morphology in the performance of semantic similarity for morphologically rich languages, such as German and Greek. The challenge in processing languages with richer morphology than English lies in reducing estimation error while addressing the semantic distortion introduced by a stemmer or a lemmatiser. For this purpose, we propose a methodology for selective stemming, based on a semantic distortion metric. The proposed algorithm is tested on the task of similarity estimation between words using two types of corpus-based similarity metrics: co-occurrence-based and context-based. For morphologically rich languages, performance is boosted by stemming with the context-based metric, unlike for English, where the best results are obtained by the co-occurrence-based metric. A key finding is that the estimation error reduction differs when a word is used as a feature rather than as a target word.
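The selective stemming idea could be sketched, under loose assumptions, as: stem a word only when the semantic distortion of doing so stays below a threshold. Here distortion is taken as one minus the cosine similarity between toy distributional vectors (invented for illustration), and the threshold is arbitrary; the paper's actual metric and corpora are not reproduced.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def selective_stem(word, stem, vectors, max_distortion=0.3):
    """Return the stem only if the semantic distortion it introduces
    (1 - cosine similarity of the two distributional vectors) is small;
    otherwise keep the inflected word form."""
    distortion = 1.0 - cosine(vectors[word], vectors[stem])
    return stem if distortion <= max_distortion else word

# Toy distributional vectors (hypothetical values).
vectors = {
    "running":    [0.9, 0.1, 0.2],
    "run":        [0.85, 0.15, 0.25],  # close to "running": safe to stem
    "university": [0.1, 0.9, 0.3],
    "univers":    [0.7, 0.2, 0.8],     # far from "university": keep word
}
```

Applied to the toy vectors, `selective_stem` would conflate "running" with "run" but refuse the over-aggressive stem "univers".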


2004

A key element for the extraction of information in a natural language document is a set of shallow text analysis rules, which are typically based on pre-defined linguistic patterns. Current Information Extraction research aims at the automatic or semi-automatic acquisition of these rules. Within this research framework, we consider in this paper the potential for acquiring generic extraction patterns. Our research is based on the hypothesis that terms (the linguistic representation of concepts in a specialised domain) and Named Entities (the names of persons, organisations and dates of importance in the text) can together be considered the basic semantic entities of textual information. They can therefore serve as a basis for the conceptual representation of domain-specific texts and for defining what constitutes an information extraction template in linguistic terms. The extraction patterns discovered by this approach involve significant associations of these semantic entities with verbs, and they can subsequently be translated into the grammar formalism of choice.
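The pattern acquisition step, finding significant associations of semantic entities with verbs, might be sketched as a frequency filter over (entity-type, verb) observations extracted from parsed sentences. The data and the `min_count` threshold below are illustrative only; the paper's notion of significance is not reproduced here.

```python
from collections import Counter

def mine_patterns(observations, min_count=2):
    """Count co-occurrences of semantic-entity types (terms, Named
    Entities) with verbs and keep the frequent associations as
    candidate extraction patterns."""
    counts = Counter(observations)
    return {pair for pair, n in counts.items() if n >= min_count}

# Hypothetical (entity-type, verb) pairs from parsed sentences.
observations = [
    ("ORGANISATION", "acquire"),
    ("ORGANISATION", "acquire"),
    ("PERSON", "appoint"),
    ("PERSON", "appoint"),
    ("DATE", "mention"),  # seen only once: filtered out
]
patterns = mine_patterns(observations)
```

Each surviving pair, such as ORGANISATION + "acquire", could then be translated into a rule in the grammar formalism of choice.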