Leonidas Tsekouras


2020

pdf bib
Social Web Observatory: A Platform and Method for Gathering Knowledge on Entities from Different Textual Sources
Leonidas Tsekouras | Georgios Petasis | George Giannakopoulos | Aris Kosmopoulos
Proceedings of the Twelfth Language Resources and Evaluation Conference

Within this work we describe a framework for the collection and summarization of information from the Web in an entity-driven manner. The framework consists of a set of appropriate workflows and the Social Web Observatory platform, which implements those workflows, supporting them through a language analysis pipeline. The pipeline includes text collection/crawling, identification of different entities, clustering of texts into events related to entities, entity-centric sentiment analysis, but also text analytics and visualization functionalities. The latter allow the user to take advantage of the gathered information as actionable knowledge: to understand the dynamics of the public opinion for a given entity over time and across real-world events. We describe the platform and the analysis functionality and evaluate the performance of the system, by allowing human users to score how the system fares in its intended purpose of summarizing entity-centered information from different sources in the Web.

pdf bib
Ellogon Casual Annotation Infrastructure
Georgios Petasis | Leonidas Tsekouras
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper presents a new annotation paradigm, casual annotation, along with a proposed architecture and a reference implementation, the Ellogon Casual Annotation Tool, which implements this paradigm and architecture. The novel aspects of the proposed paradigm originate from the vision to tightly integrate annotation with the casual, everyday activities of users. Annotating in a less “controlled” environment, and removing the bottleneck of selecting content and importing it to annotation infrastructures, casual annotation provides the ability to vastly increase the content that can be annotated and ease the annotation process through automatic pre-training. The proposed paradigm, architecture and reference implementation has been evaluated for more than two years on an annotation task related to sentiment analysis. Evaluation results suggest that, at least for this annotation task, there is a huge improvement in productivity after casual annotation adoption, in comparison to the more traditional annotation paradigms followed in the early stages of the annotation task.

2019

pdf bib
Social Web Observatory: An entity-driven, holistic information summarization platform across sources
Leonidas Tsekouras | Georgios Petasis | Aris Kosmopoulos
Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources

The Social Web Observatory is an entity-driven, sentiment-aware, event summarization web platform, combining various methods and tools to overview trends across social media and news sources in Greek. SWO crawls, clusters and summarizes information following an entity-centric view of text streams, allowing to monitor the public sentiment towards a specific person, organization or other entity. In this paper, we overview the platform, outline the analysis pipeline and describe a user study aimed to quantify the usefulness of the system and especially the meaningfulness and coherence of discovered events.

2017

pdf bib
A Graph-based Text Similarity Measure That Employs Named Entity Information
Leonidas Tsekouras | Iraklis Varlamis | George Giannakopoulos
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Text comparison is an interesting though hard task, with many applications in Natural Language Processing. This work introduces a new text-similarity measure, which employs named-entities’ information extracted from the texts and the n-gram graphs’ model for representing documents. Using OpenCalais as a named-entity recognition service and the JINSECT toolkit for constructing and managing n-gram graphs, the text similarity measure is embedded in a text clustering algorithm (k-Means). The evaluation of the produced clusters with various clustering validity metrics shows that the extraction of named entities at a first step can be profitable for the time-performance of similarity measures that are based on the n-gram graph representation without affecting the overall performance of the NLP task.