Georgios Petasis


2020

Social Web Observatory: A Platform and Method for Gathering Knowledge on Entities from Different Textual Sources
Leonidas Tsekouras | Georgios Petasis | George Giannakopoulos | Aris Kosmopoulos
Proceedings of the 12th Language Resources and Evaluation Conference

In this work we describe a framework for the collection and summarization of information from the Web in an entity-driven manner. The framework consists of a set of appropriate workflows and the Social Web Observatory platform, which implements those workflows and supports them through a language analysis pipeline. The pipeline includes text collection/crawling, identification of different entities, clustering of texts into events related to entities, and entity-centric sentiment analysis, as well as text analytics and visualization functionalities. The latter allow the user to take advantage of the gathered information as actionable knowledge: to understand the dynamics of public opinion for a given entity over time and across real-world events. We describe the platform and the analysis functionality, and evaluate the performance of the system by having human users score how the system fares in its intended purpose of summarizing entity-centered information from different sources on the Web.

Ellogon Casual Annotation Infrastructure
Georgios Petasis | Leonidas Tsekouras
Proceedings of the 12th Language Resources and Evaluation Conference

This paper presents a new annotation paradigm, casual annotation, along with a proposed architecture and a reference implementation, the Ellogon Casual Annotation Tool, which implements this paradigm and architecture. The novel aspects of the proposed paradigm originate from the vision of tightly integrating annotation with the casual, everyday activities of users. By annotating in a less “controlled” environment and removing the bottleneck of selecting content and importing it into annotation infrastructures, casual annotation can vastly increase the content that can be annotated and ease the annotation process through automatic pre-training. The proposed paradigm, architecture and reference implementation have been evaluated for more than two years on an annotation task related to sentiment analysis. Evaluation results suggest that, at least for this annotation task, there is a huge improvement in productivity after the adoption of casual annotation, in comparison to the more traditional annotation paradigms followed in the early stages of the annotation task.

2019

Segmentation of Argumentative Texts with Contextualised Word Representations
Georgios Petasis
Proceedings of the 6th Workshop on Argument Mining

The segmentation of argumentative units is an important subtask of argument mining, which is frequently addressed at a coarse granularity, usually assuming argumentative units to be no smaller than sentences. Approaches focusing on clause-level granularity typically address the task as sequence labeling at the token level, aiming to classify whether a token begins, is inside, or is outside of an argumentative unit. Most approaches exploit highly engineered, manually constructed features and algorithms typically used in sequential tagging, such as Conditional Random Fields, while more recent approaches try to exploit manually constructed features in the context of deep neural networks. In this context, we examined to what extent recent advances in sequential labelling reduce the need for highly sophisticated, manually constructed features, and whether limiting features to embeddings pre-trained on large corpora is a promising approach. Evaluation results suggest the examined models and approaches can exhibit comparable performance, minimising the need for feature engineering.
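
The token-level labeling scheme mentioned in this abstract (a token begins, is inside, or is outside of an argumentative unit) can be illustrated with a minimal sketch. This is not the paper's code: the function names, the half-open span convention and the example sentence are assumptions made here for illustration only.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open token span [start, end); an assumed convention


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode argumentative-unit spans as one B/I/O label per token."""
    labels = ["O"] * num_tokens
    for start, end in spans:
        labels[start] = "B"              # first token of the unit
        for i in range(start + 1, end):
            labels[i] = "I"              # remaining tokens of the unit
    return labels


def bio_to_spans(labels: List[str]) -> List[Span]:
    """Decode B/I/O labels back into half-open token spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label == "B":
            if start is not None:        # a new unit closes the previous one
                spans.append((start, i))
            start = i
        elif label == "O":
            if start is not None:        # an outside token closes the open unit
                spans.append((start, i))
                start = None
        # "I" extends the open unit; a stray "I" with no open unit is ignored
    if start is not None:
        spans.append((start, len(labels)))
    return spans


# Hypothetical example sentence with two argumentative units.
tokens = "We should ban cars because they pollute the air .".split()
units = [(0, 4), (5, 9)]
labels = spans_to_bio(len(tokens), units)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

A sequence labeler in this setting predicts one such label per token, and the decoding step turns the predicted labels back into unit boundaries.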

Social Web Observatory: An entity-driven, holistic information summarization platform across sources
Leonidas Tsekouras | Georgios Petasis | Aris Kosmopoulos
Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources

The Social Web Observatory (SWO) is an entity-driven, sentiment-aware, event summarization web platform, combining various methods and tools to provide an overview of trends across social media and news sources in Greek. SWO crawls, clusters and summarizes information following an entity-centric view of text streams, allowing users to monitor the public sentiment towards a specific person, organization or other entity. In this paper, we give an overview of the platform, outline the analysis pipeline and describe a user study aimed at quantifying the usefulness of the system and especially the meaningfulness and coherence of discovered events.

2017

Proceedings of the 4th Workshop on Argument Mining
Ivan Habernal | Iryna Gurevych | Kevin Ashley | Claire Cardie | Nancy Green | Diane Litman | Georgios Petasis | Chris Reed | Noam Slonim | Vern Walker
Proceedings of the 4th Workshop on Argument Mining

Unsupervised Detection of Argumentative Units though Topic Modeling Techniques
Alfio Ferrara | Stefano Montanelli | Georgios Petasis
Proceedings of the 4th Workshop on Argument Mining

In this paper we present a new unsupervised approach, “Attraction to Topics” (A2T), for the detection of argumentative units, a sub-task of argument mining. Motivated by the importance of topic identification in manual annotation, we examine whether topic modeling can be used for performing unsupervised detection of argumentative sentences, and to what extent topic modeling can be used to classify sentences as claims and premises. Preliminary evaluation results suggest that topic information can be successfully used for the detection of argumentative sentences, at least for the corpora used for evaluation. Our approach has been evaluated on two English corpora, the first of which contains 90 persuasive essays, while the second is a collection of 340 documents from user-generated content.

2016

CLARIN-EL Web-based Annotation Tool
Ioannis Manousos Katakis | Georgios Petasis | Vangelis Karkaletsis
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents a new Web-based annotation tool, the “CLARIN-EL Web-based Annotation Tool”. Based on an existing annotation infrastructure offered by the “Ellogon” language engineering platform, this new tool transfers a large part of Ellogon’s features and functionalities to a Web environment, by exploiting the capabilities of cloud computing. This new annotation tool is able to support a wide range of annotation tasks, through user-provided annotation schemas in XML. The new annotation tool has already been employed in several annotation tasks, including the annotation of arguments, which is presented as a use case. The CLARIN-EL annotation tool is compared to existing solutions along several dimensions and features. Finally, future work includes the improvement of integration with the CLARIN-EL infrastructure, and the inclusion of features not currently supported, such as the annotation of aligned documents.

Identifying Argument Components through TextRank
Georgios Petasis | Vangelis Karkaletsis
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

2015

Argument Extraction from News
Christos Sardianos | Ioannis Manousos Katakis | Georgios Petasis | Vangelis Karkaletsis
Proceedings of the 2nd Workshop on Argumentation Mining

2014

The Ellogon Pattern Engine: Context-free Grammars over Annotations
Georgios Petasis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents the pattern engine that is offered by the Ellogon language engineering platform. This pattern engine allows the application of context-free grammars over annotations, which are metadata generated during the processing of documents by natural language tools. In addition, grammar development is aided by a graphical grammar editor, giving grammar authors the capability to test and debug grammars.

Annotating Arguments: The NOMAD Collaborative Annotation Tool
Georgios Petasis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The huge amount of information available on the Web creates the need for effective information extraction systems that are able to produce metadata that satisfy users’ information needs. The development of such systems, in the majority of cases, depends on the availability of an appropriately annotated corpus in order to learn or evaluate extraction models. The production of such corpora can be significantly facilitated by annotation tools, which provide user-friendly facilities and enable annotators to annotate documents according to a predefined annotation schema. However, the construction of annotation tools that operate in a distributed environment is a challenging task: the majority of these tools are implemented as Web applications, having to cope with the capabilities offered by browsers. This paper describes the NOMAD collaborative annotation tool, which implements an alternative architecture: it remains a desktop application, fully exploiting the advantages of desktop applications, but provides collaborative annotation through the use of a centralised server for storing both the documents and their metadata, and instant messaging protocols for communicating events among all annotators. The annotation tool is implemented as a component of the Ellogon language engineering platform, exploiting its extensive annotation engine, its cross-platform abilities and its linguistic processing components, if such a need arises. Finally, the NOMAD annotation tool is distributed with an open source license, as part of the Ellogon platform.

NOMAD: Linguistic Resources and Tools Aimed at Policy Formulation and Validation
George Kiomourtzis | George Giannakopoulos | Georgios Petasis | Pythagoras Karampiperis | Vangelis Karkaletsis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The NOMAD project (Policy Formulation and Validation through non Moderated Crowd-sourcing) supports policy making by providing rich, actionable information related to how citizens perceive different policies. NOMAD automatically analyzes citizen contributions to the informal web (e.g. forums, social networks, blogs, newsgroups and wikis) using a variety of tools. These tools comprise text retrieval, topic classification, argument detection and sentiment analysis, as well as argument summarization. NOMAD provides decision-makers with a full arsenal of solutions, starting from describing a domain and a policy to applying content search and acquisition, categorization and visualization. These solutions work in a collaborative manner in the policy-making arena. NOMAD thus embeds editing, analysis and visualization technologies into a concrete framework, applicable in a variety of policy-making and decision support settings. In this paper we provide an overview of the linguistic tools and resources of NOMAD.

2012

The SYNC3 Collaborative Annotation Tool
Georgios Petasis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The huge amount of information available on the Web creates the need for effective information extraction systems that are able to produce metadata that satisfy users' information needs. The development of such systems, in the majority of cases, depends on the availability of an appropriately annotated corpus in order to learn or evaluate extraction models. The production of such corpora can be significantly facilitated by annotation tools, which provide user-friendly facilities and enable annotators to annotate documents according to a predefined annotation schema. However, the construction of annotation tools that operate in a distributed environment is a challenging task: the majority of these tools are implemented as Web applications, having to cope with the capabilities offered by browsers. This paper describes the SYNC3 collaborative annotation tool, which implements an alternative architecture: it remains a desktop application, fully exploiting the advantages of desktop applications, but provides collaborative annotation through the use of a centralised server for storing both the documents and their metadata, and instant messaging protocols for communicating events among all annotators. The annotation tool is implemented as a component of the Ellogon language engineering platform, exploiting its extensive annotation engine, its cross-platform abilities and its linguistic processing components, if such a need arises. Finally, the SYNC3 annotation tool is distributed with an open source license, as part of the Ellogon platform.

2011

Coreference Annotator - A new annotation tool for aligned bilingual corpora
Mara Tsoumari | Georgios Petasis
Proceedings of The Second Workshop on Annotation and Exploitation of Parallel Corpora

Unsupervised Domain Adaptation based on Text Relatedness
Georgios Petasis
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

BlogBuster: A Tool for Extracting Corpora from the Blogosphere
Georgios Petasis | Dimitrios Petasis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, suitable for linguistic and language technology research and development, has attracted significant research interest recently. Several general purpose approaches for removing boilerplate have been presented in the literature; however, the blogosphere poses additional requirements, such as a finer control over the extracted textual segments in order to accurately identify important elements, i.e. individual blog posts, titles, posting dates or comments. BlogBuster tries to provide such additional details along with boilerplate removal, following a rule-based approach. A small set of rules was manually constructed by observing a limited set of blogs from the Blogger and WordPress hosting platforms. These rules operate on the DOM tree of an HTML page, as constructed by a popular browser, Mozilla Firefox. Evaluation results suggest that BlogBuster is very accurate when extracting corpora from blogs hosted on Blogger and WordPress, while exhibiting a reasonable precision when applied to blogs not hosted on these two popular blogging platforms.

2008

BOEMIE Ontology-Based Text Annotation Tool
Pavlina Fragkou | Georgios Petasis | Aris Theodorakos | Vangelis Karkaletsis | Constantine Spyropoulos
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The huge amount of information available on the Web creates the need for effective information extraction systems that are able to produce metadata that satisfy users’ information needs. The development of such systems, in the majority of cases, depends on the availability of an appropriately annotated corpus in order to learn extraction models. The production of such corpora can be significantly facilitated by annotation tools that are able to annotate, according to a defined ontology, not only named entities but, most importantly, relations between them. This paper describes the BOEMIE ontology-based annotation tool, which is able to locate blocks of text that correspond to specific types of named entities, fill tables corresponding to ontology concepts with those named entities, and link the filled tables based on relations defined in the domain ontology. Additionally, it can perform annotation of blocks of text that refer to the same topic. The tool has a user-friendly interface and supports automatic pre-annotation, annotation comparison, and customization to other annotation schemata. The annotation tool has been used in a large-scale annotation task involving 3,000 web pages regarding athletics. It has also been used in another annotation task involving 503 web pages with medical information, in different languages.

2002

Ellogon: A New Text Engineering Platform
Georgios Petasis | Vangelis Karkaletsis | Georgios Paliouras | Ion Androutsopoulos | Constantine D. Spyropoulos
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Multilingual XML-Based Named Entity Recognition for E-Retail Domains
Claire Grover | Scott McDonald | Donnla Nic Gearailt | Vangelis Karkaletsis | Dimitra Farmakiotou | Georgios Samaritakis | Georgios Petasis | Maria Teresa Pazienza | Michele Vindigni | Frantz Vichot | Francis Wolinski
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

Using Machine Learning to Maintain Rule-based Named-Entity Recognition and Classification Systems
Georgios Petasis | Frantz Vichot | Francis Wolinski | Georgios Paliouras | Vangelis Karkaletsis | Constantine D. Spyropoulos
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics