GATE Teamware 2: An open-source tool for collaborative document classification annotation
David Wilby
Twin Karmakharm
Ian Roberts
Xingyi Song
Kalina Bontcheva
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
We present GATE Teamware 2: an open-source web-based platform for managing teams of annotators working on document classification tasks. GATE Teamware 2 is an entirely re-engineered successor to GATE Teamware, using contemporary web frameworks. The software allows the management of teams of multiple annotators, project managers and administrators - including the management of annotators - across multiple projects. Projects can be configured to control and monitor the annotation statistics and have a highly flexible JSON-configurable annotation display which can include arbitrary HTML. Optionally, documents can be uploaded with pre-existing annotations and documents are served to annotators in a random order by default to reduce bias. Crucially, annotators can be trained on applying the annotation guidelines correctly and then screened for quality assurance purposes, prior to being cleared for independent annotation. GATE Teamware 2 can be self-deployed, including in container orchestration environments, or provided as private, hosted cloud instances. GATE Teamware 2 is open-source software and can be downloaded from https://github.com/GATENLP/gate-teamware. A demonstration video of the system has also been made available at
Towards Practical Semantic Interoperability in NLP Platforms
Julian Moreno-Schneider
Rémi Calizzano
Florian Kintzel
Georg Rehm
Dimitris Galanis
Ian Roberts
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022
Interoperability is a necessity for the resolution of complex tasks that require the interconnection of several NLP services. This article presents the approaches that were adopted in three scenarios to address the respective interoperability issues. The first scenario describes the creation of a common REST API for a specific platform, the second scenario presents the interconnection of several platforms via mapping of different representation formats and the third scenario shows the complexities of interoperability through semantic schema mapping or automatic translation.
European Language Grid: A Joint Platform for the European Language Technology Community
Georg Rehm
Stelios Piperidis
Kalina Bontcheva
Jan Hajic
Victoria Arranz
Andrejs Vasiļjevs
Gerhard Backfried
Jose Manuel Gomez-Perez
Ulrich Germann
Rémi Calizzano
Nils Feldhus
Stefanie Hegele
Florian Kintzel
Katrin Marheinecke
Julian Moreno-Schneider
Dimitris Galanis
Penny Labropoulou
Miltos Deligiannis
Katerina Gkirtzou
Athanasia Kolovou
Dimitris Gkoumas
Leon Voukoutis
Ian Roberts
Jana Hamrlova
Dusan Varis
Lukas Kacena
Khalid Choukri
Valérie Mapelli
Mickaël Rigault
Julija Melnika
Miro Janosik
Katja Prinz
Andres Garcia-Silva
Cristian Berrio
Ondrej Klejch
Steve Renals
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
Europe is a multilingual society, in which dozens of languages are spoken. The only option to enable and to benefit from multilingualism is through Language Technologies (LT), i.e., Natural Language Processing and Speech Technologies. We describe the European Language Grid (ELG), which is targeted to evolve into the primary platform and marketplace for LT in Europe by providing one umbrella platform for the European LT landscape, including research and industry, enabling all stakeholders to upload, share and distribute their services, products and resources. At the end of our EU project, which will establish a legal entity in 2022, the ELG will provide access to approx. 1300 services for all European languages as well as thousands of data sets.
Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability
Georg Rehm
Dimitris Galanis
Penny Labropoulou
Stelios Piperidis
Martin Welß
Ricardo Usbeck
Joachim Köhler
Miltos Deligiannis
Katerina Gkirtzou
Johannes Fischer
Christian Chiarcos
Nils Feldhus
Julian Moreno-Schneider
Florian Kintzel
Elena Montiel
Víctor Rodríguez Doncel
John Philip McCrae
David Laqua
Irina Patricia Theile
Christian Dittmar
Kalina Bontcheva
Ian Roberts
Andrejs Vasiļjevs
Andis Lagzdiņš
Proceedings of the 1st International Workshop on Language Technology Platforms
With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five levels of platform interoperability, of increasing complexity, that we suggest implementing in a wider federation of AI/LT platforms. We illustrate the approach using the five emerging AI/LT platforms AI4EU, ELG, Lynx, QURATOR and SPEAKER.
European Language Grid: An Overview
Georg Rehm
Maria Berger
Ela Elsholz
Stefanie Hegele
Florian Kintzel
Katrin Marheinecke
Stelios Piperidis
Miltos Deligiannis
Dimitris Galanis
Katerina Gkirtzou
Penny Labropoulou
Kalina Bontcheva
David Jones
Ian Roberts
Jan Hajič
Jana Hamrlová
Lukáš Kačena
Khalid Choukri
Victoria Arranz
Andrejs Vasiļjevs
Orians Anvari
Andis Lagzdiņš
Jūlija Meļņika
Gerhard Backfried
Erinç Dikici
Miroslav Janosik
Katja Prinz
Christoph Prinz
Severin Stampler
Dorothea Thomas-Aniola
José Manuel Gómez-Pérez
Andres Garcia Silva
Christian Berrío
Ulrich Germann
Steve Renals
Ondrej Klejch
Proceedings of the Twelfth Language Resources and Evaluation Conference
With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented – by nation states, languages, verticals and sectors, significantly holding back its impact. The European Language Grid (ELG) project addresses this fragmentation by establishing the ELG as the primary platform for LT in Europe. The ELG is a scalable cloud platform, providing, in an easy-to-integrate way, access to hundreds of commercial and non-commercial LTs for all European languages, including running tools and services as well as data sets and resources. Once fully operational, it will enable the commercial and non-commercial European LT community to deposit and upload their technologies and data sets into the ELG, to deploy them through the grid, and to connect with other resources. The ELG will boost the Multilingual Digital Single Market towards a thriving European LT community, creating new jobs and opportunities. Furthermore, the ELG project organises two open calls for up to 20 pilot projects. It also sets up 32 national competence centres and the European LT Council for outreach and coordination purposes.
Reasoning Over Paths via Knowledge Base Completion
Saatviga Sudhahar
Andrea Pierleoni
Ian Roberts
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)
Reasoning over paths in large-scale knowledge graphs is an important problem for many applications. In this paper we discuss a simple approach to automatically build and rank paths between a source and target entity pair with learned embeddings using a knowledge base completion (KBC) model. We assembled a knowledge graph by mining the available biomedical scientific literature and extracted a set of high-frequency paths to use for validation. We demonstrate that our method is able to effectively rank a list of known paths between a pair of entities and also come up with plausible paths that are not present in the knowledge graph. For a given entity pair we are able to reconstruct the highest ranking path 60% of the time within the top 10 ranked paths and achieve 49% mean average precision. Our approach is compositional since any KBC model that can produce vector representations of entities can be used.
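The two figures quoted in the abstract above (a hit within the top 10 ranked paths, and mean average precision) can be illustrated with a minimal sketch. It uses the standard textbook definitions of these metrics; the abstract does not spell out the paper's exact evaluation protocol, so the function names, the string path identifiers, and the toy data are all assumptions made for illustration only.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def hits_at_k(ranked: Sequence[Hashable], target: Hashable, k: int = 10) -> bool:
    """Return True if the target path appears among the first k ranked paths."""
    return target in ranked[:k]


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average of precision@i, taken at each rank i that holds a relevant path."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for i, path in enumerate(ranked, start=1):
        if path in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / len(relevant)


def mean_average_precision(
    queries: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of average precision over (ranked, relevant) pairs, one per entity pair."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in queries]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: each query is one entity pair with a ranked list of
# candidate paths and the set of paths known to be valid for that pair.
toy_queries = [
    (["p1", "p2", "p3", "p4"], {"p1", "p3"}),  # AP = (1/1 + 2/3) / 2
    (["a", "b"], {"b"}),                       # AP = (1/2) / 1
]
toy_map = mean_average_precision(toy_queries)
```

On the toy data, the first query has average precision (1 + 2/3) / 2 and the second 1/2, so `toy_map` is their mean, 2/3.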
Deep Bidirectional Transformers for Relation Extraction without Supervision
Yannis Papanikolaou
Ian Roberts
Andrea Pierleoni
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
We present a novel framework to deal with relation extraction tasks in cases where there is a complete lack of supervision, either in the form of gold annotations or relations from a knowledge base. Our approach leverages syntactic parsing and pre-trained word embeddings to extract few but precise relations, which are then used to annotate a larger corpus, in a manner identical to distant supervision. The resulting data set is employed to fine-tune a pre-trained BERT model in order to perform relation extraction. Empirical evaluation on four data sets from the biomedical domain shows that our method significantly outperforms two simple baselines for unsupervised relation extraction and, even without using any supervision at all, achieves slightly worse results than the state of the art in three out of four data sets. Importantly, we show that it is possible to successfully fine-tune a large pre-trained language model with noisy data, as opposed to previous works that rely on gold data for fine-tuning.
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
Leon Derczynski
Kalina Bontcheva
Ian Roberts
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
One of the main obstacles hampering method development and comparative evaluation of named entity recognition in social media is the lack of a sizeable, diverse, high-quality annotated corpus, analogous to the CoNLL’2003 news dataset. For instance, the biggest Ritter tweet corpus is only 45,000 tokens – a mere 15% of the size of CoNLL’2003. Another major shortcoming is the lack of temporal, geographic, and author diversity. This paper introduces the Broad Twitter Corpus (BTC), which is not only significantly bigger, but sampled across different regions, temporal periods, and types of Twitter users. The gold-standard named entity annotations are made by a combination of NLP experts and crowd workers, which enables us to harness crowd recall while maintaining high quality. We also measure the entity drift observed in our dataset (i.e. how entity representation varies over time), and compare to newswire. The corpus is released openly, including source text and intermediate annotations.
The GATE Crowdsourcing Plugin: Crowdsourcing Annotated Corpora Made Easy
Kalina Bontcheva
Ian Roberts
Leon Derczynski
Samantha Alexander-Eames
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics
AnnoMarket: An Open Cloud Platform for NLP
Valentin Tablan
Kalina Bontcheva
Ian Roberts
Hamish Cunningham
Marin Dimitrov
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations
A Large-Scale Resource for Storing and Recognizing Technical Terminology
Henk Harkema
Robert Gaizauskas
Mark Hepple
Neil Davis
Yikun Guo
Angus Roberts
Ian Roberts
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
A Large Scale Terminology Resource for Biomedical Text Processing
Henk Harkema
Robert Gaizauskas
Mark Hepple
Angus Roberts
Ian Roberts
Neil Davis
Yikun Guo
HLT-NAACL 2004 Workshop: Linking Biological Literature, Ontologies and Databases