Felix Sasaki


2018

2016

In recent years, Linked Data and Language Technology solutions have gained popularity. Nevertheless, their coupling in real-world business is limited due to several issues. Existing products and services are developed for a particular domain, can be used only in combination with already integrated datasets, or have limited language coverage. In this paper, we present FREME, an innovative open framework of e-Services for multilingual and semantic enrichment of digital content. The framework integrates six interoperable e-Services. We describe the core features of each e-Service and illustrate their usage in the context of four business cases: i) authoring and publishing; ii) translation and localisation; iii) cross-lingual access to data; and iv) personalised Web content recommendations. These business cases drive the design and development of the framework.

2014

As language resources become available in linked data formats, it becomes relevant to consider how linked data interoperability can play a role in active language processing workflows, as well as in more static language resource publishing. This paper proposes that linked data may have a valuable role to play in tracking the use and generation of language resources in such workflows. The goal is to assess and improve the performance of the language technologies that use these resources, based on feedback from the human involvement typically required within such processes. We refer to this as Active Curation of the language resources, since it is performed systematically over language processing workflows to continuously improve the quality of the resource in specific applications, rather than via dedicated curation steps. We use modern localisation workflows, i.e., those assisted by machine translation and text analytics services, to explain how linked data can support such active curation. By showing how a suitable linked data vocabulary can be assembled by combining existing linked data vocabularies with metadata from other multilingual content processing annotations and tool exchange standards, we aim to demonstrate the relative ease with which active curation can be deployed more broadly.

2013

2012

We have developed DBpedia Spotlight, a flexible concept tagging system that is able to annotate entities, topics and other terms in natural language text. The system first recognizes phrases to annotate in the input text and subsequently disambiguates them with respect to a reference knowledge base extracted from Wikipedia. In this paper, we evaluate the impact of the phrase recognition step on the system's ability to correctly reproduce the annotations of a gold standard in an unsupervised setting. We argue that a combination of techniques is needed, and we evaluate a number of alternatives on an existing evaluation set.

2006

This paper introduces ongoing work within the Internationalization (i18n) Activity of the World Wide Web Consortium (W3C). The focus is on aspects of the W3C i18n Activity that benefit the creation and manipulation of multilingual language resources. In particular, the paper deals with ongoing work on the encoding, visualization and processing of characters; current work on language and locale identification; and current work on the internationalization of markup. The main usage scenario is the design of multilingual corpora, including issues of corpus creation and manipulation.

2004

2002