Kazushi Ohya


2022

An Architecture of resolving a multiple link path in a standoff-style data format to enhance the mobility of language resources
Kazushi Ohya
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The data formats currently proposed by authoritative organizations are based on a so-called standoff-style data format in XML, which represents a semantic data model through an instance structure and a link structure. However, this type of data format, intended to enhance the representational power of XML, impairs the mobility of data, because an abstract data structure denoted by multiple link paths is hard to convert into other data structures. This difficulty hinders the reuse of data through conversion into other data formats, especially in a personal data management environment. In this paper, to compensate for this drawback, we propose a new concept of transforming a link structure into an instance structure on a new markup scheme. This approach to language data brings a new architecture of language data management that realizes a personal data management environment for daily and long-term use.

2016

Data Formats and Management Strategies from the Perspective of Language Resource Producers ― Personal Diachronic and Social Synchronic Data Sharing ―
Kazushi Ohya
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This is a report of findings from ongoing language documentation research based on three consecutive projects from 2008 to 2016. In light of this research, we propose the following. (1) We should stand on the side of language resource producers in order to advance research on language processing, supporting personal data management in addition to social data sharing. (2) This support leads to adopting simple data formats instead of the multi-link-path data models proposed as international standards up to the present. (3) We should set up a framework for the study of language resources as a whole that includes not only pivotal data formats such as standard formats, but also the surroundings of data formation, to capture a wider range of language activities, e.g. annotation, hesitant language formation, and reference-referent relations. A study of this framework is expected to be a foundation for rebuilding man-machine interface studies, in which we seek to observe the generative processes of informational symbols in order to establish a high-affinity interface for documentation.