Jin-Dong Kim


2024

Pre-Gamus: Reducing Complexity of Scientific Literature as a Support against Misinformation
Nico Colic | Jin-Dong Kim | Fabio Rinaldi
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024

Scientific literature encodes a wealth of knowledge relevant to various users. However, the complexity of scientific jargon makes it inaccessible to all but domain specialists. It would be helpful for different types of people to be able to get at least a gist of a paper. Biomedical practitioners often find it difficult to keep up with the information load; but even lay people would benefit from scientific information, for example to dispel medical misconceptions. Besides, in many countries, familiarity with English is limited, let alone scientific English, even among professionals. All this points to the need for simplified access to the scientific literature. We thus present an application aimed at solving this problem, which is capable of summarising scientific text in a way that is tailored to specific types of users, and in their native language. For this objective, we used an LLM that our system queries using user-selected parameters. We conducted an informal evaluation of this prototype using a questionnaire in 3 different languages.

2022

COVID-19 Mythbusters in World Languages
Mana Ashida | Jin-Dong Kim | Seunghun Lee
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper introduces a multilingual database containing translated texts of COVID-19 mythbusters. The database has translations into 115 languages as well as the original English texts, which are published by the World Health Organization (WHO). This paper then presents preliminary analyses of Latin-alphabet-based texts to see the potential of the database as a resource for multilingual linguistic analyses. The analyses gave interesting insights into the resource. While the amount of translated text in each language was small, character bigrams with normalization (lowercasing and removal of diacritics) turned out to be an effective proxy for measuring the similarity of the languages, and the affinity ranking of language pairs could be obtained. Additionally, a hierarchical clustering analysis was performed using the character bigram overlap ratio of every possible pair of languages. The result shows clusters of Germanic languages, Romance languages, and Southern Bantu languages. In sum, the multilingual database not only offers a fixed set of materials in numerous languages, but also serves as a preliminary tool to identify the language family using a text-based similarity measure, the bigram overlap ratio.

2020

Enhancing Quality of Corpus Annotation: Construction of the Multi-Layer Corpus Annotation and Simplified Validation of the Corpus Annotation
Youngbin Noh | Kuntae Kim | Minho Lee | Cheolhun Heo | Yongbin Jeong | Yoosung Jeong | Younggyun Hahm | Taehwan Oh | Hyonsu Choe | Seokwon Park | Jin-Dong Kim | Key-Sun Choi
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

Towards Standardization of Web Service Protocols for NLPaaS
Jin-Dong Kim | Nancy Ide | Keith Suderman
Proceedings of the 1st International Workshop on Language Technology Platforms

Several web services for various natural language processing (NLP) tasks (“NLP-as-a-service” or NLPaaS) have recently been made publicly available. However, despite their similar functionality these services often differ in the protocols they use, thus complicating the development of clients accessing them. A survey of currently available NLPaaS services suggests that it may be possible to identify a minimal application layer protocol that can be shared by NLPaaS services without sacrificing functionality or convenience, while at the same time simplifying the development of clients for these services. In this paper, we hope to raise awareness of the interoperability problems caused by the variety of existing web service protocols, and describe an effort to identify a set of best practices for NLPaaS protocol design. To that end, we survey and compare protocols used by NLPaaS services and suggest how these protocols may be further aligned to reduce variation.

2019

A Multi-Platform Annotation Ecosystem for Domain Adaptation
Richard Eckart de Castilho | Nancy Ide | Jin-Dong Kim | Jan-Christoph Klie | Keith Suderman
Proceedings of the 13th Linguistic Annotation Workshop

This paper describes an ecosystem consisting of three independent text annotation platforms. To demonstrate their ability to work in concert, we illustrate how to use them to address an interactive domain adaptation task in biomedical entity recognition. The platforms and the approach are in general domain-independent and can be readily applied to other areas of science.

2018

Mining Biomedical Publications With The LAPPS Grid
Nancy Ide | Keith Suderman | Jin-Dong Kim
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Book Review: Biomedical Natural Language Processing by Kevin Bretonnel Cohen and Dina Demner-Fushman
Jin-Dong Kim
Computational Linguistics, Volume 43, Issue 1 - April 2017

2016

Proceedings of the 4th BioNLP Shared Task Workshop
Claire Nédellec | Robert Bossy | Jin-Dong Kim
Proceedings of the 4th BioNLP Shared Task Workshop

Refactoring the Genia Event Extraction Shared Task Toward a General Framework for IE-Driven KB Development
Jin-Dong Kim | Yue Wang | Nicola Colic | Seung Han Beak | Yong Hwan Kim | Min Song
Proceedings of the 4th BioNLP Shared Task Workshop

Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)
Key-Sun Choi | Christina Unger | Piek Vossen | Jin-Dong Kim | Noriko Kando | Axel-Cyrille Ngonga Ngomo
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)

Evaluating a dictionary of human phenotype terms focusing on rare diseases
Simon Kocbek | Toyofumi Fujiwara | Jin-Dong Kim | Toshihisa Takagi | Tudor Groza
Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)

Annotating medical text such as clinical notes with human phenotype descriptors is an important task that can, for example, assist in building patient profiles. To automatically annotate text one usually needs a dictionary of predefined terms. However, due to the variety of human expressiveness, current state-of-the-art phenotype concept recognizers and automatic annotators struggle with specific domain issues and challenges. In this paper we present results of annotating a gold standard corpus with a dictionary containing lexical variants for the Human Phenotype Ontology terms. The main purpose of the dictionary is to improve the recall of phenotype concept recognition systems. We compare the method with four other approaches and present results.

2013

Proceedings of the BioNLP Shared Task 2013 Workshop
Claire Nédellec | Robert Bossy | Jin-Dong Kim | Jung-jae Kim | Tomoko Ohta | Sampo Pyysalo | Pierre Zweigenbaum
Proceedings of the BioNLP Shared Task 2013 Workshop

Overview of BioNLP Shared Task 2013
Claire Nédellec | Robert Bossy | Jin-Dong Kim | Jung-jae Kim | Tomoko Ohta | Sampo Pyysalo | Pierre Zweigenbaum
Proceedings of the BioNLP Shared Task 2013 Workshop

The Genia Event Extraction Shared Task, 2013 Edition - Overview
Jin-Dong Kim | Yue Wang | Yamamoto Yasunori
Proceedings of the BioNLP Shared Task 2013 Workshop

Evaluation of SPARQL query generation from natural language questions
K. Bretonnel Cohen | Jin-Dong Kim
Proceedings of the Joint Workshop on NLP&LOD and SWAIE: Semantic Web, Linked Open Data and Information Extraction

A Generalized LCS Algorithm and Its Application to Corpus Alignment
Jin-Dong Kim
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

New Resources and Perspectives for Biomedical Event Extraction
Sampo Pyysalo | Pontus Stenetorp | Tomoko Ohta | Jin-Dong Kim | Sophia Ananiadou
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

PubAnnotation - a persistent and sharable corpus and annotation repository
Jin-Dong Kim | Yue Wang
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

Boosting the protein name recognition performance by bootstrapping on selected text
Yue Wang | Jin-Dong Kim
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

CSAF - a community-sourcing annotation framework
Jin-Dong Kim | Yue Wang
Proceedings of the Sixth Linguistic Annotation Workshop

2011

Parsing Natural Language Queries for Life Science Knowledge
Tadayoshi Hara | Yuka Tateisi | Jin-Dong Kim | Yusuke Miyao
Proceedings of BioNLP 2011 Workshop

Proceedings of BioNLP Shared Task 2011 Workshop
Jun’ichi Tsujii | Jin-Dong Kim | Sampo Pyysalo
Proceedings of BioNLP Shared Task 2011 Workshop

Overview of BioNLP Shared Task 2011
Jin-Dong Kim | Sampo Pyysalo | Tomoko Ohta | Robert Bossy | Ngan Nguyen | Jun’ichi Tsujii
Proceedings of BioNLP Shared Task 2011 Workshop

Overview of Genia Event Task in BioNLP Shared Task 2011
Jin-Dong Kim | Yue Wang | Toshihisa Takagi | Akinori Yonezawa
Proceedings of BioNLP Shared Task 2011 Workshop

Overview of BioNLP 2011 Protein Coreference Shared Task
Ngan Nguyen | Jin-Dong Kim | Jun’ichi Tsujii
Proceedings of BioNLP Shared Task 2011 Workshop

BioNLP Shared Task 2011: Supporting Resources
Pontus Stenetorp | Goran Topić | Sampo Pyysalo | Tomoko Ohta | Jin-Dong Kim | Jun’ichi Tsujii
Proceedings of BioNLP Shared Task 2011 Workshop

2010

Event Extraction for Post-Translational Modifications
Tomoko Ohta | Sampo Pyysalo | Makoto Miwa | Jin-Dong Kim | Jun’ichi Tsujii
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing

2009

GuideLink: A Corpus Annotation System that Integrates the Management of Annotation Guidelines
Kenta Oouchida | Jin-Dong Kim | Toshihisa Takagi | Jun’ichi Tsujii
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

Static Relations: a Piece in the Biomedical Information Extraction Puzzle
Sampo Pyysalo | Tomoko Ohta | Jin-Dong Kim | Jun’ichi Tsujii
Proceedings of the BioNLP 2009 Workshop

Incorporating GENETAG-style annotation to GENIA corpus
Tomoko Ohta | Jin-Dong Kim | Sampo Pyysalo | Yue Wang | Jun’ichi Tsujii
Proceedings of the BioNLP 2009 Workshop

Bridging the Gap between Domain-Oriented and Linguistically-Oriented Semantics
Sumire Uematsu | Jin-Dong Kim | Jun’ichi Tsujii
Proceedings of the BioNLP 2009 Workshop

Overview of BioNLP’09 Shared Task on Event Extraction
Jin-Dong Kim | Tomoko Ohta | Sampo Pyysalo | Yoshinobu Kano | Jun’ichi Tsujii
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task

2008

Prediction of Protein Sub-cellular Localization using Information from Texts and Sequences.
Hong-Woo Chun | Chisato Yamasaki | Naomi Saichi | Masayuki Tanaka | Teruyoshi Hishiki | Tadashi Imanishi | Takashi Gojobori | Jin-Dong Kim | Jun’ichi Tsujii | Toshihisa Takagi
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

Raising the Compatibility of Heterogeneous Annotations: A Case Study on
Yue Wang | Kazuhiro Yoshida | Jin-Dong Kim | Rune Saetre | Jun’ichi Tsujii
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

Challenges in Pronoun Resolution System for Biomedical Text
Ngan Nguyen | Jin-Dong Kim | Jun’ichi Tsujii
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents our findings on the feasibility of doing pronoun resolution for biomedical texts, in comparison with conducting pronoun resolution for the newswire domain. In our experiments, we built a simple machine learning-based pronoun resolution system, and evaluated the system on three different corpora: MUC, ACE, and GENIA. Comparative statistics not only reveal the noticeable issues in constructing an effective pronoun resolution system for a new domain, but also provide a comprehensive view of those corpora often used for this task.

Exploring Domain Differences for the Design of a Pronoun Resolution System for Biomedical Text
Ngan L.T. Nguyen | Jin-Dong Kim
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2006

Linguistic and Biological Annotations of Biological Interaction Events
Tomoko Ohta | Yuka Tateisi | Jin-Dong Kim | Akane Yakushiji | Jun-ichi Tsujii
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper discusses an augmentation of a corpus of research abstracts in the biomedical domain (the GENIA corpus) with two kinds of annotations: tree annotation and event annotation. The tree annotation identifies the linguistic structure that encodes the relations among entities. The event annotation reveals the semantic structure of the biological interaction events encoded in the text. With these annotations we aim to provide a link between the clue and the target of biological event information extraction.

An Intelligent Search Engine and GUI-based Efficient MEDLINE Search Tool Based on Deep Syntactic Parsing
Tomoko Ohta | Yusuke Miyao | Takashi Ninomiya | Yoshimasa Tsuruoka | Akane Yakushiji | Katsuya Masuda | Jumpei Takeuchi | Kazuhiro Yoshida | Tadayoshi Hara | Jin-Dong Kim | Yuka Tateisi | Jun’ichi Tsujii
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

2004

Introduction to the Bio-entity Recognition Task at JNLPBA
Nigel Collier | Tomoko Ohta | Yoshimasa Tsuruoka | Yuka Tateisi | Jin-Dong Kim
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)

2003

Self-Organizing Markov Models and Their Application to Part-of-Speech Tagging
Jin-Dong Kim | Hae-Chang Rim | Jun’ichi Tsujii
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

Encoding Biomedical Resources in TEI: The Case of the GENIA Corpus
Tomaz Erjavec | Jin-Dong Kim | Tomoko Ohta | Yuka Tateisi | Jun’ichi Tsujii
Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine

Stretching TEI: Converting the Genia Corpus
Tomaz Erjavec | Jin-Dong Kim | Tomoko Ohta | Yuka Tateisi | Jun-ichi Tsujii
Proceedings of 4th International Workshop on Linguistically Interpreted Corpora (LINC-03) at EACL 2003

2000

KCAT: A Korean Corpus Annotating Tool Minimizing Human Intervention
Won-He Ryu | Jin-Dong Kim | Hae-Chang Rim | Heui-Seok Lim
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

1999

HMM Specialization with Selective Lexicalization
Jin-Dong Kim | Sang-Zoo Lee | Hae-Chang Rim
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora