Ji Young Lee

Also published as: Ji-Young Lee


2018

pdf bib
Transfer Learning for Named-Entity Recognition with Neural Networks
Ji Young Lee | Franck Dernoncourt | Peter Szolovits
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib
PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts
Franck Dernoncourt | Ji Young Lee
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

We present PubMed 200k RCT, a new dataset based on PubMed for sequential sentence classification. The dataset consists of approximately 200,000 abstracts of randomized controlled trials, totaling 2.3 million sentences. Each sentence of each abstract is labeled with their role in the abstract using one of the following classes: background, objective, method, result, or conclusion. The purpose of releasing this dataset is twofold. First, the majority of datasets for sequential short-text classification (i.e., classification of short texts that appear in sequences) are small: we hope that releasing a new large dataset will help develop more accurate algorithms for this task. Second, from an application perspective, researchers need better tools to efficiently skim through the literature. Automatically classifying each sentence in an abstract would help researchers read abstracts more efficiently, especially in fields where abstracts may be long, such as the medical field.

pdf bib
NeuroNER: an easy-to-use program for named-entity recognition based on neural networks
Franck Dernoncourt | Ji Young Lee | Peter Szolovits
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Named-entity recognition (NER) aims at identifying entities of interest in a text. Artificial neural networks (ANNs) have recently been shown to outperform existing NER systems. However, ANNs remain challenging to use for non-expert users. In this paper, we present NeuroNER, an easy-to-use named-entity recognition tool based on ANNs. Users can annotate entities using a graphical web-based user interface (BRAT): the annotations are then used to train an ANN, which in turn predict entities’ locations and categories in new texts. NeuroNER makes this annotation-training-prediction flow smooth and accessible to anyone.

pdf bib
Neural Networks for Joint Sentence Classification in Medical Paper Abstracts
Franck Dernoncourt | Ji Young Lee | Peter Szolovits
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Existing models based on artificial neural networks (ANNs) for sentence classification often do not incorporate the context in which sentences appear, and classify sentences individually. However, traditional sentence classification approaches have been shown to greatly benefit from jointly classifying subsequent sentences, such as with conditional random fields. In this work, we present an ANN architecture that combines the effectiveness of typical ANN models to classify sentences in isolation, with the strength of structured prediction. Our model outperforms the state-of-the-art results on two different datasets for sequential sentence classification in medical abstracts.

pdf bib
MIT at SemEval-2017 Task 10: Relation Extraction with Convolutional Neural Networks
Ji Young Lee | Franck Dernoncourt | Peter Szolovits
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

Over 50 million scholarly articles have been published: they constitute a unique repository of knowledge. In particular, one may infer from them relations between scientific concepts. Artificial neural networks have recently been explored for relation extraction. In this work, we continue this line of work and present a system based on a convolutional neural network to extract relations. Our model ranked first in the SemEval-2017 task 10 (ScienceIE) for relation extraction in scientific articles (subtask C).

2016

pdf bib
Feature-Augmented Neural Networks for Patient Note De-identification
Ji Young Lee | Franck Dernoncourt | Özlem Uzuner | Peter Szolovits
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

Patient notes contain a wealth of information of potentially great interest to medical investigators. However, to protect patients’ privacy, Protected Health Information (PHI) must be removed from the patient notes before they can be legally released, a process known as patient note de-identification. The main objective for a de-identification system is to have the highest possible recall. Recently, the first neural-network-based de-identification system has been proposed, yielding state-of-the-art results. Unlike other systems, it does not rely on human-engineered features, which allows it to be quickly deployed, but does not leverage knowledge from human experts or from electronic health records (EHRs). In this work, we explore a method to incorporate human-engineered features as well as features derived from EHRs to a neural-network-based de-identification system. Our results show that the addition of features, especially the EHR-derived features, further improves the state-of-the-art in patient note de-identification, including for some of the most sensitive PHI types such as patient names. Since in a real-life setting patient notes typically come with EHRs, we recommend developers of de-identification systems to leverage the information EHRs contain.

pdf bib
Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks
Ji Young Lee | Franck Dernoncourt
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2001

pdf bib
A test suite for evaluation of English-to-Korean machine translation systems
Sungryong Koh | Jinee Maeng | Ji-Young Lee | Young-Sook Chae | Key-Sun Choi
Proceedings of Machine Translation Summit VIII

This paper describes KORTERM’s test suite and their practicability. The test-sets have been being constructed on the basis of fine-grained classification of linguistic phenomena to evaluate the technical status of English-to-Korean MT systems systematically. They consist of about 5000 test-sets and are growing. Each test-set contains an English sentence, a model Korean translation, a linguistic phenomenon category, and a yes/no question about the linguistic phenomenon. Two commercial systems were evaluated with a yes/no test of prepared questions. Total accuracy rates of the two systems were different (50% vs. 66%). In addition, a comprehension test was carried out. We found that one system was more comprehensible than the other system. These results seem to show that our test suite is practicable.