Na-Rae Han


2020

Analysis of the Penn Korean Universal Dependency Treebank (PKT-UD): Manual Revision to Build Robust Parsing Model in Korean
Tae Hwan Oh | Ji Yoon Han | Hyonsu Choe | Seokwon Park | Han He | Jinho D. Choi | Na-Rae Han | Jena D. Hwang | Hansaem Kim
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

In this paper, we first raise important issues regarding the Penn Korean Universal Treebank (PKT-UD) and address these issues by revising the entire corpus manually with the aim of producing cleaner UD annotations that are more faithful to Korean grammar. For compatibility with the rest of the UD corpora, we follow the UDv2 guidelines, and extensively revise the part-of-speech tags and the dependency relations to reflect morphological features and flexible word-order aspects in Korean. We evaluate the original and the revised versions of PKT-UD with transformer-based parsing models using biaffine attention. The parsing model trained on the revised corpus shows a significant improvement of 3.0% in labeled attachment score over the model trained on the previous corpus. Our error analysis demonstrates that this revision allows the parsing model to learn relations more robustly, reducing several critical errors that used to be made by the previous model.

K-SNACS: Annotating Korean Adposition Semantics
Jena D. Hwang | Hanwool Choe | Na-Rae Han | Nathan Schneider
Proceedings of the Second International Workshop on Designing Meaning Representations

While many languages use adpositions to encode semantic relationships between content words in a sentence (e.g., agentivity or temporality), the details of how adpositions work vary widely across languages with respect to both form and meaning. In this paper, we empirically adapt the SNACS framework (Schneider et al., 2018) to Korean, a language that is typologically distant from English, the language SNACS was based on. We apply the SNACS framework to annotate the highly popular novella The Little Prince with semantic supersense labels over all Korean postpositions. Thus, we introduce the first broad-coverage corpus annotated with Korean postposition semantics and provide a detailed analysis of the corpus with an apples-to-apples comparison between Korean and English annotations.

2019

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Sudipta Kar | Farah Nadeem | Laura Burdick | Greg Durrett | Na-Rae Han
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

2018

Building Universal Dependency Treebanks in Korean
Jayeol Chun | Na-Rae Han | Jena D. Hwang | Jinho D. Choi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Parser combinators for Tigrinya and Oromo morphology
Patrick Littell | Tom McCoy | Na-Rae Han | Shruti Rijhwani | Zaid Sheikh | David Mortensen | Teruko Mitamura | Lori Levin
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Coordinate Structures in Universal Dependencies for Head-final Languages
Hiroshi Kanayama | Na-Rae Han | Masayuki Asahara | Jena D. Hwang | Yusuke Miyao | Jinho D. Choi | Yuji Matsumoto
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

This paper discusses the representation of coordinate structures in the Universal Dependencies framework for two head-final languages, Japanese and Korean. UD applies a strict principle that makes the head of coordination the left-most conjunct. However, the guideline may produce syntactic trees which are difficult to accept in head-final languages. This paper describes the status in the current Japanese and Korean corpora and proposes alternative designs suitable for these languages.

2017

Double Trouble: The Problem of Construal in Semantic Annotation of Adpositions
Jena D. Hwang | Archna Bhatia | Na-Rae Han | Tim O’Gorman | Vivek Srikumar | Nathan Schneider
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

We consider the semantics of prepositions, revisiting a broad-coverage annotation scheme used for annotating all 4,250 preposition tokens in a 55,000 word corpus of English. Attempts to apply the scheme to adpositions and case markers in other languages, as well as some problematic cases in English, have led us to reconsider the assumption that an adposition’s lexical contribution is equivalent to the role/relation that it mediates. Our proposal is to embrace the potential for construal in adposition use, expressing such phenomena directly at the token level to manage complexity and avoid sense proliferation. We suggest a framework to represent both the scene role and the adposition’s lexical function so they can be annotated at scale—supporting automatic, statistical processing of domain-general language—and discuss how this representation would allow for a simpler inventory of labels.

2010

Using an Error-Annotated Learner Corpus to Develop an ESL/EFL Error Correction System
Na-Rae Han | Joel Tetreault | Soo-Hwa Lee | Jin-Young Ha
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents research on building a model of grammatical error correction, for preposition errors in particular, in English text produced by language learners. Unlike most previous work which trains a statistical classifier exclusively on well-formed text written by native speakers, we train a classifier on a large-scale, error-tagged corpus of English essays written by ESL learners, relying on contextual and grammatical features surrounding preposition usage. First, we show that such a model can achieve high performance values: 93.3% precision and 14.8% recall for error detection and 81.7% precision and 13.2% recall for error detection and correction when tested on preposition replacement errors. Second, we show that this model outperforms models trained on well-edited text produced by native speakers of English. We discuss the implications of our approach in the area of language error modeling and the issues stemming from working with a noisy data set whose error annotations are not exhaustive.

2007

Detection of Grammatical Errors Involving Prepositions
Martin Chodorow | Joel Tetreault | Na-Rae Han
Proceedings of the Fourth ACL-SIGSEM Workshop on Prepositions

2004

Korean Null Pronouns: Classification and Annotation
Na-Rae Han
Proceedings of the Workshop on Discourse Annotation

Detecting Errors in English Article Usage with a Maximum Entropy Classifier Trained on a Large, Diverse Corpus
Na-Rae Han | Martin Chodorow | Claudia Leacock
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

Development and Evaluation of a Korean Treebank and its Application to NLP
Chung-hye Han | Na-Rae Han | Eon-Suk Ko | Martha Palmer
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

Penn Korean Treebank: Development and Evaluation
Chung-hye Han | Na-Rae Han | Eon-Suk Ko | Martha Palmer | Heejong Yi
Proceedings of the 16th Pacific Asia Conference on Language, Information and Computation