Changhoe Hwang
2022
Building Korean Linguistic Resource for NLU Data Generation of Banking App CS Dialog System
Jeongwoo Yoon | Onyu Park | Changhoe Hwang | Gwanghoon Yoo | Eric Laporte | Jeesun Nam
Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning
Natural language understanding (NLU) is integral to task-oriented dialog systems, but it demands a considerable amount of annotated training data to cover diverse utterances. In this study, we report the construction of a linguistic resource named FIAD (Financial Annotated Dataset) and its use to generate Korean annotated training data for NLU in the banking customer service (CS) domain. Through an empirical examination of a corpus of banking app reviews, we identified three linguistic patterns occurring in Korean request utterances: TOPIC (ENTITY, FEATURE), EVENT, and DISCOURSE MARKER. We represented them in LGGs (Local Grammar Graphs) to generate annotated data covering diverse intents and entities. To assess the practicality of the resource, we evaluate the performance of DIET-only (Intent: 0.91 / Topic [entity+feature]: 0.83), DIET+HANBERT (I: 0.94 / T: 0.85), DIET+KoBERT (I: 0.94 / T: 0.86), and DIET+KorBERT (I: 0.95 / T: 0.84) models trained on FIAD-generated data to extract various types of semantic items.
SSP-Based Construction of Evaluation-Annotated Data for Fine-Grained Aspect-Based Sentiment Analysis
Suwon Choi | Shinwoo Kim | Changhoe Hwang | Gwanghoon Yoo | Eric Laporte | Jeesun Nam
Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning
We report the construction of a Korean evaluation-annotated corpus, hereafter called ‘Evaluation Annotated Dataset (EVAD)’, and its use in Aspect-Based Sentiment Analysis (ABSA), extended to cover e-commerce reviews containing sentiment and non-sentiment linguistic patterns. The annotation process uses Semi-Automatic Symbolic Propagation (SSP). We built extensive linguistic resources formalized as a Finite-State Transducer (FST) to annotate corpora with detailed ABSA components in the fashion e-commerce domain. To analyze user opinions more accurately and extract more detailed features of targets, the ABSA approach is extended by including aspect values in addition to topics and aspects, and by classifying aspect-value pairs depending on whether values are unary, binary, or multiple. For evaluation, the KoBERT and KcBERT models are trained on the annotated dataset, showing robust performance of F1 0.88 and F1 0.90, respectively, on recognition of aspect-value pairs.