Hongjin Kim
2024
Title-based Extractive Summarization via MRC Framework
Hongjin Kim
|
Jai-Eun Kim
|
Harksoo Kim
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Existing studies on extractive summarization have primarily focused on scoring and selecting summary sentences independently. However, these models are limited to sentence-level extraction and tend to select highly generalized sentences while overlooking the overall content of a document. To effectively consider the semantics of a document, in this study, we introduce a novel machine reading comprehension (MRC) framework for extractive summarization (MRCSum) by setting a query as the title. Our framework enables MRCSum to consider the semantic coherence and relevance of summary sentences in relation to the overall content. In particular, when a title is not available, we generate a title-like query, which is expected to achieve the same effect as a title. Our title-like query consists of the topic and keywords to serve as information on the main topic or theme of the document. We conduct experiments in both Korean and English languages, evaluating the performance of MRCSum on datasets comprising both long and short summaries. Our results demonstrate the effectiveness of MRCSum in extractive summarization, showcasing its ability to generate concise and informative summaries with or without explicit titles. Furthermore, our MRCSum outperforms existing models by capturing the essence of the document content and producing more coherent summaries.
2023
A Framework for Vision-Language Warm-up Tasks in Multimodal Dialogue Models
Jaewook Lee
|
Seongsik Park
|
Seong-Heum Park
|
Hongjin Kim
|
Harksoo Kim
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Most research on multimodal open-domain dialogue agents has focused on pretraining and multi-task learning using additional rich datasets beyond a given target dataset. However, methods for exploiting these additional datasets can be quite limited in real-world settings, creating a need for more efficient methods for constructing agents based solely on the target dataset. To address these issues, we present a new learning strategy called vision-language warm-up tasks for multimodal dialogue models (VLAW-MDM). This strategy does not require the use of large pretraining or multi-task datasets but rather relies solely on learning from target data. Moreover, our proposed approach automatically generate captions for images and incorporate them into the model’s input to improve the contextualization of visual information. Using this novel approach, we empirically demonstrate that our learning strategy is effective for limited data and relatively small models. The result show that our method achieved comparable and in some cases superior performance compared to existing state-of-the-art models on various evaluation metrics.
2021
The Pipeline Model for Resolution of Anaphoric Reference and Resolution of Entity Reference
Hongjin Kim
|
Damrin Kim
|
Harksoo Kim
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Search
Co-authors
- Harksoo Kim 3
- Jaewook Lee 1
- Seongsik Park 1
- Seong-Heum Park 1
- Damrin Kim 1
- show all...