Sungjae Lee


2024

Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding
Jiwan Chung | Sungjae Lee | Minseo Kim | Seungju Han | Ashkan Yousefpour | Jack Hessel | Youngjae Yu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Visual arguments, often used in advertising or social causes, rely on images to persuade viewers to do or believe something. Understanding these arguments requires selective vision: only specific visual stimuli within an image are relevant to the argument, and relevance can only be understood within the context of a broader argumentative structure. While visual arguments are readily appreciated by human audiences, we ask: are today's AI systems capable of similar understanding? We present VisArgs, a dataset of 1,611 images annotated with 5,112 visual premises (with regions), 5,574 commonsense premises, and reasoning trees connecting them into structured arguments. We propose three tasks for evaluating visual argument understanding: premise localization, premise identification, and conclusion deduction. Experiments show that 1) machines struggle to capture visual cues: GPT-4o achieved 78.5% accuracy, while humans reached 98.0%; models also performed 19.5% worse when distinguishing relevant premises from irrelevant objects within the image than from objects outside it. 2) Providing the relevant visual premises significantly improved model performance.
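
To make the premise-localization task concrete, the sketch below scores a model's predicted region for each visual premise against the annotated gold region using an intersection-over-union (IoU) threshold. The box format, threshold, and function names are illustrative assumptions, not the paper's actual data schema or evaluation code.

```python
# A minimal sketch of premise-localization scoring, assuming a hypothetical
# annotation format of axis-aligned boxes (x1, y1, x2, y2); the actual
# VisArgs schema and evaluation protocol may differ.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def localization_accuracy(pred: List[Box], gold: List[Box],
                          thresh: float = 0.5) -> float:
    """Fraction of premises whose predicted region matches gold at IoU >= thresh."""
    hits = sum(iou(p, g) >= thresh for p, g in zip(pred, gold))
    return hits / len(gold)
```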

2021

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
Boseop Kim | HyoungSeok Kim | Sang-Woo Lee | Gichang Lee | Donghyun Kwak | Jeon Dong Hyeon | Sunghyun Park | Sungju Kim | Seonhoon Kim | Dongpil Seo | Heungsub Lee | Minyoung Jeong | Sungjae Lee | Minsub Kim | Suk Hyun Ko | Seokhun Kim | Taeyong Park | Jinuk Kim | Soyoung Kang | Na-Hyeon Ryu | Kang Min Yoo | Minsuk Chang | Soobin Suh | Sookyo In | Jinseong Park | Kyungduk Kim | Hiun Kim | Jisu Jeong | Yong Goo Yeo | Donghoon Ham | Dongju Park | Min Young Lee | Jaewook Kang | Inho Kang | Jung-Woo Ha | Woomyoung Park | Nako Sung
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

GPT-3 shows the remarkable in-context learning ability of large-scale language models (LMs) trained on data at the scale of hundreds of billions of tokens. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performance of differently sized models, and the effect of recently introduced prompt optimization on in-context learning. To this end, we introduce HyperCLOVA, a Korean 82B-parameter variant of GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization, HyperCLOVA with our training configuration shows state-of-the-art in-context zero-shot and few-shot learning performance on various downstream tasks in Korean. We also show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. We then discuss the possibility of realizing the No-Code AI paradigm by providing AI prototyping capabilities to non-experts in ML through HyperCLOVA Studio, an interactive prompt engineering interface. Lastly, we demonstrate the potential of our methods with three successful in-house applications.
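
The few-shot in-context learning setup the abstract evaluates can be sketched as follows: labeled demonstrations are concatenated into a prompt and the LM completes the final query. The `generate` callable below is a hypothetical stand-in for whatever inference API serves the model; it is not HyperCLOVA's actual interface, and the Korean prompt template is an illustrative assumption.

```python
# A minimal sketch of few-shot in-context classification with a Korean LM.
# generate() is a hypothetical stand-in for the model's inference API.
from typing import Callable, List, Tuple

def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format (input, label) demonstrations followed by the unlabeled query.

    "문장" means "sentence" and "감정" means "sentiment"; the template is
    an assumed format, not one taken from the paper.
    """
    shots = "\n".join(f"문장: {x}\n감정: {y}" for x, y in examples)
    return f"{shots}\n문장: {query}\n감정:"

def classify(generate: Callable[[str], str],
             examples: List[Tuple[str, str]], query: str) -> str:
    """Few-shot classification: the model's completion is taken as the label."""
    prompt = build_few_shot_prompt(examples, query)
    return generate(prompt).strip().split("\n")[0]

# Example usage with a Korean sentiment task (labels: 긍정 "positive" / 부정 "negative"):
# label = classify(my_lm_generate,
#                  [("정말 재밌어요", "긍정"), ("별로였어요", "부정")],
#                  "최고의 영화!")
```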