Eun-Sol Kim


pdf bib
Hypergraph Transformer: Weakly-Supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering
Yu-Jung Heo | Eun-Sol Kim | Woo Suk Choi | Byoung-Tak Zhang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowledge-based visual question answering (QA) aims to answer a question which requires visually-grounded external knowledge beyond image content itself. Answering complex questions that require multi-hop reasoning under weak supervision is considered as a challenging problem since i) no supervision is given to the reasoning process and ii) high-order semantics of multi-hop knowledge facts need to be captured. In this paper, we introduce a concept of hypergraph to encode high-level semantics of a question and a knowledge base, and to learn high-order associations between them. The proposed model, Hypergraph Transformer, constructs a question hypergraph and a query-aware knowledge hypergraph, and infers an answer by encoding inter-associations between two hypergraphs and intra-associations in both hypergraph itself. Extensive experiments on two knowledge-based visual QA and two knowledge-based textual QA demonstrate the effectiveness of our method, especially for multi-hop reasoning problem. Our source code is available at


pdf bib
Semantic Alignment with Calibrated Similarity for Multilingual Sentence Embedding
Jiyeon Ham | Eun-Sol Kim
Findings of the Association for Computational Linguistics: EMNLP 2021

Measuring the similarity score between a pair of sentences in different languages is the essential requisite for multilingual sentence embedding methods. Predicting the similarity score consists of two sub-tasks, which are monolingual similarity evaluation and multilingual sentence retrieval. However, conventional methods have mainly tackled only one of the sub-tasks and therefore showed biased performances. In this paper, we suggest a novel and strong method for multilingual sentence embedding, which shows performance improvement on both sub-tasks, consequently resulting in robust predictions of multilingual similarity scores. The suggested method consists of two parts: to learn semantic similarity of sentences in the pivot language and then to extend the learned semantic structure to different languages. To align semantic structures across different languages, we introduce a teacher-student network. The teacher network distills the knowledge of the pivot language to different languages of the student network. During the distillation, the parameters of the teacher network are updated with the slow-moving average. Together with the distillation and the parameter updating, the semantic structure of the student network can be directly aligned across different languages while preserving the ability to measure the semantic similarity. Thus, the multilingual training method drives performance improvement on multilingual similarity evaluation. The suggested model achieves the state-of-the-art performance on extended STS 2017 multilingual similarity evaluation as well as two sub-tasks, which are extended STS 2017 monolingual similarity evaluation and Tatoeba multilingual retrieval in 14 languages.