Young-In Song


2023

Hierarchical Label Generation for Text Classification
Jingun Kwon | Hidetaka Kamigaito | Young-In Song | Manabu Okumura
Findings of the Association for Computational Linguistics: EACL 2023

2022

Pseudo-Relevance for Enhancing Document Representation
Jihyuk Kim | Seung-won Hwang | Seoho Song | Hyeseon Ko | Young-In Song
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

This paper studies how to enhance the document representation for the bi-encoder approach in dense document retrieval. The bi-encoder, separately encoding a query and a document as a single vector, is favored for high efficiency in large-scale information retrieval, compared to more effective but complex architectures. To combine the strengths of the two, the multi-vector representation of documents for the bi-encoder, such as ColBERT preserving all token embeddings, has been widely adopted. Our contribution is to reduce the size of the multi-vector representation, without compromising the effectiveness, supervised by query logs. Our proposed solution decreases the latency and the memory footprint by up to 8-fold and 3-fold, respectively, validated on MSMARCO and real-world search query logs.

2021

Query Generation for Multimodal Documents
Kyungho Kim | Kyungjae Lee | Seung-won Hwang | Young-In Song | Seungwook Lee
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

This paper studies the problem of generating likely queries for multimodal documents with images. Our application scenario is enabling efficient “first-stage retrieval” of relevant documents, by attaching generated queries to documents before indexing. We can then index this expanded text to efficiently narrow down to candidate matches using an inverted index, so that expensive reranking can follow. Our evaluation results show that our proposed multimodal representation meaningfully improves relevance ranking. More importantly, our framework can achieve the state of the art in first-stage retrieval scenarios.

A New Surprise Measure for Extracting Interesting Relationships between Persons
Hidetaka Kamigaito | Jingun Kwon | Young-In Song | Manabu Okumura
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

One way to enhance user engagement in search engines is to suggest interesting facts to the user. Although relationships between persons are important as a target for text mining, there are few effective approaches for extracting the interesting relationships between persons. We therefore propose a method for extracting interesting relationships between persons from natural language texts by focusing on their surprisingness. Our method first extracts all personal relationships from dependency trees for the texts and then calculates surprise scores for distributed representations of the extracted relationships in an unsupervised manner. The unique point of our method is that it does not require any labeled dataset with annotation for the surprising personal relationships. The results of the human evaluation show that the proposed method could extract more interesting relationships between persons from Japanese Wikipedia articles than a popularity-based baseline method. We demonstrate our proposed method as a Chrome plugin on Google Search.

An Uncertainty-Aware Encoder for Aspect Detection
Thi-Nhung Nguyen | Kiem-Hieu Nguyen | Young-In Song | Tuan-Dung Cao
Findings of the Association for Computational Linguistics: EMNLP 2021

Aspect detection is a fundamental task in opinion mining. Previous works use seed words either as priors of topic models, as anchors to guide the learning of aspects, or as features of aspect classifiers. This paper presents a novel weakly-supervised method to exploit seed words for aspect detection based on an encoder architecture. The encoder maps segments and aspects into a low-dimensional embedding space. The goal is for the similarity between segments and aspects in the embedding space to approximate their ground-truth similarity generated from seed words. An objective function is proposed to capture the uncertainty of ground-truth similarity. Our method outperforms previous works on several benchmarks in various domains.

2020

Hierarchical Trivia Fact Extraction from Wikipedia Articles
Jingun Kwon | Hidetaka Kamigaito | Young-In Song | Manabu Okumura
Proceedings of the 28th International Conference on Computational Linguistics

Recently, automatic trivia fact extraction has attracted much research interest. Modern search engines have begun to provide trivia facts as information about entities because they can motivate more user engagement. In this paper, we propose a new unsupervised algorithm that automatically mines trivia facts for a given entity. Unlike previous studies, the proposed algorithm targets a single Wikipedia article and leverages its hierarchical structure via top-down processing. Thus, the proposed algorithm offers two distinctive advantages: it does not incur high computation time, and it provides a domain-independent approach for extracting trivia facts. Experimental results demonstrate that the proposed algorithm is over 100 times faster than the existing method, which considers Wikipedia categories. Human evaluation demonstrates that the proposed algorithm can mine better trivia facts regardless of the target entity domain and outperforms the existing methods.

2016

Opinion Retrieval Systems using Tweet-external Factors
Yoon-Sung Kim | Young-In Song | Hae-Chang Rim
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

Opinion mining is a natural language processing technique which extracts subjective information from natural language text. To estimate an opinion about a query in a large data collection, an opinion retrieval system that retrieves subjective and relevant information about the query can be useful. We present an opinion retrieval system that retrieves subjective and query-relevant tweets from Twitter, which is a useful source for obtaining real-time opinions. Our system outperforms previous opinion retrieval systems, and it further provides subjective information about Twitter authors and hashtags to describe their subjective tendencies.

2010

Comparable Entity Mining from Comparative Questions
Shasha Li | Chin-Yew Lin | Young-In Song | Zhoujun Li
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Mining Name Translations from Entity Graph Mapping
Gae-won You | Seung-won Hwang | Young-In Song | Long Jiang | Zaiqing Nie
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

Word or Phrase? Learning Which Unit to Stress for Information Retrieval
Young-In Song | Jung-Tae Lee | Hae-Chang Rim
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models
Jung-Tae Lee | Sang-Bum Kim | Young-In Song | Hae-Chang Rim
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2006

K-QARD: A Practical Korean Question Answering Framework for Restricted Domain
Young-In Song | HooJung Chung | Kyoung-Soo Han | JooYoung Lee | Hae-Chang Rim | Jae-Won Lee
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

2004

A Practical QA System in Restricted Domains
Hoojung Chung | Young-In Song | Kyoung-Soo Han | Do-Sang Yoon | Joo-Young Lee | Hae-Chang Rim | Soo-Hong Kim
Proceedings of the Conference on Question Answering in Restricted Domains