2024
pdf
bib
abs
MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction
Jun-Hyung Park
|
Yeachan Kim
|
Mingyu Lee
|
Hyuntae Park
|
SangKeun Lee
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Chemical representation learning has gained increasing interest due to the limited availability of supervised data in fields such as drug and materials design. This interest particularly extends to chemical language representation learning, which involves pre-training Transformers on SMILES sequences - textual descriptors of molecules. Despite its success in molecular property prediction, current practices often lead to overfitting and limited scalability due to early convergence. In this paper, we introduce a novel chemical language representation learning framework, called MolTRES, to address these issues. MolTRES incorporates generator-discriminator training, allowing the model to learn from more challenging examples that require structural understanding. In addition, we enrich molecular representations by transferring knowledge from scientific literature by integrating external materials embedding. Experimental results show that our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
pdf
bib
abs
Moleco: Molecular Contrastive Learning with Chemical Language Models for Molecular Property Prediction
Jun-Hyung Park
|
Hyuntae Park
|
Yeachan Kim
|
Woosang Lim
|
SangKeun Lee
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Pre-trained chemical language models (CLMs) excel in the field of molecular property prediction, utilizing string-based molecular descriptors such as SMILES for learning universal representations. However, such string-based descriptors implicitly contain limited structural information, which is closely associated with molecular property prediction. In this work, we introduce Moleco, a novel contrastive learning framework to enhance the understanding of molecular structures within CLMs. Based on the similarity of fingerprint vectors among different molecules, we train CLMs to distinguish structurally similar and dissimilar molecules in a contrastive manner. Experimental results demonstrate that Moleco significantly improves the molecular property prediction performance of CLMs, outperforming state-of-the-art models. Moreover, our in-depth analysis with diverse Moleco variants verifies that fingerprint vectors are highly effective features in improving CLMs’ understanding of the structural information of molecules.
pdf
bib
abs
Zero-shot Commonsense Reasoning over Machine Imagination
Hyuntae Park
|
Yeachan Kim
|
Jun-Hyung Park
|
SangKeun Lee
Findings of the Association for Computational Linguistics: EMNLP 2024
Recent approaches to zero-shot commonsense reasoning have enabled Pre-trained Language Models (PLMs) to learn a broad range of commonsense knowledge without being tailored to specific situations. However, they often suffer from human reporting bias inherent in textual commonsense knowledge, leading to discrepancies in understanding between PLMs and humans. In this work, we aim to bridge this gap by introducing an additional information channel to PLMs. We propose Imagine (Machine Imagination-based Reasoning), a novel zero-shot commonsense reasoning framework designed to complement textual inputs with visual signals derived from machine-generated images. To achieve this, we enhance PLMs with imagination capabilities by incorporating an image generator into the reasoning process. To guide PLMs in effectively leveraging machine imagination, we create a synthetic pre-training dataset that simulates visual question-answering. Our extensive experiments on diverse reasoning benchmarks and analysis show that Imagine outperforms existing methods by a large margin, highlighting the strength of machine imagination in mitigating reporting bias and enhancing generalization capabilities.
2023
pdf
bib
abs
DIVE: Towards Descriptive and Diverse Visual Commonsense Generation
Jun-Hyung Park
|
Hyuntae Park
|
Youjin Kang
|
Eojin Jeon
|
SangKeun Lee
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Towards human-level visual understanding, visual commonsense generation has been introduced to generate commonsense inferences beyond images. However, current research on visual commonsense generation has overlooked an important human cognitive ability: generating descriptive and diverse inferences. In this work, we propose a novel visual commonsense generation framework, called DIVE, which aims to improve the descriptiveness and diversity of generated inferences. DIVE involves two methods, generic inference filtering and contrastive retrieval learning, which address the limitations of existing visual commonsense resources and training objectives. Experimental results verify that DIVE outperforms state-of-the-art models for visual commonsense generation in terms of both descriptiveness and diversity, while showing a superior quality in generating unique and novel inferences. Notably, DIVE achieves human-level descriptiveness and diversity on Visual Commonsense Graphs. Furthermore, human evaluations confirm that DIVE aligns closely with human judgments on descriptiveness and diversity.
2021
pdf
bib
abs
KOAS: Korean Text Offensiveness Analysis System
San-Hee Park
|
Kang-Min Kim
|
Seonhee Cho
|
Jun-Hyung Park
|
Hyuntae Park
|
Hyuna Kim
|
Seongwon Chung
|
SangKeun Lee
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Warning: This manuscript contains a certain level of offensive expression. As communication through social media platforms has grown immensely, the increasing prevalence of offensive language online has become a critical problem. Notably in Korea, one of the countries with the highest Internet usage, automatic detection of offensive expressions has recently been brought to attention. However, morphological richness and complex syntax of Korean causes difficulties in neural model training. Furthermore, most of previous studies mainly focus on the detection of abusive language, disregarding implicit offensiveness and underestimating a different degree of intensity. To tackle these problems, we present KOAS, a system that fully exploits both contextual and linguistic features and estimates an offensiveness score for a text. We carefully designed KOAS with a multi-task learning framework and constructed a Korean dataset for offensive analysis from various domains. Refer for a detailed demonstration.