Geonsik Moon


2024

Are Decoder-Only Language Models Better than Encoder-Only Language Models in Understanding Word Meaning?
Muhammad Qorib | Geonsik Moon | Hwee Tou Ng
Findings of the Association for Computational Linguistics: ACL 2024

The natural language processing field has been evolving around language models for the past few years, from the usage of n-gram language models for re-ranking, to transfer learning with encoder-only (BERT-like) language models, and finally to large language models (LLMs) as general solvers. LLMs are dominated by the decoder-only type, and they are popular for their efficacy in numerous tasks. LLMs are regarded as having strong comprehension abilities and strong capabilities to solve new unseen tasks. As such, people may quickly assume that decoder-only LLMs always perform better than the encoder-only ones, especially for understanding word meaning. In this paper, we demonstrate that decoder-only LLMs perform worse on word meaning comprehension than an encoder-only language model that has vastly fewer parameters.

From Moments to Milestones: Incremental Timeline Summarization Leveraging Large Language Models
Qisheng Hu | Geonsik Moon | Hwee Tou Ng
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Timeline summarization (TLS) is essential for distilling coherent narratives from a vast collection of texts, tracing the progression of events and topics over time. Prior research typically focuses on either event or topic timeline summarization, neglecting the potential synergy of these two forms. In this study, we bridge this gap by introducing a novel approach that leverages large language models (LLMs) for generating both event and topic timelines. Our approach diverges from conventional TLS by prioritizing event detection, leveraging LLMs as pseudo-oracles for incremental event clustering and the construction of timelines from a text stream. As a result, it produces a more interpretable pipeline. Empirical evaluation across four TLS benchmarks reveals that our approach outperforms the best prior published approaches, highlighting the potential of LLMs in timeline summarization for real-world applications.

2023

ALLECS: A Lightweight Language Error Correction System
Muhammad Reza Qorib | Geonsik Moon | Hwee Tou Ng
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

In this paper, we present ALLECS, a lightweight web application to serve grammatical error correction (GEC) systems so that they can be easily used by the general public. We design ALLECS to be accessible to as many users as possible, including users who have a slow Internet connection and who use mobile phones as their main devices to connect to the Internet. ALLECS provides three state-of-the-art base GEC systems using two approaches (sequence-to-sequence generation and sequence tagging), as well as two state-of-the-art GEC system combination methods using two approaches (edit-based and text-based). ALLECS can be accessed at https://sterling8.d2.comp.nus.edu.sg/gec-demo/

WAMP: Writing, Annotation, and Marking Platform
Geonsik Moon | Muhammad Reza Qorib | Daniel Dahlmeier | Hwee Tou Ng
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations