Te-Lun Yang


2025

A Preliminary Study of RAG for Taiwanese Historical Archives
Claire Lin | Bo-Han Feng | Xuanjun Chen | Te-Lun Yang | Hung-Yi Lee | Jyh-Shing Roger Jang
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)

Retrieval-Augmented Generation (RAG) has emerged as a promising approach for knowledge-intensive tasks. However, few studies have examined RAG for Taiwanese historical archives. In this paper, we present an initial study of a RAG pipeline applied to two historical Traditional Chinese datasets, Fort Zeelandia and the Taiwan Provincial Council Gazette, along with their corresponding open-ended query sets. We systematically investigate the effects of query characteristics and metadata integration strategies on retrieval quality, answer generation, and overall system performance. The results show that early-stage metadata integration improves both retrieval and answer accuracy. They also reveal persistent challenges for RAG systems, including hallucinations during generation and difficulties in handling temporal or multi-hop historical queries.

2023

Category Mapping for Zero-shot Text Classification
Qiu-Xia Zhang | Te-Yu Chi | Te-Lun Yang | Yu-Meng Tang | Ta-Lin Chen | Jyh-Shing Roger Jang
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)

CrowNER at ROCLING 2023 MultiNER-Health Task: Enhancing NER Task with GPT Paraphrase Augmentation on Sparsely Labeled Data
Yin-Chieh Wang | Wen-Hong Wu | Feng-Yu Kuo | Han-Chun Wu | Te-Yu Chi | Te-Lun Yang | Sheh Chen | Jyh-Shing Roger Jang
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)

2022

CrowNER at Rocling 2022 Shared Task: NER using MacBERT and Adversarial Training
Qiu-Xia Zhang | Te-Yu Chi | Te-Lun Yang | Jyh-Shing Roger Jang
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

This study uses the training and validation data from the “ROCLING 2022 Chinese Health Care Named Entity Recognition Task” for modeling. The modeling process applies data augmentation and data post-processing, and builds on the pre-trained MacBERT model to produce a dedicated NER model for the Chinese medical domain. During fine-tuning, we also added adversarial training methods such as FGM and PGD, and the final tuned model scored close to the best team in the task evaluation. In addition, introducing mixed-precision training greatly reduced the time cost of training.