Yao-Chung Fan

Also published as: Yao-chung Fan

2024

In this paper, we tackle the task of distractor generation (DG) for multiple-choice questions. Our study introduces two key designs. First, we propose the concept of retrieval augmented pretraining, which involves refining the language model pretraining to align it more closely with the downstream task of DG. Second, we explore the integration of knowledge graphs and language models to further enhance the performance of DG. Our study unveils promising directions for further development in DG by showcasing the efficacy of knowledge augmentation and task-specific pretraining. These findings demonstrate the potential for leveraging both strategies to enhance the quality and performance of DG systems.

pdf bib abs
Automating True-False Multiple-Choice Question Generation and Evaluation with Retrieval-based Accuracy Differential
Chen-Jui Yu | Wen Hung Lee | Lin Tse Ke | Shih-Wei Guo | Yao-Chung Fan
Proceedings of the 17th International Natural Language Generation Conference

Creating high-quality True-False (TF) multiple-choice questions (MCQs), with accurate distractors, is a challenging and time-consuming task in education. This paper introduces True-False Distractor Generation (TFDG), a pipeline that leverages pre-trained language models and sentence retrieval techniques to automate the generation of TF-type MCQ distractors. Furthermore, the evaluation of generated TF questions presents a challenge. Traditional metrics like BLEU and ROUGE are unsuitable for this task. To address this, we propose a new evaluation metric called Retrieval-based Accuracy Differential (RAD). RAD assesses the discriminative power of TF questions by comparing model accuracy with and without access to reference texts. It quantitatively evaluates how well questions differentiate between students with varying knowledge levels. This research benefits educators and assessment developers, facilitating the efficient automatic generation of high-quality TF-type MCQs and their reliable evaluation.

pdf bib abs
Personalized Cloze Test Generation with Large Language Models: Streamlining MCQ Development and Enhancing Adaptive Learning
Chih-Hsuan Shen | Yi-Li Kuo | Yao-Chung Fan
Proceedings of the 17th International Natural Language Generation Conference

Cloze multiple-choice questions (MCQs) are essential for assessing comprehension in educational settings, but manually designing effective distractors is time-consuming. Addressing this, recent research has automated distractor generation, yet such methods often neglect to adjust the difficulty level to the learner’s abilities, resulting in non-personalized assessments. This study introduces the Personalized Cloze Test Generation (PCGL) Framework, utilizing Large Language Models (LLMs) to generate cloze tests tailored to individual proficiency levels. Our PCGL Framework simplifies test creation by generating both question stems and distractors from a single input word and adjusts the difficulty to match the learner’s proficiency. The framework significantly reduces the effort in creating tests and enhances personalized learning by dynamically adjusting to the needs of each learner.

pdf bib abs
Learning-From-Mistakes Prompting for Indigenous Language Translation
You Cheng Liao | Chen-Jui Yu | Chi-Yi Lin | He-Feng Yun | Yen-Hsiang Wang | Hsiao-Min Li | Yao-Chung Fan
Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)

Using large language models, this paper presents techniques to improve extremely low-resourced indigenous language translations. Our approaches are grounded in the use of (1) the presence of a datastore consisting of a limited number of parallel translation examples, (2) the inherent capabilities of LLMs like GPT-3.5, and (3) a word-level translation dictionary. We harness the potential of LLMs and in-context learning techniques in such a setting for using LLM as universal translators for extremely low-resourced languages. Our methodology hinges on utilizing LLMs as language compilers for selected language pairs, hypothesizing that they could internalize syntactic structures to facilitate accurate translation. We introduce three techniques: KNN-Prompting with Retrieved Prompting Context, Chain-of-Thought Prompting, and Learning-from-Mistakes Prompting, with the last method addressing past errors. The evaluation results suggest that, even with limited corpora, LLMs, when paired with proper prompting, can effectively translate extremely low-resource languages.

pdf bib abs
NLPNCHU at SemEval-2024 Task 4: A Comparison of MDHC Strategy and In-domain Pre-training for Multilingual Detection of Persuasion Techniques in Memes
Shih-wei Guo | Yao-chung Fan
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This study presents a systematic method for identifying 22 persuasive techniques used in multilingual memes. We explored various fine-tuning techniques and classification strategies, such as data augmentation, problem transformation, and hierarchical multi-label classification strategies. Identifying persuasive techniques in memes involves a multimodal task. We fine-tuned the XLM-RoBERTA-large-twitter language model, focusing on domain-specific language modeling, and integrated it with the CLIP visual model’s embedding to consider image and text features simultaneously. In our experiments, we evaluated the effectiveness of our approach by using official validation data in English. Our system in the competition, achieving competitive rankings in Subtask1 and Subtask2b across four languages: English, Bulgarian, North Macedonian, and Arabic. Significantly, we achieved 2nd place ranking for Arabic language in Subtask 1.

2023

In this paper, we address the task of cloze-style multiple choice question (MCQs) distractor generation. Our study is featured by the following designs. First, we propose to formulate the cloze distractor generation as a Text2Text task. Second, we propose pseudo Kullback-Leibler Divergence for regulating the generation to consider the item discrimination index in education evaluation. Third, we explore the candidate augmentation strategy and multi-tasking training with cloze-related tasks to further boost the generation performance. Through experiments with benchmarking datasets, our best perfomring model advances the state-of-the-art result from 10.81 to 22.00 (p@1 score).

2022

pdf bib abs
CDGP: Automatic Cloze Distractor Generation based on Pre-trained Language Model
Shang-Hsuan Chiang | Ssu-Cheng Wang | Yao-Chung Fan
Findings of the Association for Computational Linguistics: EMNLP 2022

Manually designing cloze test consumes enormous time and efforts. The major challenge lies in wrong option (distractor) selection. Having carefully-design distractors improves the effectiveness of learner ability assessment. As a result, the idea of automatically generating cloze distractor is motivated. In this paper, we investigate cloze distractor generation by exploring the employment of pre-trained language models (PLMs) as an alternative for candidate distractor generation. Experiments show that the PLM-enhanced model brings a substantial performance improvement. Our best performing model advances the state-of-the-art result from 14.94 to 34.17 (NDCG@10 score). Our code and dataset is available at https://github.com/AndyChiangSH/CDGP.

pdf bib
Keyword Provision Question Generation for Facilitating Educational Reading Comprehension Preparation
Ying-Hong Chan | Ho-Lam Chung | Yao-Chung Fan
Proceedings of the 15th International Conference on Natural Language Generation

2020

pdf bib abs
A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies.
Ho-Lam Chung | Ying-Hong Chan | Yao-Chung Fan
Findings of the Association for Computational Linguistics: EMNLP 2020

In this paper, we investigate the following two limitations for the existing distractor generation (DG) methods. First, the quality of the existing DG methods are still far from practical use. There are still room for DG quality improvement. Second, the existing DG designs are mainly for single distractor generation. However, for practical MCQ preparation, multiple distractors are desired. Aiming at these goals, in this paper, we present a new distractor generation scheme with multi-tasking and negative answer training strategies for effectively generating multiple distractors. The experimental results show that (1) our model advances the state-of-the-art result from 28.65 to 39.81 (BLEU 1 score) and (2) the generated multiple distractors are diverse and shows strong distracting power for multiple choice question.

2019

pdf bib abs
A Recurrent BERT-based Model for Question Generation
Ying-Hong Chan | Yao-Chung Fan
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

In this study, we investigate the employment of the pre-trained BERT language model to tackle question generation tasks. We introduce three neural architectures built on top of BERT for question generation tasks. The first one is a straightforward BERT employment, which reveals the defects of directly using BERT for text generation. Accordingly, we propose another two models by restructuring our BERT employment into a sequential manner for taking information from previous decoded results. Our models are trained and evaluated on the recent question-answering dataset SQuAD. Experiment results show that our best model yields state-of-the-art performance which advances the BLEU 4 score of the existing best models from 16.85 to 22.17.

pdf bib abs
BERT for Question Generation
Ying-Hong Chan | Yao-Chung Fan
Proceedings of the 12th International Conference on Natural Language Generation

In this study, we investigate the employment of the pre-trained BERT language model to tackle question generation tasks. We introduce two neural architectures built on top of BERT for question generation tasks. The first one is a straightforward BERT employment, which reveals the defects of directly using BERT for text generation. And, the second one remedies the first one by restructuring the BERT employment into a sequential manner for taking information from previous decoded results. Our models are trained and evaluated on the question-answering dataset SQuAD. Experiment results show that our best model yields state-of-the-art performance which advances the BLEU4 score of existing best models from 16.85 to 18.91.

Co-authors

Venues

Fix data