Yen-Hsiang Wang
2024
Learning-From-Mistakes Prompting for Indigenous Language Translation
You Cheng Liao | Chen-Jui Yu | Chi-Yi Lin | He-Feng Yun | Yen-Hsiang Wang | Hsiao-Min Li | Yao-Chung Fan
Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)
This paper presents techniques that use large language models to improve translation for extremely low-resource indigenous languages. Our approaches are grounded in (1) a datastore containing a limited number of parallel translation examples, (2) the inherent capabilities of LLMs such as GPT-3.5, and (3) a word-level translation dictionary. In this setting, we harness LLMs and in-context learning techniques to use the LLM as a universal translator for extremely low-resource languages. Our methodology hinges on using LLMs as language compilers for selected language pairs, hypothesizing that they can internalize syntactic structures to facilitate accurate translation. We introduce three techniques: KNN-Prompting with Retrieved Prompting Context, Chain-of-Thought Prompting, and Learning-from-Mistakes Prompting, with the last method addressing past errors. The evaluation results suggest that, even with limited corpora, LLMs paired with proper prompting can effectively translate extremely low-resource languages.
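A minimal sketch of the retrieved-prompting-context idea described in the abstract: pick the k most similar parallel examples from the small datastore and prepend them to the translation prompt. The token-overlap similarity and the prompt wording below are assumptions for illustration, not the paper's actual retrieval method or template.

```python
# Hypothetical sketch of KNN-Prompting with Retrieved Prompting Context.
# Similarity here is a crude token-overlap score; the paper's retrieval
# method and prompt format may differ.
from collections import Counter

def overlap_score(a: str, b: str) -> float:
    """Crude lexical similarity between two sentences."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    shared = sum((ta & tb).values())
    return shared / max(1, max(sum(ta.values()), sum(tb.values())))

def build_prompt(source: str, datastore: list[tuple[str, str]], k: int = 3) -> str:
    """datastore holds (source_sentence, target_sentence) pairs."""
    neighbors = sorted(datastore,
                       key=lambda p: overlap_score(source, p[0]),
                       reverse=True)[:k]
    demos = "\n".join(f"Source: {s}\nTranslation: {t}" for s, t in neighbors)
    return f"{demos}\nSource: {source}\nTranslation:"

# The resulting prompt would then be sent to an LLM such as GPT-3.5.
```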
2023
Selecting Better Samples from Pre-trained LLMs: A Case Study on Question Generation
Xingdi Yuan | Tong Wang | Yen-Hsiang Wang | Emery Fine | Rania Abdelghani | Hélène Sauzéon | Pierre-Yves Oudeyer
Findings of the Association for Computational Linguistics: ACL 2023
Large Language Models (LLMs) have in recent years demonstrated impressive prowess in natural language generation. A common practice for improving generation diversity is to sample multiple outputs from the model. However, partly due to the inaccessibility of LLMs, there is no simple and robust way to select the best output from these stochastic samples. As a case study framed in the context of question generation, we propose two prompt-based approaches, namely round-trip and prompt-based score, for selecting high-quality questions from a set of LLM-generated candidates. Our method works without modifying the underlying model and does not rely on human-annotated references, both of which are realistic constraints for real-world deployment of LLMs. With automatic as well as human evaluations, we empirically demonstrate that our approach can effectively select questions of higher quality than greedy generation.
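A minimal sketch of the prompt-based score selection described in the abstract: ask the model to rate each sampled question against the passage and keep the highest-rated candidate. The `ask_llm` stub and the scoring prompt are assumptions standing in for whatever completion API and template the paper actually uses.

```python
# Hypothetical sketch of prompt-based score selection over LLM samples.
# `ask_llm` is a placeholder; plug in any LLM completion call.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM completion call here")

def select_best_question(context: str, candidates: list[str]) -> str:
    """Return the candidate question the LLM rates highest for the passage."""
    def score(question: str) -> float:
        prompt = (
            f"Passage: {context}\n"
            f"Question: {question}\n"
            "On a scale of 1-10, how well does this question fit the passage? "
            "Answer with a single number:"
        )
        try:
            return float(ask_llm(prompt).strip())
        except ValueError:
            return 0.0  # unparsable score -> rank last
    return max(candidates, key=score)
```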