Rizal Setya Perdana


2025

Anak Baik: A Low-Cost Approach to Curate Indonesian Ethical and Unethical Instructions
Sulthan Abiyyu Hakim | Rizal Setya Perdana | Tirana Noor Fatyanosa
Proceedings of the Second Workshop in South East Asian Language Processing

This study explores the ethical challenges faced by Indonesian Large Language Models (LLMs), focusing on their ability to distinguish between ethical and unethical instructions. As LLMs are increasingly integrated into sensitive applications, ensuring that they operate ethically is crucial. A key contribution of this study is the Anak Baik dataset, a resource designed to enhance the ethical reasoning capabilities of Indonesian LLMs. The phrase “Anak Baik”, meaning “Good Boy”, symbolizes the ideal of ethical behavior, as a well-behaved child refrains from harmful actions. The dataset comprises instruction-response pairs in Indonesian, crafted for Supervised Fine-Tuning (SFT). It includes examples of both ethical and unethical responses to guide models in learning to generate responses that uphold moral standards. Fine-tuning models such as Komodo and Cendol on this dataset with Low-Rank Adaptation (LoRA) yields a significant improvement in ethical decision-making. This improvement is validated quantitatively by substantial increases in BLEU and ROUGE scores, indicating stronger alignment with socially responsible behavior.
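
To illustrate what overlap metrics such as BLEU and ROUGE capture, here is a minimal sketch of their simplest unigram components: clipped unigram precision (the BLEU-1 ingredient, without the brevity penalty) and unigram recall (ROUGE-1 recall). It is not the paper's evaluation code. The function names, the whitespace tokenization, and the Indonesian example sentences are assumptions made for illustration and are not taken from the Anak Baik dataset.

```python
from collections import Counter


def unigram_overlap(candidate: str, reference: str) -> int:
    """Count tokens shared by both texts, clipping repeated tokens."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    return sum((cand_counts & ref_counts).values())


def bleu1_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: shared tokens / candidate length."""
    length = len(candidate.split())
    return unigram_overlap(candidate, reference) / length if length else 0.0


def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram recall: shared tokens / reference length."""
    length = len(reference.split())
    return unigram_overlap(candidate, reference) / length if length else 0.0


# Hypothetical refusal-style reference and model output (illustrative only).
reference = "maaf saya tidak bisa membantu permintaan itu"
candidate = "maaf saya tidak bisa membantu"

precision = bleu1_precision(candidate, reference)
recall = rouge1_recall(candidate, reference)
```

In this example every candidate token appears in the reference, so precision is high, while recall is lower because the candidate omits two reference tokens. Full BLEU and ROUGE implementations add higher-order n-grams, a brevity penalty, and variants such as ROUGE-L, so scores from this sketch are not comparable to published numbers.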