Anak Baik: A Low-Cost Approach to Curate Indonesian Ethical and Unethical Instructions

Sulthan Abiyyu Hakim, Rizal Setya Perdana, Tirana Noor Fatyanosa


Abstract
This study explores the ethical challenges faced by Indonesian Large Language Models (LLMs), particularly focusing on their ability to distinguish between ethical and unethical instructions. As LLMs become increasingly integrated into sensitive applications, ensuring their ethical operation is crucial. A key contribution of this study is the introduction of the Anak Baik dataset, a resource designed to enhance the ethical reasoning capabilities of Indonesian LLMs. The phrase “Anak Baik”, meaning “Good Boy”, symbolizes the ideal of ethical behavior, as a well-behaved child refrains from engaging in harmful actions. The dataset comprises instruction-response pairs in Indonesian, crafted for Supervised Fine-Tuning (SFT) tasks. It includes examples of both ethical and unethical responses to guide models in learning to generate responses that uphold moral standards. Leveraging Low-Rank Adaptation (LoRA) on models such as Komodo and Cendol shows a significant improvement in ethical decision-making processes. This enhanced performance is quantitatively validated through substantial increases in BLEU and ROUGE scores, indicating a stronger alignment with socially responsible behavior.
Anthology ID:
2025.sealp-1.5
Volume:
Proceedings of the Second Workshop in South East Asian Language Processing
Month:
January
Year:
2025
Address:
Online
Editors:
Derry Wijaya, Alham Fikri Aji, Clara Vania, Genta Indra Winata, Ayu Purwarianti
Venues:
sealp | WS
Publisher:
Association for Computational Linguistics
Pages:
52–62
URL:
https://aclanthology.org/2025.sealp-1.5/
Cite (ACL):
Sulthan Abiyyu Hakim, Rizal Setya Perdana, and Tirana Noor Fatyanosa. 2025. Anak Baik: A Low-Cost Approach to Curate Indonesian Ethical and Unethical Instructions. In Proceedings of the Second Workshop in South East Asian Language Processing, pages 52–62, Online. Association for Computational Linguistics.
Cite (Informal):
Anak Baik: A Low-Cost Approach to Curate Indonesian Ethical and Unethical Instructions (Hakim et al., sealp 2025)
PDF:
https://aclanthology.org/2025.sealp-1.5.pdf