Fariz Ikhwantri


2025

pdf bib
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Genta Indra Winata | Frederikus Hudi | Patrick Amadeus Irawan | David Anugraha | Rifki Afina Putri | Wang Yutong | Adam Nohejl | Ubaidillah Ariq Prathama | Nedjma Ousidhoum | Afifa Amriani | Anar Rzayev | Anirban Das | Ashmari Pramodya | Aulia Adila | Bryan Wilie | Candy Olivia Mawalim | Cheng Ching Lam | Daud Abolade | Emmanuele Chersoni | Enrico Santus | Fariz Ikhwantri | Garry Kuwanto | Hanyang Zhao | Haryo Akbarianto Wibowo | Holy Lovenia | Jan Christian Blaise Cruz | Jan Wira Gotama Putra | Junho Myung | Lucky Susanto | Maria Angelica Riera Machin | Marina Zhukova | Michael Anugraha | Muhammad Farid Adilazuarda | Natasha Christabelle Santosa | Peerat Limkonchotiwat | Raj Dabre | Rio Alexander Audino | Samuel Cahyawijaya | Shi-Xiong Zhang | Stephanie Yulia Salim | Yi Zhou | Yinxuan Gui | David Ifeoluwa Adelani | En-Shiun Annie Lee | Shogo Okada | Ayu Purwarianti | Alham Fikri Aji | Taro Watanabe | Derry Tanti Wijaya | Alice Oh | Chong-Wah Ngo
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding. This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark to date. It includes tasks for identifying dish names and their origins. We provide evaluation datasets in two sizes (12k and 60k instances) alongside a training dataset (1 million instances). Our findings show that while VLMs perform better with correct location context, they struggle with adversarial contexts and predicting specific regional cuisines and languages. To support future research, we release a knowledge base with annotated food entries and images along with the VQA data.

2024

pdf bib
Analyzing Interpretability of Summarization Model with Eye-gaze Information
Fariz Ikhwantri | Hiroaki Yamada | Takenobu Tokunaga
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Interpretation methods provide saliency scores indicating the importance of input words for neural summarization models. Prior work has analyzed models by comparing them to human behavior, often using eye-gaze as a proxy for human attention in reading tasks such as classification. This paper presents a framework to analyze the model behavior in summarization by comparing it to human summarization behavior using eye-gaze data. We examine two research questions: RQ1) whether model saliency conforms to human gaze during summarization and RQ2) how model saliency and human gaze affect summarization performance. For RQ1, we measure conformity by calculating the correlation between model saliency and human fixation counts. For RQ2, we conduct ablation experiments removing words/sentences considered important by models or humans. Experiments on two datasets with human eye-gaze during summarization partially confirm that model saliency aligns with human gaze (RQ1). However, ablation experiments show that removing highly-attended words/sentences from the human gaze does not significantly degrade performance compared with the removal by the model saliency (RQ2).

2018

pdf bib
Multi-Task Active Learning for Neural Semantic Role Labeling on Low Resource Conversational Corpus
Fariz Ikhwantri | Samuel Louvan | Kemal Kurniawan | Bagas Abisena | Valdi Rachman | Alfan Farizki Wicaksono | Rahmad Mahendra
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP

Most Semantic Role Labeling (SRL) approaches are supervised methods which require a significant amount of annotated corpus, and the annotation requires linguistic expertise. In this paper, we propose a Multi-Task Active Learning framework for Semantic Role Labeling with Entity Recognition (ER) as the auxiliary task to alleviate the need for extensive data and use additional information from ER to help SRL. We evaluate our approach on Indonesian conversational dataset. Our experiments show that multi-task active learning can outperform single-task active learning method and standard multi-task learning. According to our results, active learning is more efficient by using 12% less of training data compared to passive learning in both single-task and multi-task setting. We also introduce a new dataset for SRL in Indonesian conversational domain to encourage further research in this area.

pdf bib
Semantic Role Labeling in Conversational Chat using Deep Bi-Directional Long Short-Term Memory Networks with Attention Mechanism
Valdi Rachman | Rahmad Mahendra | Alfan Farizki Wicaksono | Ahmad Rizqi Meydiarso | Fariz Ikhwantri
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation