VerbaNexAI at ClinicalSkillQA: From Visual Evidence to Procedural Order A Two-Stage Generative Vision-Language Framework for ClinSkillQA

Andrea Menco Tovar; Jairo E. Serrano; Edwin Puertas; Juan Carlos Martinez-Santos

Abstract

This work addresses the task of temporally ordering clinical frames in the Basic Life Support (BLS) subset of ClinSkillQA. It proposes a two-stage hybrid pipeline built on Qwen2-VL-2B-Instruct in a zero-shot configuration. In Stage 1, each image is processed independently to extract factual visual evidence, which deterministic rules then convert into a structured representation. In Stage 2, ordering is formulated as an ordinal scoring task over procedural stages, with ties broken using PCA applied to multimodal embeddings. Evaluation follows the official benchmark protocol and reports Task Accuracy, Pairwise Accuracy, and BERTScore. In the test phase, the system achieved Task Accuracy = 0.17, Pairwise Micro Accuracy = 0.60, and BERTScore F1 = 0.71, with complete coverage in both predictions and rationales. The results provide an interpretable and reproducible foundation, although fine-grained temporal discrimination remains a challenge.
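The Stage 2 ordering and the two accuracy metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the tie-break value is assumed to be a precomputed scalar per frame (for example, a projection onto the first principal component of the embeddings), Task Accuracy is assumed to mean an exact match of the full ordering, and Pairwise Micro Accuracy is assumed to pool correctly ordered frame pairs over all samples. The official benchmark definitions may differ.

```python
from itertools import combinations


def order_frames(stage_scores, tie_values):
    """Sort frame indices by ordinal stage score; break ties with a secondary scalar.

    tie_values is assumed to be precomputed (e.g. a 1-D PCA projection).
    """
    return sorted(range(len(stage_scores)),
                  key=lambda i: (stage_scores[i], tie_values[i]))


def pairwise_counts(pred, gold):
    """Count frame pairs whose relative order in pred agrees with gold."""
    position = {frame: rank for rank, frame in enumerate(pred)}
    correct = total = 0
    for earlier, later in combinations(gold, 2):  # earlier precedes later in gold
        total += 1
        correct += position[earlier] < position[later]
    return correct, total


def evaluate(preds, golds):
    """Return (exact-match accuracy, micro-averaged pairwise accuracy)."""
    task_acc = sum(p == g for p, g in zip(preds, golds)) / len(golds)
    correct = total = 0
    for p, g in zip(preds, golds):
        c, t = pairwise_counts(p, g)
        correct += c
        total += t
    return task_acc, correct / total


# Illustrative values only; these are not data from the paper.
preds = [
    order_frames([2, 0, 1, 1], [0.0, 0.0, 0.7, -0.3]),  # frames 2 and 3 tie on stage
    order_frames([0, 1, 2], [0.0, 0.0, 0.0]),
]
golds = [[1, 2, 3, 0], [0, 1, 2]]
task_acc, pair_acc = evaluate(preds, golds)
```

In this toy case the first sample swaps only the two tied frames, so it fails the exact-match criterion while still ordering five of its six pairs correctly. Under the assumed definitions, this is the kind of near miss that would keep pairwise accuracy well above Task Accuracy.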

Anthology ID: 2026.bionlp-2.2
Volume: Proceedings of the BioNLP 2026 (Shared Tasks)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Deepak Gupta, Dina Demner-Fushman
Venues: BioNLP | WS
Publisher: Association for Computational Linguistics
Pages: 6–12
URL: https://aclanthology.org/2026.bionlp-2.2/
Cite (ACL): Andrea Menco Tovar, Jairo E. Serrano, Edwin Puertas, and Juan Carlos Martinez-Santos. 2026. VerbaNexAI at ClinicalSkillQA: From Visual Evidence to Procedural Order A Two-Stage Generative Vision-Language Framework for ClinSkillQA. In Proceedings of the BioNLP 2026 (Shared Tasks), pages 6–12, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): VerbaNexAI at ClinicalSkillQA: From Visual Evidence to Procedural Order A Two-Stage Generative Vision-Language Framework for ClinSkillQA (Menco Tovar et al., BioNLP 2026)
PDF: https://aclanthology.org/2026.bionlp-2.2.pdf
Supplementary material: 2026.bionlp-2.2.SupplementaryMaterial.zip