Long Vo-Dang


2025

Transparency in AI healthcare decision-making is crucial. By incorporating rationales that explain the reason for each predicted label, users can understand the reasoning of Large Language Models (LLMs) and make better decisions. In this work, we introduce a new task, Sentiment Reasoning, for both speech and text modalities, together with our proposed multimodal multitask framework and the world's largest multimodal sentiment analysis dataset. Sentiment Reasoning is an auxiliary task in sentiment analysis in which the model both predicts the sentiment label and generates the rationale behind it based on the input transcript. Our study, conducted on both human transcripts and Automatic Speech Recognition (ASR) transcripts, shows that Sentiment Reasoning improves model transparency by providing rationales for model predictions that are semantically comparable in quality to human-written ones. It also improves the model's classification performance (+2% in both accuracy and macro-F1) via rationale-augmented fine-tuning. We also found no significant difference in the semantic quality of generated rationales between human and ASR transcripts. All code, data (in five languages: Vietnamese, English, Chinese, German, and French), and models are published online.
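The classification gains above are reported in both accuracy and macro-F1. Macro-F1 averages the per-class F1 scores with equal weight, so a minority sentiment class counts as much as a majority one. The following is a minimal sketch of both metrics, not the evaluation code used in this work. The three label names (`negative`, `neutral`, `positive`) are an assumption for illustration, since the abstract does not list the label set.

```python
def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        # Undefined precision/recall (zero denominator) is treated as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical sentiment labels, for illustration only.
gold = ["negative", "neutral", "positive", "positive"]
pred = ["negative", "positive", "positive", "positive"]
print(round(accuracy(gold, pred), 3))  # 0.75
print(round(macro_f1(gold, pred), 3))  # 0.6
```

In this toy case, three of the four predictions are correct, so accuracy is 0.75. Macro-F1 drops to 0.6 because the single missed `neutral` example pulls that class's F1 to zero, and that zero carries a full third of the average.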
Spoken Named Entity Recognition (NER) aims to extract named entities from speech and categorize them into types such as person, location, and organization. In this work, we present *VietMed-NER*, the first spoken NER dataset in the medical domain. To our knowledge, our real-world Vietnamese dataset is the largest spoken NER dataset in the world in terms of the number of entity types, featuring 18 distinct types. Furthermore, we present baseline results using various state-of-the-art pre-trained models, both encoder-only and sequence-to-sequence, and conduct quantitative and qualitative error analyses. We found that pre-trained multilingual models generally outperform monolingual models on both reference text and ASR output, and that encoders outperform sequence-to-sequence models on NER tasks. By translating the transcripts, the dataset can also be used for text NER in the medical domain in languages other than Vietnamese. All code, data, and models are publicly available.
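Encoder-only models are commonly applied to NER as token classifiers that emit one tag per token. Those tags then have to be grouped into entity spans. The sketch below assumes the widely used BIO scheme (`B-TYPE`, `I-TYPE`, `O`). That scheme, the tag names, and the helper `bio_to_spans` are all hypothetical illustrations. They are not the VietMed-NER annotation format or its 18 entity types.

```python
def bio_to_spans(tokens, tags):
    """Group per-token BIO tags into (entity_type, text) spans.

    Assumes tags of the form "B-TYPE", "I-TYPE", or "O" (hypothetical scheme).
    An "I-" tag that does not continue an open span of the same type is
    treated as the start of a new span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        continues = prefix == "I" and entity_type == current_type
        if current_type is not None and not continues:
            # Close the open span before an "O" tag or a new entity.
            spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if continues:
            current_tokens.append(token)
        elif prefix in ("B", "I"):
            current_type, current_tokens = entity_type, [token]
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical English example; tags are illustrative only.
tokens = ["Dr", "Lan", "works", "in", "Ho", "Chi", "Minh", "City"]
tags = ["O", "B-PERSON", "O", "O",
        "B-LOCATION", "I-LOCATION", "I-LOCATION", "I-LOCATION"]
print(bio_to_spans(tokens, tags))
# [('PERSON', 'Lan'), ('LOCATION', 'Ho Chi Minh City')]
```

Treating a stray `I-` tag as the start of a new span is one lenient design choice among several. A stricter decoder could discard such tags instead.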