2025
GMU-MU at the Financial Misinformation Detection Challenge Task: Exploring LLMs for Financial Claim Verification
Alphaeus Dmonte | Roland R. Oruche | Marcos Zampieri | Eunmi Ko | Prasad Calyam
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)
This paper describes team GMU-MU's submission to the Financial Misinformation Detection challenge. The goal of this challenge is to identify financial misinformation and generate explanations justifying the predictions by developing or adapting LLMs. Participants were provided with a dataset of financial claims categorized into six financial domain categories. We experiment with the Llama model using two approaches: instruction-tuning the model on the training dataset, and a prompting approach that directly evaluates the off-the-shelf model. Our best system placed 5th among the 12 submitted systems, achieving an overall evaluation score of 0.6682.
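As a rough illustration of the second approach, the sketch below prompts an off-the-shelf instruction-tuned Llama checkpoint to verify a financial claim and justify its verdict. The checkpoint name, prompt wording, and example claim are assumptions for illustration, not the exact setup from the paper.

```python
# Minimal sketch: prompting an off-the-shelf LLM for financial claim verification.
# The model name and prompt are illustrative assumptions.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed checkpoint, not the paper's
)

claim = "Company X's revenue tripled last quarter, according to its SEC filing."
prompt = (
    "Decide whether the following financial claim is true or false, "
    "and justify your answer in one sentence.\n"
    f"Claim: {claim}\nAnswer:"
)

output = generator(prompt, max_new_tokens=80, do_sample=False)
print(output[0]["generated_text"])
```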
Does Machine Translation Impact Offensive Language Identification? The Case of Indo-Aryan Languages
Alphaeus Dmonte | Shrey Satapara | Rehab Alsudais | Tharindu Ranasinghe | Marcos Zampieri
Proceedings of the First Workshop on Language Models for Low-Resource Languages
The accessibility of social media platforms can be improved with the use of machine translation (MT). Non-standard features of user-generated social media content, such as hashtags, emojis, and alternative spellings, can lead to mistranslations by MT systems. In this paper, we investigate the impact of MT on offensive language identification in Indo-Aryan languages. We use both original and machine-translated datasets to evaluate the performance of various offensive language identification models. Our evaluation indicates that the models achieve higher performance on original data than on MT data, and that models trained on MT data identify offensive language in MT data more precisely than models trained on original data.
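A minimal sketch of the translate-then-classify setup this evaluation implies is shown below, assuming a Hindi-to-English MT model and an English offensive-language classifier; both model names and the example post are illustrative, not the systems or data used in the paper.

```python
# Minimal sketch: machine-translate a social media post, then run an
# offensive-language classifier on the translation. Model names are assumptions.
from transformers import pipeline

translate = pipeline("translation", model="Helsinki-NLP/opus-mt-hi-en")
classify = pipeline("text-classification",
                    model="cardiffnlp/twitter-roberta-base-offensive")

post = "यह एक उदाहरण सोशल मीडिया पोस्ट है #hashtag"  # user-generated content with a hashtag
translated = translate(post)[0]["translation_text"]
print(translated, classify(translated))
```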
2023
ALEXSIS+: Improving Substitute Generation and Selection for Lexical Simplification with Information Retrieval
Kai North | Alphaeus Dmonte | Tharindu Ranasinghe | Matthew Shardlow | Marcos Zampieri
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
Lexical simplification (LS) automatically replaces words that are deemed difficult to understand for a given target population with simpler alternatives, whilst preserving the meaning of the original sentence. The TSAR-2022 shared task on LS provided participants with a multilingual lexical simplification test set. It contained nearly 1,200 complex words in English, Portuguese, and Spanish and presented multiple candidate substitutions for each complex word. The competition did not make training data available; therefore, teams had to use either off-the-shelf pre-trained large language models (LLMs) or out-of-domain data to develop their LS systems. As such, participants were unable to fully explore the capabilities of LLMs by re-training and/or fine-tuning them on in-domain data. To address this important limitation, we present ALEXSIS+, a multilingual dataset in the aforementioned three languages, and ALEXSIS++, an English monolingual dataset, which together contain more than 50,000 unique sentences retrieved from news corpora and annotated with cosine similarities to the original complex word and sentence. Using these additional contexts, we are able to generate new high-quality candidate substitutions that improve LS performance on the TSAR-2022 test set regardless of the language or model.
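The sketch below illustrates the general idea of ranking candidate substitutions by cosine similarity to the original sentence using sentence embeddings; the model name, sentence, and candidates are illustrative assumptions, not the ALEXSIS+ annotation pipeline itself.

```python
# Minimal sketch: score candidate substitutions by cosine similarity between the
# original sentence and the sentence with the complex word replaced.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

sentence = "The legislation was promulgated by the council."
candidates = ["announced", "enacted", "issued", "celebrated"]

original = model.encode(sentence, convert_to_tensor=True)
substituted = model.encode(
    [sentence.replace("promulgated", c) for c in candidates], convert_to_tensor=True
)

scores = util.cos_sim(original, substituted)[0]
for cand, score in sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1]):
    print(f"{cand}: {score:.3f}")
```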
2022
GMU-WLV at TSAR-2022 Shared Task: Evaluating Lexical Simplification Models
Kai North | Alphaeus Dmonte | Tharindu Ranasinghe | Marcos Zampieri
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
This paper describes team GMU-WLV's submission to the TSAR shared task on multilingual lexical simplification. The goal of the task is to automatically provide a set of candidate substitutions for complex words in context. The organizers provided participants with ALEXSIS, a manually annotated dataset whose instances are split between a small trial set, with a dozen instances in each of the three languages of the competition (English, Portuguese, and Spanish), and a test set with over 300 instances in each of these languages. To cope with the lack of training data, participants had to either use alternative data sources or pre-trained language models. We experimented with monolingual models: BERTimbau, ELECTRA, and RoBERTa-large-BNE. Our best system achieved 1st place out of sixteen systems for Portuguese, 8th out of thirty-three systems for English, and 6th out of twelve systems for Spanish.
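In the spirit of the pre-trained models mentioned above, the sketch below generates substitution candidates with a masked language model by masking the complex word and reading off the top predictions; the checkpoint and sentence are illustrative assumptions, not the exact systems submitted.

```python
# Minimal sketch: masked-LM candidate generation for lexical simplification.
# The checkpoint and example are assumptions for illustration.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

sentence = "The committee reached a [MASK] decision."  # complex word replaced by the mask token
for pred in fill(sentence, top_k=5):
    print(pred["token_str"], round(pred["score"], 3))
```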