2025
Automated Generation of Arabic Verb Conjugations with Multilingual Urdu Translation: An NLP Approach
Haq Nawaz | Manal Elobaid | Ali Al-Laith | Saif Ullah
Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script
This paper presents a rule-based automated system for generating both Arabic verb conjugations and their corresponding Urdu translations. The system processes triliteral, non-weak Arabic roots across key tenses: Past Simple, Past Simple Negative, Present Simple, and Present Simple Negative. Addressing the challenges posed by Arabic morphology, our rule-based approach applies patterns and morphological rules to accurately produce verb conjugations, capturing essential grammatical variations in gender, number, and person. Simultaneously, the system generates Urdu translations using predefined patterns that are aligned with the grammatical nuances of Arabic, ensuring semantic consistency. As the first system of its kind, it uniquely provides a cross-lingual resource that bridges two linguistically similar but distinct languages. By focusing on rule-based precision and dual-language outputs, it addresses critical gaps in NLP resources, serving as a valuable tool for linguists, educators, and NLP researchers in academic and religious contexts where Arabic and Urdu coexist.
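The abstract does not reproduce the underlying rule tables, so the following is only a minimal sketch of how such a rule-based generator can be structured, assuming a small Past Simple suffix table for a sound triliteral root (كتب, "to write") and illustrative Urdu glosses; none of these values are taken from the paper's actual resource.

# Minimal sketch of rule-based conjugation for a sound triliteral root.
# The suffix table and Urdu glosses are illustrative assumptions, not the paper's rules.

PAST_SIMPLE_SUFFIXES = {
    ("3rd", "masc", "sing"): "",      # he wrote        -> كتب
    ("3rd", "fem",  "sing"): "ت",     # she wrote       -> كتبت
    ("3rd", "masc", "plur"): "وا",    # they (m.) wrote -> كتبوا
    ("1st", "any",  "sing"): "ت",     # I wrote         -> كتبت
    ("1st", "any",  "plur"): "نا",    # we wrote        -> كتبنا
}

URDU_GLOSSES = {
    ("3rd", "masc", "sing"): "اس نے لکھا",
    ("3rd", "fem",  "sing"): "اس نے لکھا",
    ("3rd", "masc", "plur"): "انہوں نے لکھا",
    ("1st", "any",  "sing"): "میں نے لکھا",
    ("1st", "any",  "plur"): "ہم نے لکھا",
}

def conjugate_past_simple(root: str):
    """Attach person/gender/number suffixes to an undiacritized triliteral root."""
    for features, suffix in PAST_SIMPLE_SUFFIXES.items():
        yield features, root + suffix, URDU_GLOSSES[features]

for features, arabic, urdu in conjugate_past_simple("كتب"):
    print(features, arabic, urdu)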
Dying or Departing? Euphemism Detection for Death Discourse in Historical Texts
Ali Al-Laith | Alexander Conroy | Jens Bjerring-Hansen | Bolette Pedersen | Carsten Levisen | Daniel Hershcovich
Proceedings of the 31st International Conference on Computational Linguistics
Euphemisms are a linguistic device used to soften discussions of sensitive or uncomfortable topics, with death being a prominent example. In this paper, we present a study on the detection of death-related euphemisms in historical literary texts from a corpus containing Danish and Norwegian novels from the late 19th century. We introduce an annotated dataset of euphemistic and literal references to death, including both common and rare euphemisms, ranging from well-established terms to more culturally nuanced expressions. We evaluate the performance of state-of-the-art pre-trained language models fine-tuned for euphemism detection. Our findings show that fixed, literal expressions of death became less frequent over time, while metaphorical euphemisms grew in prevalence. Additionally, euphemistic language was more common in historical novels, whereas contemporary novels tended to refer to death more literally, reflecting the rise of secularism. These results shed light on the shifting discourse on death during a period when the concept of death as final became prominent.
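The abstract does not specify the fine-tuning setup, so the sketch below shows one conventional way to fine-tune a pre-trained encoder as a binary euphemistic-vs-literal classifier with the Hugging Face Trainer; the checkpoint name, data files, label scheme, and hyperparameters are assumptions for illustration, not the paper's configuration.

# Sketch: fine-tune a pre-trained encoder for euphemistic vs. literal "death" references.
# Checkpoint, data files, and label scheme are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

model_name = "bert-base-multilingual-cased"   # stand-in for a Danish/Norwegian encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Expected CSV columns: "text" (sentence) and "label" (0 = literal, 1 = euphemistic).
data = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="euphemism-model",
                           per_device_train_batch_size=16,
                           num_train_epochs=3),
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    tokenizer=tokenizer,   # default collator pads batches dynamically
)
trainer.train()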
Exploring the Effectiveness of Multilingual and Generative Large Language Models for Question Answering in Financial Texts
Ali Al-Laith
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)
This paper investigates the use of large language models (LLMs) for financial causality detection in the FinCausal 2025 shared task, focusing on generative and multilingual question answering (QA) tasks. Our study employed both generative and discriminative approaches, utilizing GPT-4o for generative QA and BERT-base-multilingual-cased, XLM-RoBERTa-large, and XLM-RoBERTa-base for multilingual QA across English and Spanish datasets. The datasets consist of financial disclosures where questions reflect causal relationships, paired with extractive answers derived directly from the text. Evaluation was conducted using Semantic Answer Similarity (SAS) and Exact Match (EM) metrics. While the discriminative XLM-RoBERTa-large model achieved the best overall performance, ranking 5th in English (SAS: 0.9598, EM: 0.7615) and 4th in Spanish (SAS: 0.9756, EM: 0.8084) among 11 team submissions, our results also highlight the effectiveness of the generative GPT-4o approach. Notably, GPT-4o achieved promising results in few-shot settings, with SAS scores approaching those of fine-tuned discriminative models, demonstrating that the generative approach can provide competitive performance despite lacking task-specific fine-tuning. This comparison underscores the potential of generative LLMs as robust, versatile alternatives for complex QA tasks like financial causality detection.
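Both evaluation metrics can be approximated in a few lines. The sketch below implements a standard Exact Match check and a bi-encoder cosine-similarity stand-in for Semantic Answer Similarity; the official FinCausal 2025 scorer and its text normalization may differ, and the embedding model named here is an assumption.

# Sketch of the two QA metrics; the shared task's official scorer may differ in detail.
import re
import string
from sentence_transformers import SentenceTransformer, util

def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace before comparison."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def exact_match(prediction: str, gold: str) -> int:
    return int(normalize(prediction) == normalize(gold))

# Multilingual bi-encoder as a stand-in for the official SAS model (an assumption).
encoder = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

def semantic_similarity(prediction: str, gold: str) -> float:
    emb = encoder.encode([prediction, gold], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

print(exact_match("higher oil prices", "Higher oil prices."))                    # 1
print(round(semantic_similarity("rising crude costs", "higher oil prices"), 3))  # high similarity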
Evaluating Calibration of Arabic Pre-trained Language Models on Dialectal Text
Ali Al-Laith | Rachida Kebdani
Proceedings of the 4th Workshop on Arabic Corpus Linguistics (WACL-4)
While pre-trained language models have made significant progress in different classification tasks, little attention has been given to the reliability of their confidence scores. Calibration, that is, how well model confidence aligns with actual accuracy, is essential for real-world applications where decisions rely on probabilistic outputs. This study addresses this gap in Arabic dialect identification by assessing the calibration of eight pre-trained language models, ensuring their predictions are not only accurate but also reliable for practical applications. We analyze two datasets: one with over 1 million text samples and the Nuanced Arabic Dialect Identification dataset (NADI-2023). Using Expected Calibration Error (ECE) as a metric, we reveal substantial variation in model calibration across dialects in both datasets, showing that prediction confidence can vary significantly depending on regional data. This research has implications for improving the reliability of Arabic dialect models in applications like sentiment analysis and social media monitoring.
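Expected Calibration Error bins predictions by confidence and averages the gap between per-bin accuracy and per-bin mean confidence, weighted by bin size. A minimal NumPy sketch follows; the bin count of 10 is a common default, not a value reported in the paper.

# Minimal Expected Calibration Error (ECE) with equal-width confidence bins.
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    confidences = np.asarray(confidences)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap   # weight by the fraction of samples in the bin
    return ece

# Toy example: three predictions with their top-class confidences.
print(expected_calibration_error([0.9, 0.75, 0.6], [1, 0, 2], [1, 0, 1]))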
2024
Noise, Novels, Numbers. A Framework for Detecting and Categorizing Noise in Danish and Norwegian Literature
Ali Al-Laith | Daniel Hershcovich | Jens Bjerring-Hansen | Jakob Ingemann Parby | Alexander Conroy | Timothy R Tangherlini
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We present a framework for detecting and categorizing noise in literary texts, demonstrated through its application to Danish and Norwegian literature from the late 19th century. Noise, understood as “aberrant sonic behaviour,” is not only an auditory phenomenon but also a cultural construct tied to the processes of civilization and urbanization. We begin by utilizing topic modeling techniques to identify noise-related documents, followed by fine-tuning BERT-based language models trained on Danish and Norwegian texts to analyze a corpus of over 800 novels. We identify and track the prevalence of noise in these texts, offering insights into the literary perceptions of noise during the Scandinavian “Modern Breakthrough” period (1870-1899). Our contributions include the development of a comprehensive dataset annotated for noise-related segments and their categorization into human-made, non-human-made, and musical noises. This study illustrates the framework’s potential for enhancing the understanding of the relationship between noise and its literary representations, providing a deeper appreciation of the auditory elements in literary works, including their role as sources for cultural history.
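The framework's first stage, using topic modeling to surface noise-related documents, could be sketched roughly as below; the seed vocabulary, topic count, and selection rule are illustrative assumptions rather than the paper's configuration.

# Sketch: LDA topic modeling to flag passages whose dominant topic contains noise terms.
# Seed vocabulary, topic count, and threshold are illustrative assumptions.
from gensim import corpora, models

NOISE_SEEDS = {"larm", "støj", "skrig", "brøl"}   # Danish/Norwegian noise-related seed words

def flag_noise_passages(tokenized_passages, num_topics=20, top_n=15):
    dictionary = corpora.Dictionary(tokenized_passages)
    corpus = [dictionary.doc2bow(tokens) for tokens in tokenized_passages]
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics, passes=5)

    # Topics whose top words overlap with the seed list are treated as "noise" topics.
    noise_topics = {
        topic_id
        for topic_id in range(num_topics)
        if any(word in NOISE_SEEDS for word, _ in lda.show_topic(topic_id, topn=top_n))
    }

    flagged = []
    for idx, bow in enumerate(corpus):
        dominant_topic = max(lda.get_document_topics(bow), key=lambda t: t[1])[0]
        if dominant_topic in noise_topics:
            flagged.append(idx)
    return flagged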
Development and Evaluation of Pre-trained Language Models for Historical Danish and Norwegian Literary Texts
Ali Al-Laith | Alexander Conroy | Jens Bjerring-Hansen | Daniel Hershcovich
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
We develop and evaluate the first pre-trained language models specifically tailored for historical Danish and Norwegian texts. Three models are trained on a corpus of 19th-century Danish and Norwegian literature: two directly on the corpus with no prior pre-training, and one with continued pre-training. To evaluate the models, we utilize an existing sentiment classification dataset, and additionally introduce a new annotated word sense disambiguation dataset focusing on the concept of fate. Our assessment reveals that the model employing continued pre-training outperforms the others in two downstream NLP tasks on historical texts. Specifically, we observe substantial improvement in sentiment classification and word sense disambiguation compared to models trained on contemporary texts. These results highlight the effectiveness of continued pre-training for enhancing performance across various NLP tasks in historical text analysis.
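Continued pre-training of the kind described here typically resumes masked-language-model training from an existing checkpoint on the historical corpus. A rough sketch with the Hugging Face stack follows; the starting checkpoint, corpus file, and hyperparameters are assumptions rather than the paper's exact setup.

# Sketch: continued masked-language-model pre-training on a historical corpus.
# Starting checkpoint, file name, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, TrainingArguments, Trainer)

checkpoint = "bert-base-multilingual-cased"   # stand-in for a contemporary Danish/Norwegian model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# One plain-text line per passage of 19th-century literature (hypothetical file name).
corpus = load_dataset("text", data_files={"train": "novels_1800s.txt"})
corpus = corpus.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="historical-mlm",
                           per_device_train_batch_size=16,
                           num_train_epochs=3),
    train_dataset=corpus["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()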