Antonio Moreno-Sandoval

NYU, Univ. Autónoma de Madrid

Also published as: Antonio Moreno, Antonio Moreno Sandoval

Other people with similar names: Antonio Moreno-Ortiz (Univ. of Málaga), Antonio Moreno Ribas (Univ. Rovira i Virgili)

2025

Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)
Chung-Chi Chen | Antonio Moreno-Sandoval | Jimin Huang | Qianqian Xie | Sophia Ananiadou | Hsin-Hsi Chen
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)

The Financial Document Causality Detection Shared Task (FinCausal 2025)
Antonio Moreno Sandoval | Blanca Carbajo Coronado | Jordi Porta Zamorano | Yanco Amor Torterolo Orta | Doaa Samy
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)

We present the Financial Document Causality Detection Task (FinCausal 2025), a multilingual challenge to identify causal relationships within financial texts. This task comprises English and Spanish subtasks, with datasets compiled from British and Spanish annual reports. Participants were tasked with identifying and generating answers to questions about causes or effects within specific text segments. The dataset combines extractive and generative question-answering (QA) methods, with abstractly formulated questions and answers extracted directly from the text. System performance is evaluated using exact matching and semantic similarity metrics. The challenge attracted submissions from 10 teams for the English subtask and 10 teams for the Spanish subtask. FinCausal 2025 is part of the 6th Financial Narrative Processing Workshop (FNP 2025), hosted at COLING 2025 in Abu Dhabi.

2022

This paper presents the results and findings of the Financial Narrative Summarisation Shared Task on summarising UK, Greek and Spanish annual reports. The shared task was organised as part of the Financial Narrative Processing 2022 Workshop (FNP 2022 Workshop). The Financial Narrative Summarisation Shared Task (FNS-2022) has been running since 2020 as part of the Financial Narrative Processing (FNP) workshop series (El-Haj et al., 2022; El-Haj et al., 2021; El-Haj et al., 2020b; El-Haj et al., 2019c; El-Haj et al., 2018). The shared task included one main task: using either abstractive or extractive automatic summarisers to summarise long documents, namely UK, Greek and Spanish financial annual reports. This shared task is the third to target financial documents. The data for the shared task was created and collected from publicly available annual reports published by firms listed on the stock exchanges of the UK, Greece and Spain. A total of 14 systems from 7 different teams participated in the shared task.

This paper describes the FinTOC-2022 Shared Task on structure extraction from financial documents, its participants' results and their findings. This shared task was organized as part of The 4th Financial Narrative Processing Workshop (FNP 2022), held jointly at The 13th Edition of the Language Resources and Evaluation Conference (LREC 2022), Marseille, France (El-Haj et al., 2022). This shared task aimed to stimulate research in systems for extracting a table of contents (TOC) from investment documents (such as financial prospectuses) by detecting the document titles and organizing them hierarchically into a TOC. For the fourth edition of this shared task, three subtasks were presented to the participants: one with English documents, one with French documents and one with Spanish documents. This year, we proposed a different and revised dataset for English and French compared to the previous editions of FinTOC, and a new dataset for Spanish documents was added. The task attracted 6 submissions for each language from 4 teams. The most successful methods make use of textual, structural and visual features extracted from the documents and propose classification models for detecting titles and TOCs for all of the subtasks.

2019

Tone Analysis in Spanish Financial Reporting Narratives
Antonio Moreno-Sandoval | Pablo Alfonso Haya | Ana Gisbert | Marta Guerrero | Helena Montoro
Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019)

2012

This paper presents a method for designing, compiling and annotating corpora intended for language learners. In particular, we focus on spoken corpora for use as complementary material in the classroom as well as in examinations. We describe the three corpora (Spanish, Chinese and Japanese) compiled by the Laboratorio de Lingüística Informática at the Autonomous University of Madrid (LLI-UAM). A web-based concordance tool has been used to search for examples in the corpus; it provides the text along with the corresponding audio. Teaching materials from the corpus, consisting of the texts, the audio files and exercises on them, are currently under development.

Medical Term Extraction in an Arabic Medical Corpus
Doaa Samy | Antonio Moreno-Sandoval | Conchi Bueno-Díaz | Marta Garrote-Salazar | José M. Guirao
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper tests two different strategies for medical term extraction in an Arabic Medical Corpus. The experiments and the corpus are developed within the framework of the Multimedica project, funded by the Spanish Ministry of Science and Innovation and aimed at developing multilingual resources and tools for processing newswire texts in the Health domain. The first experiment uses a fixed list of medical terms; the second uses a list of Arabic equivalents of a very limited list of common Latin prefixes and suffixes used in medical terms. Results show that using equivalents of Latin suffixes and prefixes outperforms the fixed list. The paper starts with an introduction, followed by a description of the state of the art in the field of Arabic Medical Language Resources (LRs). The third section describes the corpus and its characteristics. The fourth and fifth sections explain the lists used and the results of the experiments carried out on a sub-corpus for evaluation. The last section analyzes the results, outlining the conclusions and future work.

2008

Developing a Phonemic and Syllabic Frequency Inventory for Spontaneous Spoken Castilian Spanish and their Comparison to Text-Based Inventories
Antonio Moreno Sandoval | Doroteo Torre Toledano | Raúl de la Torre | Marta Garrote | José M. Guirao
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we present our recent work to develop phonemic and syllabic inventories for Castilian Spanish based on the C-ORAL-ROM corpus, a spontaneous spoken resource with varying degrees of naturalness and different communicative contexts. These inventories have been developed by means of an automatic phonemic and syllabic transcriber whose output has been assessed by manually reviewing most of the transcriptions. The inventories include absolute frequencies of occurrence of the different phones and syllables. These frequencies have been contrasted against an inventory extracted from a comparable textual corpus, finding evidence that the available inventories, based mainly on text, do not provide an accurate description of spontaneously spoken Castilian Spanish.

2006

Building a Parallel Multilingual Corpus (Arabic-Spanish-English)
Doaa Samy | Antonio Moreno Sandoval | José M. Guirao | Enrique Alfonseca
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents the results (1st phase) of the ongoing research in the Computational Linguistics Laboratory at Autónoma University of Madrid (LLI-UAM) aiming at the development of a multilingual parallel corpus (Arabic-Spanish-English) aligned at the sentence level and tagged at the POS level. A multilingual parallel corpus which brings together Arabic, Spanish and English is a new resource for the NLP community that completes the present panorama of parallel corpora. In the first part of this study, we introduce the novelty of our approach and the challenges encountered in creating such a corpus. This introductory part highlights the main features of the corpus and the criteria applied during the selection process. The second part focuses on two main stages: basic processing (tokenization and segmentation) and alignment. The alignment methodology is explained in detail, and the results obtained for the three language pairs are compared. POS tagging and the tools used in this stage are discussed in the third part. The final output is available in two versions: the non-aligned version and the aligned one. The latter adopts the TMX (Translation Memory Exchange) standard format. Finally, the section on future work points out the key stages concerned with extending the corpus and the studies that can benefit, directly or indirectly, from such a resource.

The wraetlic NLP suite
Enrique Alfonseca | Antonio Moreno-Sandoval | José María Guirao | María Ruiz-Casado
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper, we describe the second release of a suite of language analysers, developed over the last five years, called wraetlic, which includes tools for several partial parsing tasks, both for English and Spanish. It has been successfully used in fields such as Information Extraction, thesaurus acquisition, Text Summarisation and Computer Assisted Assessment.

2004

The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages
Emanuela Cresti | Fernanda Bacelar do Nascimento | Antonio Moreno Sandoval | Jean Veronis | Philippe Martin | Khalid Choukri
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The C-ORAL-ROM project has delivered a multilingual corpus of spontaneous speech for the main Romance languages (Italian, French, Portuguese and Spanish). The collection aims to represent the variety of speech acts performed in everyday language and to enable the description of prosodic and syntactic structures in the four Romance languages. Sampling criteria are defined in a corpus design scheme. C-ORAL-ROM adopts two different sampling strategies, one for the formal and one for the informal part: while a set of typical domains of application is selected to document the formal use of language, the informal part documents speech variation using parameters referring to the event’s structure (dialogue vs. monologue) and the sociological domain of use (family-private vs. public). The four Romance corpora are tagged with respect to terminal and non-terminal prosodic breaks. Terminal breaks are assumed to be the most relevant cues for the identification of relevant linguistic domains in spontaneous speech (utterances). Relations with other concurrent criteria are discussed. The multimedia storage of the C-ORAL-ROM corpus is based on this principle: each textual string ending with a terminal break is aligned, through the WinPitch speech software, to its acoustic counterpart, generating the database of all utterances.

Syntax to Semantics Transformation: Application to Treebanking
Manuel Alcántara | Antonio Moreno
Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004

Construction of a Bilingual Arabic-Spanish Lexicon of Verbs Based on a Parallel Corpus
Doaa Samy | Antonio Moreno-Sandoval | José M. Guirao
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Parallel corpora are considered an important resource for the development of linguistic tools. In this paper our main goal is the development of a bilingual lexicon of verbs. The construction of this lexicon is possible using two main resources: I) a parallel corpus (through the alignment); II) the linguistic tools developed for Spanish (which serve as a starting point for developing tools for the Arabic language). In the end, aligned equivalent verbs are detected automatically from a Spanish-Arabic parallel corpus. To achieve this goal, we had to pass through different preparatory stages: the assessment of the parallel corpus, the monolingual tokenization of each corpus, a preliminary sentence alignment and, finally, the application of the model for automatic extraction of equivalent verbs. Our method is hybrid, since it combines both statistical and linguistic approaches.