2024
I2C-Huelva at SemEval-2024 Task 8: Boosting AI-Generated Text Detection with Multimodal Models and Optimized Ensembles
Alberto Rodero Peña | Jacinto Mata Vazquez | Victoria Pachón Álvarez
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
With the rise of AI-based text generators, the need for effective detection mechanisms has become paramount. This paper presents new techniques for building adaptable models and optimizing training aspects for identifying synthetically produced texts across multiple generators and domains. The study, divided into binary and multilabel classification tasks, avoids overfitting through strategic training data limitation. A key innovation is the incorporation of multimodal models that blend numerical text features with conventional NLP approaches. The work also delves into optimizing ensemble model combinations via various voting methods, focusing on accuracy as the official metric. The optimized ensemble strategy demonstrates significant efficacy in both subtasks, highlighting the potential of multimodal and ensemble methods in enhancing the robustness of detection systems against emerging text generators.
2023
I2C-Huelva at SemEval-2023 Task 9: Analysis of Intimacy in Multilingual Tweets Using Resampling Methods and Transformers
Abel Pichardo Estevez | Jacinto Mata Vázquez | Victoria Pachón Álvarez | Nordin El Balima Cordero
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Nowadays, intimacy is a fundamental aspect of how we relate to other people in social settings. A high level of intimacy is most often signaled by the use of certain emoticons, curse words, verbs, etc. This paper presents the approach developed to solve SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis. To address the task, a transfer learning approach was taken by fine-tuning various pre-trained language models. Since the dataset supplied by the organizers was highly imbalanced, our main strategy to obtain high prediction values was to apply different oversampling and undersampling techniques to the training set. Our final submission achieved an overall Pearson’s r of 0.497.
I2C-Huelva at SemEval-2023 Task 10: Ensembling Transformers Models for the Detection of Online Sexism
Lavinia Felicia Fudulu | Alberto Rodriguez Tenorio | Victoria Pachón Álvarez | Jacinto Mata Vázquez
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This work details our approach for addressing Tasks A and B of SemEval 2023 Task 10: Explainable Detection of Online Sexism (EDOS). For Task A, a simple ensemble based on a majority-vote system was presented. To build our proposal, a review of transformers was first carried out and the three best-performing models were selected to be part of the ensemble. Next, the best hyperparameters for these models were searched for using a reduced dataset. Finally, we trained these models using more data. During the development phase, our ensemble system achieved an F1-score of 0.8403. For Task B, we developed a model based on the DeBERTa transformer, using the hyperparameters identified for Task A. During the development phase, our proposed model attained an F1-score of 0.6467. Overall, our methodology demonstrates an effective approach to the tasks, leveraging advanced machine learning techniques and hyperparameter searches to achieve high performance in detecting and classifying instances of sexism in online text.
I2C Huelva at SemEval-2023 Task 4: A Resampling and Transformers Approach to Identify Human Values behind Arguments
Nordin El Balima Cordero | Jacinto Mata Vázquez | Victoria Pachón Álvarez | Abel Pichardo Estevez
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper presents the approaches proposed by the I2C Group to address SemEval-2023 Task 4: Identification of Human Values behind Arguments (ValueEval), whose goal is to classify 20 different categories of human values given a textual argument. The dataset of this task consists of one argument per line, including its unique argument ID, conclusion, stance of the premise towards the conclusion and the premise text. To indicate whether or not the argument draws on a given category, a binary indication (1 or 0) is included. Participants can submit approaches that detect one, multiple, or all of these values in arguments. The task provides an opportunity for researchers to explore the use of automated techniques to identify human values in text and has potential applications in various domains such as social science, politics, and marketing. To deal with the imbalanced class distribution given, our approach undersamples the data. Additionally, the three components of the argument (conclusion, stance and premise) are used for training. The system outperformed the BERT baseline according to official evaluation metrics, achieving an F1 score of 0.46.
2021
Identification of profession & occupation in Health-related Social Media using tweets in Spanish
Victoria Pachón | Jacinto Mata Vázquez | Juan Luís Domínguez Olmedo
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
In this paper we present our approach and system description for Task 7a in ProfNer-ST: Identification of profession & occupation in Health-related Social Media. Our main contribution is to show the effectiveness of using BETO-Spanish BERT, a transformer-based model pretrained on a Spanish corpus, for classification tasks. In our experiments we compared several architectures based on transformers with others based on classical machine learning algorithms. With this approach, we achieved an F1-score of 0.92 in the evaluation process.
2020
I2C at SemEval-2020 Task 12: Simple but Effective Approaches to Offensive Speech Detection in Twitter
Victoria Pachón Álvarez | Jacinto Mata Vázquez | José Manuel López Betanzos | José Luis Arjona Fernández
Proceedings of the Fourteenth Workshop on Semantic Evaluation
This paper describes the systems developed by the I2C Group to participate in Subtasks A and B in English, and Subtask A in Turkish and Arabic, in OffensEval (Task 12 of SemEval 2020). In our experiments we compare three architectures we have developed, two based on Transformers and the other based on classical machine learning algorithms. In this paper, the proposed architectures are described, and the results obtained by our systems are presented.
2017
Annotating Negation in Spanish Clinical Texts
Noa Cruz | Roser Morante | Manuel J. Maña López | Jacinto Mata Vázquez | Carlos L. Parra Calderón
Proceedings of the Workshop Computational Semantics Beyond Events and Roles
In this paper we present on-going work on annotating negation in Spanish clinical documents. A corpus of anamnesis and radiology reports has been annotated by two domain expert annotators with negation markers and negated events. The Dice coefficient for inter-annotator agreement is higher than 0.94 for negation markers and higher than 0.72 for negated events. The corpus will be publicly released when the annotation process is finished, constituting the first corpus annotated with negation for Spanish clinical reports available for the NLP community.