Nordin El Balima Cordero
2023
I2C-Huelva at SemEval-2023 Task 9: Analysis of Intimacy in Multilingual Tweets Using Resampling Methods and Transformers
Abel Pichardo Estevez
|
Jacinto Mata Vázquez
|
Victoria Pachón Álvarez
|
Nordin El Balima Cordero
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Nowadays, intimacy is a fundamental aspect of how we relate to other people in social settings. The most frequent way in which we can determine a high level of intimacy is in the use of certain emoticons, curse words, verbs, etc. This paper presents the approach developed to solve SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis. To address the task, a transfer learning approach was conducted by fine-tuning various pre-trained language models. Since the dataset supplied by the organizer was highly imbalanced, our main strategy to obtain high prediction values was the implementation of different oversampling and undersampling techniques on the training set. Our final submission achieved an overall Pearson’s r of 0.497.
I2C Huelva at SemEval-2023 Task 4: A Resampling and Transformers Approach to Identify Human Values behind Arguments
Nordin El Balima Cordero
|
Jacinto Mata Vázquez
|
Victoria Pachón Álvarez
|
Abel Pichardo Estevez
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper presents the approaches proposed by the I2C Group to address SemEval-2023 Task 4: Identification of Human Values behind Arguments (ValueEval), whose goal is to classify 20 different categories of human values given a textual argument. The dataset of this task consists of one argument per line, including its unique argument ID, conclusion, stance of the premise towards the conclusion, and the premise text. A binary indicator (1 or 0) is included for each category to show whether or not the argument draws on it. Participants can submit approaches that detect one, multiple, or all of these values in arguments. The task provides an opportunity for researchers to explore the use of automated techniques to identify human values in text and has potential applications in various domains such as social science, politics, and marketing. To deal with the imbalanced class distribution given, our approach undersamples the data. Additionally, the three components of the argument (conclusion, stance, and premise) are used for training. The system outperformed the BERT baseline according to the official evaluation metrics, achieving an F1 score of 0.46.