UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish

Marloes Kuijper, Mike van Lenthe, Rik van Noord


Abstract
The present study describes our submission to SemEval 2018 Task 1: Affect in Tweets. Our Spanish-only approach aimed to demonstrate that it is beneficial to automatically generate additional training data by (i) translating training data from other languages and (ii) applying a semi-supervised learning method. We find strong support for both approaches, with those models outperforming our regular models in all subtasks. However, creating a stepwise ensemble of different models as opposed to simply averaging did not result in an increase in performance. We placed second (EI-Reg), second (EI-Oc), fourth (V-Reg) and fifth (V-Oc) in the four Spanish subtasks we participated in.
Anthology ID:
S18-1041
Volume:
Proceedings of the 12th International Workshop on Semantic Evaluation
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marianna Apidianaki, Saif M. Mohammad, Jonathan May, Ekaterina Shutova, Steven Bethard, Marine Carpuat
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
279–285
URL:
https://aclanthology.org/S18-1041
DOI:
10.18653/v1/S18-1041
Cite (ACL):
Marloes Kuijper, Mike van Lenthe, and Rik van Noord. 2018. UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish. In Proceedings of the 12th International Workshop on Semantic Evaluation, pages 279–285, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish (Kuijper et al., SemEval 2018)
PDF:
https://aclanthology.org/S18-1041.pdf