Development and Validation of a Corpus for Machine Humor Comprehension

Yuen-Hsien Tseng, Wun-Syuan Wu, Chia-Yueh Chang, Hsueh-Chih Chen, Wei-Lun Hsu


Abstract
This work developed a Chinese humor corpus containing 3,365 jokes collected from over 40 sources. Each joke was labeled by a single annotator with one of five levels of funniness, eight skill sets of humor, and six dimensions of intent. To validate the manual labels, we trained SVM (Support Vector Machine) and BERT (Bidirectional Encoder Representations from Transformers) models on half of the corpus (labeled by one annotator) to predict the skill and intent labels of the other half (labeled by the other annotator). Based on two assumptions that a valid manually labeled corpus should satisfy, our results supported the validity of the skill and intent labels. For the funniness label, the validation results showed only a marginal correlation between the corpus label and user feedback ratings, which implies that funniness level is a harder annotation problem. The contribution of this work is twofold: 1) a Chinese humor corpus is developed with labels of humor skills, intents, and funniness, which allows machines to learn more intricate humor framing, effect, and amusement level in order to predict and respond in the proper context (https://github.com/SamTseng/Chinese_Humor_MultiLabeled); 2) an approach is proposed for verifying whether a corpus with minimal human labeling is valid, which facilitates the validation of low-resource corpora.
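The cross-annotator step described in the abstract (train on the half labeled by one annotator, predict the half labeled by the other) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a small word-level Naive Bayes classifier stands in for the SVM and BERT models, the toy texts and the label names `pun` and `exaggeration` are hypothetical, and accuracy is used as the score. The two validity assumptions mentioned in the abstract are not spelled out there, so the sketch covers only the train-on-one-half, predict-the-other-half procedure.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(texts, labels):
    """Fit a word-level multinomial Naive Bayes model (stand-in for SVM/BERT)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest add-one-smoothed log posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_annotator_accuracy(train_half, test_half):
    """Train on one annotator's half and score predictions on the other's."""
    texts, labels = zip(*train_half)
    model = train_naive_bayes(texts, labels)
    hits = sum(predict(model, text) == label for text, label in test_half)
    return hits / len(test_half)


# Hypothetical toy data: (text, skill label), one list per annotator.
half_a = [
    ("pun on bank river money", "pun"),
    ("pun word play double meaning", "pun"),
    ("exaggerate huge giant enormous", "exaggeration"),
    ("huge lie exaggerate wildly", "exaggeration"),
]
half_b = [
    ("double meaning pun", "pun"),
    ("giant huge exaggerate", "exaggeration"),
    ("word play pun", "pun"),
    ("wildly enormous", "exaggeration"),
]

# Evaluate in both directions so neither annotator is only ever the "teacher".
acc_a_to_b = cross_annotator_accuracy(half_a, half_b)
acc_b_to_a = cross_annotator_accuracy(half_b, half_a)
print(f"A -> B accuracy: {acc_a_to_b:.2f}")
print(f"B -> A accuracy: {acc_b_to_a:.2f}")
```

If a model trained on one annotator's labels predicts the other annotator's labels well above chance, that is at least consistent with the two annotators applying the label scheme in a compatible way, which is the intuition behind this kind of check.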
Anthology ID:
2020.lrec-1.168
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
1346–1352
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.168
Cite (ACL):
Yuen-Hsien Tseng, Wun-Syuan Wu, Chia-Yueh Chang, Hsueh-Chih Chen, and Wei-Lun Hsu. 2020. Development and Validation of a Corpus for Machine Humor Comprehension. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1346–1352, Marseille, France. European Language Resources Association.
Cite (Informal):
Development and Validation of a Corpus for Machine Humor Comprehension (Tseng et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.168.pdf