The automatic recognition of idioms poses a challenging problem for NLP applications. Whereas native speakers can intuitively handle multiword expressions whose compositional meanings are hard to trace back to individual word semantics, there is still ample scope for improvement regarding computational approaches. We assume that idiomatic constructions can be characterized by gradual intensities of semantic non-compositionality, formal fixedness, and unusual usage context, and introduce a number of measures for these characteristics, comprising count-based and predictive collocation measures together with measures of context (un)similarity. We evaluate our approach on a manually labelled gold standard, derived from a corpus of German pop lyrics. To this end, we apply a Random Forest classifier to analyze the individual contribution of features for automatically detecting idioms, and study the trade-off between recall and precision. Finally, we evaluate the classifier on an independent dataset of idioms extracted from a list of Wikipedia idioms, achieving state-of-the art accuracy.
A Survey on Approaches to Computational Humor Generation
Miriam Amin | Manuel Burghardt
Proceedings of the The 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
We provide a comprehensive overview of existing systems for the computational generation of verbal humor in the form of jokes and short humorous texts. Considering linguistic humor theories, we analyze the systematic strengths and drawbacks of the different approaches. In addition, we show how the systems have been evaluated so far and propose two evaluation criteria: humorousness and complexity. From our analysis of the field, we conclude new directions for the advancement of computational humor generation.