We explore the ability of pre-trained language models BART, an encoder-decoder model, GPT2 and GPT-Neo, both decoder-only models for generating sentences from structured MR tags as input. We observe best results on several metrics for the YelpNLG and E2E datasets. Style based implicit tags such as emotion, sentiment, length etc., allows for controlled generation but it is typically not present in MR. We present an analysis on YelpNLG showing BART can express the content with stylistic variations in the structure of the sentence. Motivated with the results, we define a new task of emotional situation generation from various POS tags and emotion label values as MR using EmpatheticDialogues dataset and report a baseline. Encoder-Decoder attention analysis shows that BART learns different aspects in MR at various layers and heads.
We consider the task of automatically classifying the persuasion strategy employed by an utterance in a dialog. We base our work on the PERSUASION-FOR-GOOD dataset, which is composed of conversations between crowdworkers trying to convince each other to make donations to a charity. Currently, the best known performance on this dataset, for classification of persuader’s strategy, is not derived by employing pretrained language models like BERT. We observe that a straightforward fine-tuning of BERT does not provide significant performance gain. Nevertheless, nonuniformly sampling to account for the class imbalance and a cost function enforcing a hierarchical probabilistic structure on the classes provides an absolute improvement of 10.79% F1 over the previously reported results. On the same dataset, we replicate the framework for classifying the persuadee’s response.
We consider a task based on CVPR 2018 challenge dataset on advertisement (Ad) understanding. The task involves detecting the viewer’s interpretation of an Ad image captured as text. Recent results have shown that the embedded scene-text in the image holds a vital cue for this task. Motivated by this, we fine-tune the base BERT model for a sentence-pair classification task. Despite utilizing the scene-text as the only source of visual information, we could achieve a hit-or-miss accuracy of 84.95% on the challenge test data. To enable BERT to process other visual information, we append image captions to the scene-text. This achieves an accuracy of 89.69%, which is an improvement of 4.7%. This is the best reported result for this task.