Robin Rojowiec
2020
From “Before” to “After”: Generating Natural Language Instructions from Image Pairs in a Simple Visual Domain
Robin Rojowiec
|
Jana Götze
|
Philipp Sadler
|
Henrik Voigt
|
Sina Zarrieß
|
David Schlangen
Proceedings of the 13th International Conference on Natural Language Generation
While certain types of instructions can be com-pactly expressed via images, there are situations where one might want to verbalise them, for example when directing someone. We investigate the task of Instruction Generation from Before/After Image Pairs which is to derive from images an instruction for effecting the implied change. For this, we make use of prior work on instruction following in a visual environment. We take an existing dataset, the BLOCKS data collected by Bisk et al. (2016) and investigate whether it is suitable for training an instruction generator as well. We find that it is, and investigate several simple baselines, taking these from the related task of image captioning. Through a series of experiments that simplify the task (by making image processing easier or completely side-stepping it; and by creating template-based targeted instructions), we investigate areas for improvement. We find that captioning models get some way towards solving the task, but have some difficulty with it, and future improvements must lie in the way the change is detected in the instruction.
Intent Recognition in Doctor-Patient Interviews
Robin Rojowiec
|
Benjamin Roth
|
Maximilian Fink
Proceedings of the Twelfth Language Resources and Evaluation Conference
Learning to interview patients to find out their disease is an essential part of the training of medical students. The practical part of this training has traditionally relied on paid actors that play the role of a patient to be interviewed. This process is expensive and severely limits the amount of practice per student. In this work, we present a novel data set and methods based on Natural Language Processing, for making progress towards modern applications and e-learning tools that support this training by providing language-based user interfaces with virtual patients. A data set of german transcriptions from live doctor-patient interviews was collected. These transcriptions are based on audio recordings of exercise sessions within the university and only the doctor’s utterances could be transcribed. We annotated each utterance with an intent inventory characterizing the purpose of the question or statement. For some intent classes, the data only contains a few samples, and we apply Information Retrieval and Deep Learning methods that are robust with respect to small amounts of training data for recognizing the intent of an utterance and providing the correct response. Our results show that the models are effective and they provide baseline performance scores on the data set for further research.
Search
Fix data
Co-authors
- Maximilian Fink 1
- Jana Götze 1
- Benjamin Roth 1
- Philipp Sadler 1
- David Schlangen 1
- show all...