Pawel Swietojanski


SLURP: A Spoken Language Understanding Resource Package
Emanuele Bastianelli | Andrea Vanzo | Pawel Swietojanski | Verena Rieser
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Spoken Language Understanding (SLU) infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications. However, publicly available SLU resources are limited. In this paper, we release SLURP, a new SLU package containing the following: (1) a new challenging dataset in English spanning 18 domains, which is substantially bigger and linguistically more diverse than existing datasets; (2) competitive baselines based on state-of-the-art NLU and ASR systems; (3) a new transparent metric for entity labelling which enables a detailed error analysis for identifying potential areas of improvement. SLURP is available at


Multi-Modal Sequence Fusion via Recursive Attention for Emotion Recognition
Rory Beard | Ritwik Das | Raymond W. M. Ng | P. G. Keerthana Gopalakrishnan | Luka Eerens | Pawel Swietojanski | Ondrej Miksik
Proceedings of the 22nd Conference on Computational Natural Language Learning

Natural human communication is nuanced and inherently multi-modal. Humans possess specialised sensoria for processing vocal, visual, linguistic, and para-linguistic information, but form an intricately fused percept of the multi-modal data stream to provide a holistic representation. Analysis of emotional content in face-to-face communication is a cognitive task to which humans are particularly attuned, given its sociological importance, and poses a difficult challenge for machine emulation due to the subtlety and expressive variability of cross-modal cues. Inspired by the empirical success of recent so-called End-To-End Memory Networks and related works, we propose an approach based on recursive multi-attention with a shared external memory updated over multiple gated iterations of analysis. We evaluate our model across several large multi-modal datasets and show that a global contextualised memory with gated memory updates can effectively achieve emotion recognition.


The UEDIN ASR systems for the IWSLT 2014 evaluation
Peter Bell | Pawel Swietojanski | Joris Driesen | Mark Sinclair | Fergus McInnes | Steve Renals
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the University of Edinburgh (UEDIN) ASR systems for the 2014 IWSLT Evaluation. Notable features of the English system include deep neural network acoustic models in both tandem and hybrid configurations, with the use of multi-level adaptive networks, LHUC adaptation, and Maxout units. The German system includes lightly supervised training and a new method for dictionary generation. Our voice activity detection system now uses a semi-Markov model to incorporate a prior on utterance lengths. We report relative WER improvements of up to 30% on the tst2013 English test set.


The UEDIN systems for the IWSLT 2012 evaluation
Eva Hasler | Peter Bell | Arnab Ghoshal | Barry Haddow | Philipp Koehn | Fergus McInnes | Steve Renals | Pawel Swietojanski
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the University of Edinburgh (UEDIN) systems for the IWSLT 2012 Evaluation. We participated in the ASR (English), MT (English-French, German-English) and SLT (English-French) tracks.