Ahmet Üstün - ACL Anthology

Ahmet Üstün

2025

When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning
Yijiang River Dong | Tiancheng Hu | Yinhong Liu | Ahmet Üstün | Nigel Collier
Findings of the Association for Computational Linguistics: EMNLP 2025

While Reinforcement Learning from Human Feedback (RLHF) is widely used to align Large Language Models (LLMs) with human preferences, it typically assumes homogeneous preferences across users, overlooking diverse human values and minority viewpoints.Although personalized preference learning addresses this by tailoring separate preferences for individual users, the field lacks standardized methods to assess its effectiveness. We present a multi-faceted evaluation framework that measures not only performance but also fairness, unintended effects, and adaptability across varying levels of preference divergence. Through extensive experiments comparing eight personalization methods across three preference datasets, we demonstrate that performance differences between methods could reach 36% when users strongly disagree, and personalization can introduce up to 20% safety misalignment. These findings highlight the critical need for holistic evaluation approaches to advance the development of more effective and inclusive preference learning systems.

Nexus: Adaptive Upcycling to Efficiently Pretrain Mixture of Experts
Nikolas Gritsch | Qizhen Zhang | Acyr Locatelli | Sara Hooker | Ahmet Üstün
Findings of the Association for Computational Linguistics: EMNLP 2025

Frontier language models are increasingly based on the Mixture of Experts (MoE) architecture, boosting the efficiency of training and inference by sparsely activating parameters. Nevertheless, training from scratch on trillions of tokens remains so expensive that most users can only finetune these models. In this work, we combine parameter reuse of dense models for the MoE layers ("*upcycling*”) with a novel, *adaptive* Nexus router that can integrate new experts into an existing trained model without hurting the performance on previous domains. Our router leverages the knowledge of each expert’s training data distribution via domain embeddings to initialize the router, improving specialization and allowing it to adapt faster to new domains than a standard MoE router. Nexus overturns the strict sequential separation between training and finetuning in classical approaches, allowing more powerful improvements to existing models at a later stage through long token-horizon trainings on new pretraining data. Our experiments show that Nexus achieves a relative gain of up to 2.1% over the baseline for initial upcycling, and an 18.8% relative gain for extending the MoE to a new domain with a new expert by using limited finetuning data. This flexibility of Nexus can power an open-source ecosystem where every user continuously assembles their own MoE-mix from a multitude of dense models.

MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions
Abdullatif Köksal | Marion Thaler | Ayyoob Imani | Ahmet Üstün | Anna Korhonen | Hinrich Schütze
Transactions of the Association for Computational Linguistics, Volume 13

Instruction tuning enhances large language models (LLMs) by aligning them with human preferences across diverse tasks. Traditional approaches to create instruction tuning datasets face serious challenges for low-resource languages due to their dependence on data annotation. This work introduces a novel method, Multilingual Reverse Instructions (MURI), which generates high-quality instruction tuning datasets for low-resource languages without requiring human annotators or pre-existing multilingual models. Utilizing reverse instructions and a translation pipeline, MURI produces instruction-output pairs from existing human-written texts in low-resource languages. This method ensures cultural relevance and diversity by sourcing texts from different native domains and applying filters to eliminate inappropriate content. Our dataset, MURI-IT, includes more than 2 million instruction-output pairs across 200 languages. Evaluation by native speakers and fine-tuning experiments with mT5 models demonstrate the approach’s effectiveness for both NLU and open-ended generation. We publicly release datasets and models at https://github.com/akoksal/muri.

We present Command A Translate, an LLMbased machine translation model built off Cohere’s Command A. It reaches state-of-the-art machine translation quality via direct preference optimization. Our meticulously designed data preparation pipeline emphasizes robust quality control and a novel difficulty filtering – a key innovation that distinguishes Command A Translate. Furthermore, we extend our model and participate at WMT with a system (CommandA-WMT) that uses two models and post-editing steps of step-by-step reasoning and limited Minimum Bayes Risk decoding.

2024

Datasets are foundational to many breakthroughs in modern artificial intelligence. Many recent achievements in the space of natural language processing (NLP) can be attributed to the fine-tuning of pre-trained models on a diverse set of tasks that enables a large language model (LLM) to respond to instructions. Instruction fine-tuning (IFT) requires specifically constructed and annotated datasets. However, existing datasets are almost all in the English language. In this work, our primary goal is to bridge the language gap by building a human-curated instruction-following dataset spanning 65 languages. We worked with fluent speakers of languages from around the world to collect natural instances of instructions and completions. Furthermore, we create the most extensive multilingual collection to date, comprising 513 million instances through templating and augmenting existing datasets across 114 languages. In total, we contribute three key resources: we develop and open-source the Aya Dataset, the Aya Collection, and the Aya Evaluation Suite. The Aya initiative also serves as a valuable case study in participatory research, involving collaborators from 119 countries. We see this as an important framework for future research collaborations that aim to bridge gaps in resources.

Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs
Arash Ahmadian | Chris Cremer | Matthias Gallé | Marzieh Fadaee | Julia Kreutzer | Olivier Pietquin | Ahmet Üstün | Sara Hooker
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

AI alignment in the shape of Reinforcement Learning from Human Feedback (RLHF) is increasingly treated as a crucial ingredient for high performance large language models. Proximal Policy Optimization (PPO) has been installed by the seminal literature as the standard method for the RL part of RLHF. However, it involves both high computational cost and sensitive hyperparameter tuning. We posit that most of the motivational principles that led to the development of PPO are less of a practical concern in RLHF and advocate for a less computationally expensive method that preserves and even increases performance. We revisit how alignment from human preferences is formulated in the context of RL. Keeping simplicity as a guiding principle, we show that many components of PPO are unnecessary in an RLHF context and that far simpler REINFORCE-style optimization variants outperform both PPO and newly proposed “RL-free” methods such as DPO and RAFT. Our work suggests that careful adaptation to LLMs alignment characteristics allows benefiting from online RL optimization at low cost.

Recent breakthroughs in large language models (LLMs) have centered around a handful of data-rich languages. What does it take to broaden access to breakthroughs beyond first-class citizen languages? Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages of which over 50% are considered as lower-resourced. Aya outperforms mT0 and BLOOMZ on the majority of tasks while covering double the number of languages. We introduce extensive new evaluation suites that broaden the state-of-art for multilingual eval across 99 languages —— including discriminative and generative tasks, human evaluation, and simulated win rates that cover both held-out tasks and in-distribution performance. Furthermore, we conduct detailed investigations on the optimal finetuning mixture composition, data pruning, as well as the toxicity, bias, and safety of our models.

RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
John Dang | Arash Ahmadian | Kelly Marchisio | Julia Kreutzer | Ahmet Üstün | Sara Hooker
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Preference optimization techniques have become a standard final stage for training state-of-art large language models (LLMs). However, despite widespread adoption, the vast majority of work to-date has focused on a small set of high-resource languages like English and Chinese. This captures a small fraction of the languages in the world, but also makes it unclear which aspects of current state-of-the-art research transfer to a multilingual setting. In this work, we perform an exhaustive study to achieve a new state of the art in aligning multilingual LLMs. We introduce a novel, scalable method for generating high-quality multilingual feedback data to balance data coverage. We establish the benefits of cross-lingual transfer and increased dataset size in preference training. Our preference-trained model achieves a 54.4% win-rate against Aya 23 8B, the current state-of-the-art multilingual LLM in its parameter class, and a 69.5% win-rate or higher against widely used models like Gemma, Mistral and Llama 3. As a result of our efforts, we expand the frontier of alignment techniques to 23 languages, covering approximately half of the world’s population.

How Does Quantization Affect Multilingual LLMs?
Kelly Marchisio | Saurabh Dash | Hongyu Chen | Dennis Aumiller | Ahmet Üstün | Sara Hooker | Sebastian Ruder
Findings of the Association for Computational Linguistics: EMNLP 2024

Quantization techniques are widely used to improve inference speed and deployment of large language models. While a wide body of work examines the impact of quantization on LLMs in English, none have evaluated across languages. We conduct a thorough analysis of quantized multilingual LLMs, focusing on performance across languages and at varying scales. We use automatic benchmarks, LLM-as-a-Judge, and human evaluation, finding that (1) harmful effects of quantization are apparent in human evaluation, which automatic metrics severely underestimate: a 1.7% average drop in Japanese across automatic tasks corresponds to a 16.0% drop reported by human evaluators on realistic prompts; (2) languages are disparately affected by quantization, with non-Latin script languages impacted worst; and (3) challenging tasks like mathematical reasoning degrade fastest. As the ability to serve low-compute models is critical for wide global adoption of NLP technologies, our results urge consideration of multilingual performance as a key evaluation criterion for efficient models.

Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024)
Raúl Vázquez | Timothee Mickus | Jörg Tiedemann | Ivan Vulić | Ahmet Üstün
Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024)

2022

UDapter: Typology-based Language Adapters for Multilingual Dependency Parsing and Sequence Labeling
Ahmet Üstün | Arianna Bisazza | Gosse Bouma | Gertjan van Noord
Computational Linguistics, Volume 48, Issue 3 - September 2022

Recent advances in multilingual language modeling have brought the idea of a truly universal parser closer to reality. However, such models are still not immune to the “curse of multilinguality”: Cross-language interference and restrained model capacity remain major obstacles. To address this, we propose a novel language adaptation approach by introducing contextual language adapters to a multilingual parser. Contextual language adapters make it possible to learn adapters via language embeddings while sharing model parameters across languages based on contextual parameter generation. Moreover, our method allows for an easy but effective integration of existing linguistic typology features into the parsing model. Because not all typological features are available for every language, we further combine typological feature prediction with parsing in a multi-task model that achieves very competitive parsing performance without the need for an external prediction system for missing features. The resulting parser, UDapter, can be used for dependency parsing as well as sequence labeling tasks such as POS tagging, morphological tagging, and NER. In dependency parsing, it outperforms strong monolingual and multilingual baselines on the majority of both high-resource and low-resource (zero-shot) languages, showing the success of the proposed adaptation approach. In sequence labeling tasks, our parser surpasses the baseline on high resource languages, and performs very competitively in a zero-shot setting. Our in-depth analyses show that adapter generation via typological features of languages is key to this success.1

When does Parameter-Efficient Transfer Learning Work for Machine Translation?
Ahmet Üstün | Asa Cooper Stickland
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Parameter-efficient fine-tuning methods (PEFTs) offer the promise of adapting large pre-trained models while only tuning a small number of parameters. They have been shown to be competitive with full model fine-tuning for many downstream tasks. However, prior work indicates that PEFTs may not work as well for machine translation (MT), and there is no comprehensive study showing when PEFTs work for MT. We conduct a comprehensive empirical study of PEFTs for MT, considering (1) various parameter budgets, (2) a diverse set of language-pairs, and (3) different pre-trained models. We find that ‘adapters’, in which small feed-forward networks are added after every layer, are indeed on par with full model fine-tuning when the parameter budget corresponds to 10% of total model parameters. Nevertheless, as the number of tuned parameters decreases, the performance of PEFTs decreases. The magnitude of this decrease depends on the language pair, with PEFTs particularly struggling for distantly related language-pairs. We find that using PEFTs with a larger pre-trained model outperforms full fine-tuning with a smaller model, and for smaller training data sizes, PEFTs outperform full fine-tuning for the same pre-trained model.

Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer
Ahmet Üstün | Arianna Bisazza | Gosse Bouma | Gertjan van Noord | Sebastian Ruder
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Massively multilingual models are promising for transfer learning across tasks and languages. However, existing methods are unable to fully leverage training data when it is available in different task-language combinations. To exploit such heterogeneous supervision, we propose Hyper-X, a single hypernetwork that unifies multi-task and multilingual learning with efficient adaptation. It generates weights for adapter modules conditioned on both tasks and language embeddings. By learning to combine task and language-specific knowledge, our model enables zero-shot transfer for unseen languages and task-language combinations. Our experiments on a diverse set of languages demonstrate that Hyper-X achieves the best or competitive gain when a mixture of multiple resources is available, while on par with strong baseline in the standard scenario. Hyper-X is also considerably more efficient in terms of parameters and resources compared to methods that train separate adapters. Finally, Hyper-X consistently produces strong results in few-shot scenarios for new languages, showing the versatility of our approach beyond zero-shot transfer.

2021

On the Effectiveness of Dataset Embeddings in Mono-lingual,Multi-lingual and Zero-shot Conditions
Rob van der Goot | Ahmet Üstün | Barbara Plank
Proceedings of the Second Workshop on Domain Adaptation for NLP

Recent complementary strands of research have shown that leveraging information on the data source through encoding their properties into embeddings can lead to performance increase when training a single model on heterogeneous data sources. However, it remains unclear in which situations these dataset embeddings are most effective, because they are used in a large variety of settings, languages and tasks. Furthermore, it is usually assumed that gold information on the data source is available, and that the test data is from a distribution seen during training. In this work, we compare the effect of dataset embeddings in mono-lingual settings, multi-lingual settings, and with predicted data source label in a zero-shot setting. We evaluate on three morphosyntactic tasks: morphological tagging, lemmatization, and dependency parsing, and use 104 datasets, 66 languages, and two different dataset grouping strategies. Performance increases are highest when the datasets are of the same language, and we know from which distribution the test-instance is drawn. In contrast, for setups where the data is from an unseen distribution, performance increase vanishes.

Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP
Rob van der Goot | Ahmet Üstün | Alan Ramponi | Ibrahim Sharaf | Barbara Plank
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

Transfer learning, particularly approaches that combine multi-task learning with pre-trained contextualized embeddings and fine-tuning, have advanced the field of Natural Language Processing tremendously in recent years. In this paper we present MaChAmp, a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings. The benefits of MaChAmp are its flexible configuration options, and the support of a variety of natural language processing tasks in a uniform toolkit, from text classification and sequence labeling to dependency parsing, masked language modeling, and text generation.

Multilingual Unsupervised Neural Machine Translation with Denoising Adapters
Ahmet Üstün | Alexandre Berard | Laurent Besacier | Matthias Gallé
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We consider the problem of multilingual unsupervised machine translation, translating to and from languages that only have monolingual data by using auxiliary parallel language pairs. For this problem the standard procedure so far to leverage the monolingual data is _back-translation_, which is computationally costly and hard to tune. In this paper we propose instead to use _denoising adapters_, adapter layers with a denoising objective, on top of pre-trained mBART-50. In addition to the modularity and flexibility of such an approach we show that the resulting translations are on-par with back-translating as measured by BLEU, and furthermore it allows adding unseen languages incrementally.

From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding
Rob van der Goot | Ibrahim Sharaf | Aizhan Imankulova | Ahmet Üstün | Marija Stepanović | Alan Ramponi | Siti Oryza Khairunnisa | Mamoru Komachi | Barbara Plank
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The lack of publicly available evaluation data for low-resource languages limits progress in Spoken Language Understanding (SLU). As key tasks like intent classification and slot filling require abundant training data, it is desirable to reuse existing data in high-resource languages to develop models for low-resource scenarios. We introduce xSID, a new benchmark for cross-lingual (x) Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect. To tackle the challenge, we propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer. We study two setups which differ by type and language coverage of the pre-trained embeddings. Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification.

On the Difficulty of Translating Free-Order Case-Marking Languages
Arianna Bisazza | Ahmet Üstün | Stephan Sportel
Transactions of the Association for Computational Linguistics, Volume 9

Identifying factors that make certain languages harder to model than others is essential to reach language equality in future Natural Language Processing technologies. Free-order case-marking languages, such as Russian, Latin, or Tamil, have proved more challenging than fixed-order languages for the tasks of syntactic parsing and subject-verb agreement prediction. In this work, we investigate whether this class of languages is also more difficult to translate by state-of-the-art Neural Machine Translation (NMT) models. Using a variety of synthetic languages and a newly introduced translation challenge set, we find that word order flexibility in the source language only leads to a very small loss of NMT quality, even though the core verb arguments become impossible to disambiguate in sentences without semantic cues. The latter issue is indeed solved by the addition of case marking. However, in medium- and low-resource settings, the overall NMT quality of fixed-order languages remains unmatched.

Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language
Lukas Edman | Ahmet Üstün | Antonio Toral | Gertjan van Noord
Proceedings of the Sixth Conference on Machine Translation

This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2021 Unsupervised Machine Translation task for German–Lower Sorbian (DE–DSB): a high-resource language to a low-resource one. Our system uses a transformer encoder-decoder architecture in which we make three changes to the standard training procedure. First, our training focuses on two languages at a time, contrasting with a wealth of research on multilingual systems. Second, we introduce a novel method for initializing the vocabulary of an unseen language, achieving improvements of 3.2 BLEU for DE->DSB and 4.0 BLEU for DSB->DE.Lastly, we experiment with the order in which offline and online back-translation are used to train an unsupervised system, finding that using online back-translation first works better for DE->DSB by 2.76 BLEU. Our submissions ranked first (tied with another team) for DSB->DE and third for DE->DSB.

2020

UDapter: Language Adaptation for Truly Universal Dependency Parsing
Ahmet Üstün | Arianna Bisazza | Gosse Bouma | Gertjan van Noord
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Recent advances in multilingual dependency parsing have brought the idea of a truly universal parser closer to reality. However, cross-language interference and restrained model capacity remain major obstacles. To address this, we propose a novel multilingual task adaptation approach based on contextual parameter generation and adapter modules. This approach enables to learn adapters via language embeddings while sharing model parameters across languages. It also allows for an easy but effective integration of existing linguistic typology features into the parsing network. The resulting parser, UDapter, outperforms strong monolingual and multilingual baselines on the majority of both high-resource and low-resource (zero-shot) languages, showing the success of the proposed adaptation approach. Our in-depth analyses show that soft parameter sharing via typological features is key to this success.

FiSSA at SemEval-2020 Task 9: Fine-tuned for Feelings
Bertelt Braaksma | Richard Scholtens | Stan van Suijlekom | Remy Wang | Ahmet Üstün
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we present our approach for sentiment classification on Spanish-English code-mixed social media data in the SemEval-2020 Task 9. We investigate performance of various pre-trained Transformer models by using different fine-tuning strategies. We explore both monolingual and multilingual models with the standard fine-tuning method. Additionally, we propose a custom model that we fine-tune in two steps: once with a language modeling objective, and once with a task-specific objective. Although two-step fine-tuning improves sentiment classification performance over the base model, the large multilingual XLM-RoBERTa model achieves best weighted F1-score with 0.537 on development data and 0.739 on test data. With this score, our team jupitter placed tenth overall in the competition.

2019

There and Back Again: Cross-Lingual Transfer Learning for Event Detection
Tommaso Caselli | Ahmet Üstün
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

Cross-Lingual Word Embeddings for Morphologically Rich Languages
Ahmet Üstün | Gosse Bouma | Gertjan van Noord
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Cross-lingual word embedding models learn a shared vector space for two or more languages so that words with similar meaning are represented by similar vectors regardless of their language. Although the existing models achieve high performance on pairs of morphologically simple languages, they perform very poorly on morphologically rich languages such as Turkish and Finnish. In this paper, we propose a morpheme-based model in order to increase the performance of cross-lingual word embeddings on morphologically rich languages. Our model includes a simple extension which enables us to exploit morphemes for cross-lingual mapping. We applied our model for the Turkish-Finnish language pair on the bilingual word translation task. Results show that our model outperforms the baseline models by 2% in the nearest neighbour ranking.

Multi-Team: A Multi-attention, Multi-decoder Approach to Morphological Analysis.
Ahmet Üstün | Rob van der Goot | Gosse Bouma | Gertjan van Noord
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper describes our submission to SIGMORPHON 2019 Task 2: Morphological analysis and lemmatization in context. Our model is a multi-task sequence to sequence neural network, which jointly learns morphological tagging and lemmatization. On the encoding side, we exploit character-level as well as contextual information. We introduce a multi-attention decoder to selectively focus on different parts of character and word sequences. To further improve the model, we train on multiple datasets simultaneously and use external embeddings for initialization. Our final model reaches an average morphological tagging F1 score of 94.54 and a lemma accuracy of 93.91 on the test data, ranking respectively 3rd and 6th out of 13 teams in the SIGMORPHON 2019 shared task.

2018

Characters or Morphemes: How to Represent Words?
Ahmet Üstün | Murathan Kurfalı | Burcu Can
Proceedings of the Third Workshop on Representation Learning for NLP

In this paper, we investigate the effects of using subword information in representation learning. We argue that using syntactic subword units effects the quality of the word representations positively. We introduce a morpheme-based model and compare it against to word-based, character-based, and character n-gram level models. Our model takes a list of candidate segmentations of a word and learns the representation of the word based on different segmentations that are weighted by an attention mechanism. We performed experiments on Turkish as a morphologically rich language and English with a comparably poorer morphology. The results show that morpheme-based models are better at learning word representations of morphologically complex languages compared to character-based and character n-gram level models since the morphemes help to incorporate more syntactic knowledge in learning, that makes morpheme-based models better at syntactic tasks.

Co-authors

Marzieh Fadaee 4

Rob Van Der Goot 4

Matthias Gallé 3

Kelly Marchisio 3

Barbara Plank 3

Sebastian Ruder 3

Arash Ahmadian 2

Alexandre Bérard 2

Daniel D’souza 2

Niklas Muennighoff 2

Ibrahim Sharaf 2

Shivalika Singh 2

Freddie Vargus 2

Aisha Alaagib 1

Emad Alghamdi 1

Zaid Alyafeai 1

Arkady Arkhangorodsky 1

Viraat Aryabumi 1

Dennis Aumiller 1

Laurent Besacier 1

Neel Bhandari 1

Bertelt Braaksma 1

Samuel Cahyawijaya 1

Tommaso Caselli 1

Nigel Collier 1

Asa Cooper Stickland 1

Yijiang River Dong 1

Hakimeh Fadaee 1

Nicholas Frosst 1

Sebastian Gehrmann 1

Nithya Govindarajan 1

Nikolas Gritsch 1

Surya Guthikonda 1

Ramith Hettiarachchi 1

Aizhan Imankulova 1

Börje F. Karlsson 1

Siti Oryza Khairunnisa 1

Mamoru Komachi 1

Anna Korhonen 1

Dominik Krzemiński 1

Murathan Kurfali 1

Abdullatif Köksal 1

Acyr Locatelli 1

Shayne Longpre 1

Marina Machado 1

Abinaya Mahendiran 1

Deividas Mataciunas 1

Timothee Mickus 1

Oshan Mudannayake 1

Gbemileke Onilude 1

Laura O’Mahony 1

Olivier Pietquin 1

Richard Scholtens 1

Hinrich Schütze 1

Herumb Shandilya 1

Stephan Sportel 1

Marija Stepanović 1

Marion Thaler 1

Jörg Tiedemann 1

Antonio Toral 1

Sebastian Vincent 1

Raúl Vázquez 1

Joseph Wilson 1

Stan van Suijlekom 1

Venues