Tim Nieradzik
2021
Cross-Lingual Transfer with MAML on Trees
Jezabel Garcia | Federica Freddi | Jamie McGowan | Tim Nieradzik | Feng-Ting Liao | Ye Tian | Da-shan Shiu | Alberto Bernacchia
Proceedings of the Second Workshop on Domain Adaptation for NLP
In meta-learning, the knowledge learned from previous tasks is transferred to new ones, but this transfer only works if tasks are related. Sharing information between unrelated tasks might hurt performance, and it is unclear how to transfer knowledge across tasks that have a hierarchical structure. Our research extends a meta-learning model, MAML, by exploiting hierarchical task relationships. Our algorithm, TreeMAML, adapts the model to each task with a few gradient steps, but the adaptation follows the hierarchical tree structure: in each step, gradients are pooled across task clusters, and subsequent steps move down the tree. We also implement a clustering algorithm that generates the task tree without prior knowledge of the task structure, allowing us to make use of implicit relationships between the tasks. We show that TreeMAML successfully trains natural language processing models for cross-lingual Natural Language Inference by taking advantage of the language phylogenetic tree. This result is useful since most languages in the world are under-resourced, and improvements in cross-lingual transfer allow the internationalization of NLP models.
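The tree-structured adaptation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it covers only the inner-loop adaptation (no outer meta-update), and it assumes a scalar parameter, a quadratic per-task loss, a hand-written three-level tree, and mean pooling of gradients. The names `tree_adapt`, `targets`, and `levels` are hypothetical.

```python
from typing import List


def tree_adapt(theta0: float,
               targets: List[float],
               levels: List[List[List[int]]],
               lr: float = 0.5) -> List[float]:
    """Adapt one scalar parameter per task, one gradient step per tree level.

    Assumed toy loss for task t: 0.5 * (theta - targets[t]) ** 2,
    so its gradient is theta - targets[t].
    `levels` lists the task clusters at each depth, root first.
    """
    thetas = [theta0] * len(targets)
    for clusters in levels:
        # Per-task gradients at the current parameters.
        grads = [thetas[t] - targets[t] for t in range(len(targets))]
        for cluster in clusters:
            # Pool (here: average) the gradients of all tasks in the cluster.
            pooled = sum(grads[t] for t in cluster) / len(cluster)
            # Every task in the cluster takes the same pooled step.
            for t in cluster:
                thetas[t] -= lr * pooled
    return thetas


# Hypothetical example: four tasks forming two families.
targets = [1.0, 1.2, -1.0, -1.2]
levels = [
    [[0, 1, 2, 3]],        # root: all tasks share one step
    [[0, 1], [2, 3]],      # middle: one step per family
    [[0], [1], [2], [3]],  # leaves: one step per task
]
adapted = tree_adapt(0.0, targets, levels)
```

In this toy setting, tasks in the same cluster hold identical parameters until the tree splits them, so information is shared only among tasks that the tree marks as related.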
How does BERT process disfluency?
Ye Tian | Tim Nieradzik | Sepehr Jalali | Da-shan Shiu
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
Natural conversations are filled with disfluencies. This study investigates whether and how BERT understands disfluency with three experiments: (1) a behavioural study using a downstream task, (2) an analysis of sentence embeddings and (3) an analysis of the attention mechanism on disfluency. The behavioural study shows that without fine-tuning on disfluent data, BERT does not suffer significant performance loss when presented with disfluent rather than fluent inputs (exp1). Analysis of sentence embeddings of disfluent and fluent sentence pairs reveals that the deeper the layer, the more similar their representations (exp2). This indicates that deep layers of BERT become relatively invariant to disfluency. We pinpoint attention as a potential mechanism that could explain this phenomenon (exp3). Overall, the study suggests that BERT has knowledge of disfluency structure. We emphasise the potential of using BERT to understand natural utterances without disfluency removal.
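A layer-by-layer comparison like the one in exp2 could be set up roughly as follows. The abstract does not name the similarity measure, so cosine similarity is an assumption here, as are the function names. The vectors are made-up placeholders standing in for per-layer sentence embeddings, not model outputs or results from the paper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def layerwise_similarity(fluent: List[Sequence[float]],
                         disfluent: List[Sequence[float]]) -> List[float]:
    """One similarity score per layer for a fluent/disfluent sentence pair.

    Each argument holds one sentence embedding per layer, shallowest first.
    """
    return [cosine(f, d) for f, d in zip(fluent, disfluent)]


# Placeholder embeddings for two layers (illustrative values only).
fluent_layers = [[1.0, 0.0], [1.0, 1.0]]
disfluent_layers = [[0.0, 1.0], [1.0, 1.0]]
scores = layerwise_similarity(fluent_layers, disfluent_layers)
```

In practice the per-layer embeddings would come from the model's hidden states, pooled into one vector per sentence. The pooling choice is another assumption that the abstract leaves open.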
Co-authors
- Da-shan Shiu 2
- Ye Tian 2
- Alberto Bernacchia 1
- Federica Freddi 1
- Jezabel Garcia 1