Matteo Grella


pdf bib
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng | Eric Alcaide | Quentin Anthony | Alon Albalak | Samuel Arcadinho | Stella Biderman | Huanqi Cao | Xin Cheng | Michael Chung | Leon Derczynski | Xingjian Du | Matteo Grella | Kranthi Gv | Xuzheng He | Haowen Hou | Przemyslaw Kazienko | Jan Kocon | Jiaming Kong | Bartłomiej Koptyra | Hayden Lau | Jiaju Lin | Krishna Sri Ipsit Mantri | Ferdinand Mom | Atsushi Saito | Guangyu Song | Xiangru Tang | Johan Wind | Stanisław Woźniak | Zhenyuan Zhang | Qinghua Zhou | Jian Zhu | Rui-Jie Zhu
Findings of the Association for Computational Linguistics: EMNLP 2023

Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintains constant computational and memory complexity during inference. We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.


pdf bib
UniMorph 3.0: Universal Morphology
Arya D. McCarthy | Christo Kirov | Matteo Grella | Amrit Nidhi | Patrick Xia | Kyle Gorman | Ekaterina Vylomova | Sabrina J. Mielke | Garrett Nicolai | Miikka Silfverberg | Timofey Arkhangelskiy | Nataly Krizhanovsky | Andrew Krizhanovsky | Elena Klyachko | Alexey Sorokin | John Mansfield | Valts Ernštreits | Yuval Pinter | Cassandra L. Jacobs | Ryan Cotterell | Mans Hulden | David Yarowsky
Proceedings of the Twelfth Language Resources and Evaluation Conference

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological paradigms for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. We have implemented several improvements to the extraction pipeline which creates most of our data, so that it is both more complete and more correct. We have added 66 new languages, as well as new parts of speech for 12 languages. We have also amended the schema in several ways. Finally, we present three new community tools: two to validate data for resource creators, and one to make morphological data available from the command line. UniMorph is based at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University in Baltimore, Maryland. This paper details advances made to the schema, tooling, and dissemination of project resources since the UniMorph 2.0 release described at LREC 2018.