Anna Shvets
2023
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Kaustubh Dhole | Varun Gangal | Sebastian Gehrmann | Aadesh Gupta | Zhenhao Li | Saad Mahamood | Abinaya Mahadiran | Simon Mille | Ashish Shrivastava | Samson Tan | Tongshang Wu | Jascha Sohl-Dickstein | Jinho Choi | Eduard Hovy | Ondřej Dušek | Sebastian Ruder | Sajant Anand | Nagender Aneja | Rabin Banjade | Lisa Barthe | Hanna Behnke | Ian Berlot-Attwell | Connor Boyle | Caroline Brun | Marco Antonio Sobrevilla Cabezudo | Samuel Cahyawijaya | Emile Chapuis | Wanxiang Che | Mukund Choudhary | Christian Clauss | Pierre Colombo | Filip Cornell | Gautier Dagan | Mayukh Das | Tanay Dixit | Thomas Dopierre | Paul-Alexis Dray | Suchitra Dubey | Tatiana Ekeinhor | Marco Di Giovanni | Tanya Goyal | Rishabh Gupta | Louanes Hamla | Sang Han | Fabrice Harel-Canada | Antoine Honoré | Ishan Jindal | Przemysław Joniak | Denis Kleyko | Venelin Kovatchev | Kalpesh Krishna | Ashutosh Kumar | Stefan Langer | Seungjae Ryan Lee | Corey James Levinson | Hualou Liang | Kaizhao Liang | Zhexiong Liu | Andrey Lukyanenko | Vukosi Marivate | Gerard de Melo | Simon Meoni | Maxine Meyer | Afnan Mir | Nafise Sadat Moosavi | Niklas Meunnighoff | Timothy Sum Hon Mun | Kenton Murray | Marcin Namysl | Maria Obedkova | Priti Oli | Nivranshu Pasricha | Jan Pfister | Richard Plant | Vinay Prabhu | Vasile Pais | Libo Qin | Shahab Raji | Pawan Kumar Rajpoot | Vikas Raunak | Roy Rinberg | Nicholas Roberts | Juan Diego Rodriguez | Claude Roux | Vasconcellos Samus | Ananya Sai | Robin Schmidt | Thomas Scialom | Tshephisho Sefara | Saqib Shamsi | Xudong Shen | Yiwen Shi | Haoyue Shi | Anna Shvets | Nick Siegel | Damien Sileo | Jamie Simon | Chandan Singh | Roman Sitelew | Priyank Soni | Taylor Sorensen | William Soto | Aman Srivastava | Aditya Srivatsa | Tony Sun | Mukund Varma | A Tabassum | Fiona Tan | Ryan Teehan | Mo Tiwari | Marie Tolkiehn | Athena Wang | Zijian Wang | Zijie Wang | Gloria Wang | Fuxuan Wei | Bryan Wilie | Genta Indra Winata | Xinyu Wu | Witold Wydmanski | Tianbao Xie | Usama Yaseen | 
Michael Yee | Jing Zhang | Yue Zhang
Northern European Journal of Language Technology, Volume 9
Data augmentation is an important method for evaluating the robustness of and enhancing the diversity of training data for natural language processing (NLP) models. In this paper, we present NL-Augmenter, a new participatory Python-based natural language (NL) augmentation framework which supports the creation of transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of NL tasks annotated with noisy descriptive tags. The transformations incorporate noise, intentional and accidental human mistakes, socio-linguistic variation, semantically-valid style, syntax changes, as well as artificial constructs that are unambiguous to humans. We demonstrate the efficacy of NL-Augmenter by using its transformations to analyze the robustness of popular language models. We find different models to be differently challenged on different tasks, with quasi-systematic score decreases. The infrastructure, datacards, and robustness evaluation results are publicly available on GitHub for the benefit of researchers working on paraphrase generation, robustness analysis, and low-resource NLP.
2022
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Sebastian Gehrmann | Abhik Bhattacharjee | Abinaya Mahendiran | Alex Wang | Alexandros Papangelis | Aman Madaan | Angelina McMillan-Major | Anna Shvets | Ashish Upadhyay | Bernd Bohnet | Bingsheng Yao | Bryan Wilie | Chandra Bhagavatula | Chaobin You | Craig Thomson | Cristina Garbacea | Dakuo Wang | Daniel Deutsch | Deyi Xiong | Di Jin | Dimitra Gkatzia | Dragomir Radev | Elizabeth Clark | Esin Durmus | Faisal Ladhak | Filip Ginter | Genta Indra Winata | Hendrik Strobelt | Hiroaki Hayashi | Jekaterina Novikova | Jenna Kanerva | Jenny Chim | Jiawei Zhou | Jordan Clive | Joshua Maynez | João Sedoc | Juraj Juraska | Kaustubh Dhole | Khyathi Raghavi Chandu | Laura Perez Beltrachini | Leonardo F. R. Ribeiro | Lewis Tunstall | Li Zhang | Mahim Pushkarna | Mathias Creutz | Michael White | Mihir Sanjay Kale | Moussa Kamal Eddine | Nico Daheim | Nishant Subramani | Ondrej Dusek | Paul Pu Liang | Pawan Sasanka Ammanamanchi | Qi Zhu | Ratish Puduppully | Reno Kriz | Rifat Shahriyar | Ronald Cardenas | Saad Mahamood | Salomey Osei | Samuel Cahyawijaya | Sanja Štajner | Sebastien Montella | Shailza Jolly | Simon Mille | Tahmid Hasan | Tianhao Shen | Tosin Adewumi | Vikas Raunak | Vipul Raheja | Vitaly Nikolaev | Vivian Tsai | Yacine Jernite | Ying Xu | Yisi Sang | Yixin Liu | Yufang Hou
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. The compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We pose that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods and what should be standardized instead is how to incorporate these new evaluation advances. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other’s work. GEMv2 supports 40 documented datasets in 51 languages, ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.
2021
System Description for the CommonGen task with the POINTER model
Anna Shvets
Proceedings of the First Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
In this experiment, we tested the CommonGen dataset for the structure-to-text task from the GEM living benchmark with the constraint-based POINTER model. POINTER is a hybrid architecture that combines insertion-based and transformer paradigms, predicting the token and the insertion position at the same time. The text is therefore generated gradually in a parallel, non-autoregressive manner, given the set of keywords. The pretrained model was fine-tuned on the training split of the CommonGen dataset, and the generation result was compared to the validation and challenge splits. The resulting metric outputs, which measure lexical equivalence, semantic similarity, and diversity, are discussed in detail in this system description.
Co-authors
- Samuel Cahyawijaya 2
- Kaustubh Dhole 2
- Ondřej Dušek 2
- Sebastian Gehrmann 2
- Saad Mahamood 2
- Simon Mille 2
- Vikas Raunak 2
- Bryan Wilie 2
- Genta Indra Winata 2
- Tosin Adewumi 1
- Pawan Sasanka Ammanamanchi 1
- Sajant Anand 1
- Nagender Aneja 1
- Rabin Banjade 1
- Lisa Barthe 1
- Hanna Behnke 1
- Ian Berlot-Attwell 1
- Chandra Bhagavatula 1
- Abhik Bhattacharjee 1
- Bernd Bohnet 1
- Connor Boyle 1
- Caroline Brun 1
- Ronald Cardenas 1
- Khyathi Raghavi Chandu 1
- Emile Chapuis 1
- Wanxiang Che 1
- Jenny Chim 1
- Jinho D. Choi 1
- Mukund Choudhary 1
- Elizabeth Clark 1
- Christian Clauss 1
- Jordan Clive 1
- Pierre Colombo 1
- Filip Cornell 1
- Mathias Creutz 1
- Gautier Dagan 1
- Nico Daheim 1
- Mayukh Das 1
- Gerard De Melo 1
- Daniel Deutsch 1
- Marco Di Giovanni 1
- Tanay Dixit 1
- Thomas Dopierre 1
- Paul-Alexis Dray 1
- Suchitra Dubey 1
- Esin Durmus 1
- Moussa Kamal Eddine 1
- Tatiana Ekeinhor 1
- Varun Gangal 1
- Cristina Garbacea 1
- Filip Ginter 1
- Dimitra Gkatzia 1
- Tanya Goyal 1
- Aadesh Gupta 1
- Rishabh Gupta 1
- Louanes Hamla 1
- Sang Han 1
- Fabrice Harel-Canada 1
- Tahmid Hasan 1
- Hiroaki Hayashi 1
- Antoine Honoré 1
- Yufang Hou 1
- Eduard Hovy 1
- Yacine Jernite 1
- Di Jin 1
- Ishan Jindal 1
- Shailza Jolly 1
- Przemysław Joniak 1
- Juraj Juraska 1
- Mihir Sanjay Kale 1
- Jenna Kanerva 1
- Denis Kleyko 1
- Venelin Kovatchev 1
- Kalpesh Krishna 1
- Reno Kriz 1
- Ashutosh Kumar 1
- Faisal Ladhak 1
- Stefan Langer 1
- Seungjae Ryan Lee 1
- Corey James Levinson 1
- Zhenhao Li 1
- Paul Pu Liang 1
- Hualou Liang 1
- Kaizhao Liang 1
- Yixin Liu 1
- Zhexiong Liu 1
- Andrey Lukyanenko 1
- Aman Madaan 1
- Abinaya Mahadiran 1
- Abinaya Mahendiran 1
- Vukosi Marivate 1
- Joshua Maynez 1
- Angelina McMillan-Major 1
- Simon Meoni 1
- Niklas Meunnighoff 1
- Maxine Meyer 1
- Afnan Mir 1
- Sebastien Montella 1
- Nafise Sadat Moosavi 1
- Timothy Sum Hon Mun 1
- Kenton Murray 1
- Marcin Namysl 1
- Vitaly Nikolaev 1
- Jekaterina Novikova 1
- Maria Obedkova 1
- Priti Oli 1
- Salomey Osei 1
- Vasile Pais 1
- Alexandros Papangelis 1
- Nivranshu Pasricha 1
- Laura Perez-Beltrachini 1
- Jan Pfister 1
- Richard Plant 1
- Vinay Prabhu 1
- Ratish Puduppully 1
- Mahim Pushkarna 1
- Libo Qin 1
- Dragomir Radev 1
- Vipul Raheja 1
- Shahab Raji 1
- Pawan Kumar Rajpoot 1
- Leonardo F. R. Ribeiro 1
- Roy Rinberg 1
- Nicholas Roberts 1
- Juan Diego Rodriguez 1
- Claude Roux 1
- Sebastian Ruder 1
- Ananya Sai 1
- Vasconcellos Samus 1
- Yisi Sang 1
- Robin Schmidt 1
- Thomas Scialom 1
- João Sedoc 1
- Tshephisho Sefara 1
- Rifat Shahriyar 1
- Saqib Shamsi 1
- Tianhao Shen 1
- Xudong Shen 1
- Yiwen Shi 1
- Freda Shi 1
- Ashish Shrivastava 1
- Nick Siegel 1
- Damien Sileo 1
- Jamie Simon 1
- Chandan Singh 1
- Roman Sitelew 1
- Marco Antonio Sobrevilla Cabezudo 1
- Jascha Sohl-Dickstein 1
- Priyank Soni 1
- Taylor Sorensen 1
- William Soto Martinez 1
- Aman Srivastava 1
- Aditya Srivatsa 1
- Hendrik Strobelt 1
- Nishant Subramani 1
- Tony Sun 1
- A Tabassum 1
- Samson Tan 1
- Fiona Tan 1
- Ryan Teehan 1
- Craig Thomson 1
- Mo Tiwari 1
- Marie Tolkiehn 1
- Vivian Tsai 1
- Lewis Tunstall 1
- Ashish Upadhyay 1
- Mukund Varma 1
- Alex Wang 1
- Dakuo Wang 1
- Athena Wang 1
- Zijian Wang 1
- Zijie Wang 1
- Gloria Wang 1
- Fuxuan Wei 1
- Michael White 1
- Tongshang Wu 1
- Xinyu Wu 1
- Witold Wydmanski 1
- Tianbao Xie 1
- Deyi Xiong 1
- Ying Xu 1
- Bingsheng Yao 1
- Usama Yaseen 1
- Michael Yee 1
- Chaobin You 1
- Li Zhang 1
- Jing Zhang 1
- Yue Zhang 1
- Jiawei Zhou 1
- Qi Zhu 1
- Sanja Štajner 1