Vivian Tsai
2025
To Mask or to Mirror: Human-AI Alignment in Collective Reasoning
Crystal Qian | Aaron T Parisi | Clémentine Bouleau | Vivian Tsai | Maël Lebreton | Lucas Dixon
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
As large language models (LLMs) are increasingly used to model and augment collective decision-making, it is critical to examine their alignment with human social reasoning. We present an empirical framework for assessing collective alignment, in contrast to prior work on the individual level. Using the Lost at Sea social psychology task, we conduct a large-scale online experiment (N=748), randomly assigning groups to leader elections with either visible demographic attributes (e.g. name, gender) or pseudonymous aliases. We then simulate matched LLM groups conditioned on the human data, benchmarking Gemini 2.5, GPT-4.1, Claude Haiku 3.5, and Gemma 3. LLM behaviors diverge: some mirror human biases; others mask these biases and attempt to compensate for them. We empirically demonstrate that human-AI alignment in collective reasoning depends on context, cues, and model-specific inductive biases. Understanding how LLMs align with collective human behavior is critical to advancing socially-aligned AI, and demands dynamic benchmarks that capture the complexities of collective reasoning.
2022
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Sebastian Gehrmann | Abhik Bhattacharjee | Abinaya Mahendiran | Alex Wang | Alexandros Papangelis | Aman Madaan | Angelina McMillan-Major | Anna Shvets | Ashish Upadhyay | Bernd Bohnet | Bingsheng Yao | Bryan Wilie | Chandra Bhagavatula | Chaobin You | Craig Thomson | Cristina Garbacea | Dakuo Wang | Daniel Deutsch | Deyi Xiong | Di Jin | Dimitra Gkatzia | Dragomir Radev | Elizabeth Clark | Esin Durmus | Faisal Ladhak | Filip Ginter | Genta Indra Winata | Hendrik Strobelt | Hiroaki Hayashi | Jekaterina Novikova | Jenna Kanerva | Jenny Chim | Jiawei Zhou | Jordan Clive | Joshua Maynez | João Sedoc | Juraj Juraska | Kaustubh Dhole | Khyathi Raghavi Chandu | Laura Perez-Beltrachini | Leonardo F. R. Ribeiro | Lewis Tunstall | Li Zhang | Mahim Pushkarna | Mathias Creutz | Michael White | Mihir Sanjay Kale | Moussa Kamal Eddine | Nico Daheim | Nishant Subramani | Ondřej Dušek | Paul Pu Liang | Pawan Sasanka Ammanamanchi | Qi Zhu | Ratish Puduppully | Reno Kriz | Rifat Shahriyar | Ronald Cardenas | Saad Mahamood | Salomey Osei | Samuel Cahyawijaya | Sanja Štajner | Sebastien Montella | Shailza Jolly | Simon Mille | Tahmid Hasan | Tianhao Shen | Tosin Adewumi | Vikas Raunak | Vipul Raheja | Vitaly Nikolaev | Vivian Tsai | Yacine Jernite | Ying Xu | Yisi Sang | Yixin Liu | Yufang Hou
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. The compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We pose that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods and what should be standardized instead is how to incorporate these new evaluation advances. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other’s work. GEMv2 supports 40 documented datasets in 51 languages, ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.
Co-authors
- Tosin Adewumi 1
- Pawan Sasanka Ammanamanchi 1
- Chandra Bhagavatula 1
- Abhik Bhattacharjee 1
- Bernd Bohnet 1
- Clémentine Bouleau 1
- Samuel Cahyawijaya 1
- Ronald Cardenas 1
- Khyathi Raghavi Chandu 1
- Jenny Chim 1
- Elizabeth Clark 1
- Jordan Clive 1
- Mathias Creutz 1
- Nico Daheim 1
- Daniel Deutsch 1
- Kaustubh Dhole 1
- Lucas Dixon 1
- Esin Durmus 1
- Ondřej Dušek 1
- Moussa Kamal Eddine 1
- Cristina Garbacea 1
- Sebastian Gehrmann 1
- Filip Ginter 1
- Dimitra Gkatzia 1
- Tahmid Hasan 1
- Hiroaki Hayashi 1
- Yufang Hou 1
- Yacine Jernite 1
- Di Jin 1
- Shailza Jolly 1
- Juraj Juraska 1
- Mihir Sanjay Kale 1
- Jenna Kanerva 1
- Reno Kriz 1
- Faisal Ladhak 1
- Maël Lebreton 1
- Paul Pu Liang 1
- Yixin Liu 1
- Aman Madaan 1
- Saad Mahamood 1
- Abinaya Mahendiran 1
- Joshua Maynez 1
- Angelina McMillan-Major 1
- Simon Mille 1
- Sebastien Montella 1
- Vitaly Nikolaev 1
- Jekaterina Novikova 1
- Salomey Osei 1
- Alexandros Papangelis 1
- Aaron T Parisi 1
- Laura Perez-Beltrachini 1
- Ratish Puduppully 1
- Mahim Pushkarna 1
- Crystal Qian 1
- Dragomir Radev 1
- Vipul Raheja 1
- Vikas Raunak 1
- Leonardo F. R. Ribeiro 1
- Yisi Sang 1
- João Sedoc 1
- Rifat Shahriyar 1
- Tianhao Shen 1
- Anna Shvets 1
- Hendrik Strobelt 1
- Nishant Subramani 1
- Craig Thomson 1
- Lewis Tunstall 1
- Ashish Upadhyay 1
- Alex Wang 1
- Dakuo Wang 1
- Michael White 1
- Bryan Wilie 1
- Genta Indra Winata 1
- Deyi Xiong 1
- Ying Xu 1
- Bingsheng Yao 1
- Chaobin You 1
- Li Zhang 1
- Jiawei Zhou 1
- Qi Zhu 1
- Sanja Štajner 1