%0 Conference Proceedings
%T Bazinga! A Dataset for Multi-Party Dialogues Structuring
%A Lerner, Paul
%A Bergoënd, Juliette
%A Guinaudeau, Camille
%A Bredin, Hervé
%A Maurice, Benjamin
%A Lefevre, Sharleyne
%A Bouteiller, Martin
%A Berhe, Aman
%A Galmant, Léo
%A Yin, Ruiqing
%A Barras, Claude
%Y Calzolari, Nicoletta
%Y Béchet, Frédéric
%Y Blache, Philippe
%Y Choukri, Khalid
%Y Cieri, Christopher
%Y Declerck, Thierry
%Y Goggi, Sara
%Y Isahara, Hitoshi
%Y Maegaard, Bente
%Y Mariani, Joseph
%Y Mazo, Hélène
%Y Odijk, Jan
%Y Piperidis, Stelios
%S Proceedings of the Thirteenth Language Resources and Evaluation Conference
%D 2022
%8 June
%I European Language Resources Association
%C Marseille, France
%F lerner-etal-2022-bazinga
%X We introduce a dataset built around a large collection of TV (and movie) series. Those are filled with challenging multi-party dialogues. Moreover, TV series come with a very active fan base that allows the collection of metadata and accelerates annotation. With 16 TV and movie series, Bazinga! amounts to 400+ hours of speech and 8M+ tokens, including 500K+ tokens annotated with the speaker, addressee, and entity linking information. Along with the dataset, we also provide a baseline for speaker diarization, punctuation restoration, and person entity recognition. The results demonstrate the difficulty of the tasks and of transfer learning from models trained on mono-speaker audio or written text, which is more widely available. This work is a step towards better multi-party dialogue structuring and understanding. Bazinga! is available at hf.co/bazinga. Because (a large) part of Bazinga! is only partially annotated, we also expect this dataset to foster research towards self- or weakly-supervised learning methods.
%U https://aclanthology.org/2022.lrec-1.367
%P 3434-3441