Workshop on Building and Using Comparable Corpora (2024)
Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING 2024
Pierre Zweigenbaum | Reinhard Rapp | Serge Sharoff
On a Novel Application of Wasserstein-Procrustes for Unsupervised Cross-Lingual Alignment of Embeddings
Guillem Ramírez | Rumen Dangovski | Preslav Nakov | Marin Soljacic
Modeling Diachronic Change in English Scientific Writing over 300+ Years with Transformer-based Language Model Surprisal
Julius Steuer | Marie-Pauline Krielke | Stefan Fischer | Stefania Degaetano-Ortlieb | Marius Mosbach | Dietrich Klakow
PORTULAN ExtraGLUE Datasets and Models: Kick-starting a Benchmark for the Neural Processing of Portuguese
Tomás Freitas Osório | Bernardo Leite | Henrique Lopes Cardoso | Luís Gomes | João Rodrigues | Rodrigo Santos | António Branco
Invited Talk: The Way Towards Massively Multilingual Language Models
François Yvon
Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets
Zi Long | ZhenHao Tang | Xianghua Fu | Jian Chen | Shilong Hou | Jinze Lyu
Exploring the Potential of Large Language Models in Adaptive Machine Translation for Generic Text and Subtitles
Abdelhadi Soudi | Mohamed Hannani | Kristof Van Laerhoven | Eleftherios Avramidis
INCLURE: a Dataset and Toolkit for Inclusive French Translation
Paul Lerner | Cyril Grouin
BnPC: A Gold Standard Corpus for Paraphrase Detection in Bangla, and its Evaluation
Sourav Saha | Zeshan Ahmed Nobin | Mufassir Ahmad Chowdhury | Md. Shakirul Hasan Khan Mobin | Mohammad Ruhul Amin | Sudipta Kar
Creating Clustered Comparable Corpora from Wikipedia with Different Fuzziness Levels and Language Representativity
Anna Laskina | Eric Gaussier | Gaelle Calvary
EuReCo: Not Building and Yet Using Federated Comparable Corpora for Cross-Linguistic Research
Marc Kupietz | Piotr Banski | Nils Diewald | Beata Trawinski | Andreas Witt
Building Annotated Parallel Corpora Using the ATIS Dataset: Two UD-style treebanks in English and Turkish
Neslihan Cesur | Aslı Kuzgun | Mehmet Kose | Olcay Taner Yıldız
Bootstrapping the Annotation of UD Learner Treebanks
Arianna Masciolini
SweDiagnostics: A Diagnostics Natural Language Inference Dataset for Swedish
Felix Morger
Multiple Discourse Relations in English TED Talks and Their Translation into Lithuanian, Portuguese and Turkish
Deniz Zeyrek | Giedrė Valūnaitė Oleškevičienė | Amalia Mendes
mini-CIEP+: A Shareable Parallel Corpus of Prose
Annemarie Verkerk | Luigi Talamo