Workshop on Building and Using Comparable Corpora (2024)

Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING 2024
Pierre Zweigenbaum | Reinhard Rapp | Serge Sharoff

On a Novel Application of Wasserstein-Procrustes for Unsupervised Cross-Lingual Alignment of Embeddings
Guillem Ramírez | Rumen Dangovski | Preslav Nakov | Marin Soljacic

Modeling Diachronic Change in English Scientific Writing over 300+ Years with Transformer-based Language Model Surprisal
Julius Steuer | Marie-Pauline Krielke | Stefan Fischer | Stefania Degaetano-Ortlieb | Marius Mosbach | Dietrich Klakow

PORTULAN ExtraGLUE Datasets and Models: Kick-starting a Benchmark for the Neural Processing of Portuguese
Tomás Freitas Osório | Bernardo Leite | Henrique Lopes Cardoso | Luís Gomes | João Rodrigues | Rodrigo Santos | António Branco

Invited Talk: The Way Towards Massively Multilingual Language Models
François Yvon

Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets
Zi Long | ZhenHao Tang | Xianghua Fu | Jian Chen | Shilong Hou | Jinze Lyu

Exploring the Potential of Large Language Models in Adaptive Machine Translation for Generic Text and Subtitles
Abdelhadi Soudi | Mohamed Hannani | Kristof Van Laerhoven | Eleftherios Avramidis

INCLURE: a Dataset and Toolkit for Inclusive French Translation
Paul Lerner | Cyril Grouin

BnPC: A Gold Standard Corpus for Paraphrase Detection in Bangla, and its Evaluation
Sourav Saha | Zeshan Ahmed Nobin | Mufassir Ahmad Chowdhury | Md. Shakirul Hasan Khan Mobin | Mohammad Ruhul Amin | Sudipta Kar

Creating Clustered Comparable Corpora from Wikipedia with Different Fuzziness Levels and Language Representativity
Anna Laskina | Eric Gaussier | Gaelle Calvary

EuReCo: Not Building and Yet Using Federated Comparable Corpora for Cross-Linguistic Research
Marc Kupietz | Piotr Banski | Nils Diewald | Beata Trawinski | Andreas Witt

Building Annotated Parallel Corpora Using the ATIS Dataset: Two UD-style treebanks in English and Turkish
Neslihan Cesur | Aslı Kuzgun | Mehmet Kose | Olcay Taner Yıldız

Bootstrapping the Annotation of UD Learner Treebanks
Arianna Masciolini

SweDiagnostics: A Diagnostics Natural Language Inference Dataset for Swedish
Felix Morger

Multiple Discourse Relations in English TED Talks and Their Translation into Lithuanian, Portuguese and Turkish
Deniz Zeyrek | Giedrė Valūnaitė Oleškevičienė | Amalia Mendes

mini-CIEP+ : A Shareable Parallel Corpus of Prose
Annemarie Verkerk | Luigi Talamo