Title: Data Collection Pipeline for Low-Resource Languages: A Case Study on Constructing a Tetun Text Corpus
Authors: Gabriel de Jesus; Sérgio Sobral Nunes
Date: 2024-05
Type: text (conference publication)
Published in: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Editors: Nicoletta Calzolari; Min-Yen Kan; Veronique Hoste; Alessandro Lenci; Sakriani Sakti; Nianwen Xue
Publisher: ELRA and ICCL
Place: Torino, Italia
Pages: 4368-4380
Identifier: de-jesus-nunes-2024-data
URL: https://aclanthology.org/2024.lrec-main.390/