FineWeb-zhtw: Scalable Curation of Traditional Chinese Text Data from the Web

Cheng-Wen Lin, Wan-Hsuan Hsieh, Kai-Xin Guan, Chan-Jan Hsu, Chia-Chen Kuo, Chuan-Lin Lai, Chung-Wei Chung, Ming-Jen Wang, Da-Shan Shiu


Anthology ID:
2024.rocling-1.16
Volume:
Proceedings of the 36th Conference on Computational Linguistics and Speech Processing (ROCLING 2024)
Month:
November
Year:
2024
Address:
Taipei City, Taiwan
Editors:
Shu-Chuan Tseng, Yu Tsao, Hen-Hsen Huang, Yao-Chung Fan, Chia-Hui Chang
Venue:
ROCLING
Publisher:
The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages:
129–136
URL:
https://aclanthology.org/2024.rocling-1.16/
Cite (ACL):
Cheng-Wen Lin, Wan-Hsuan Hsieh, Kai-Xin Guan, Chan-Jan Hsu, Chia-Chen Kuo, Chuan-Lin Lai, Chung-Wei Chung, Ming-Jen Wang, and Da-Shan Shiu. 2024. FineWeb-zhtw: Scalable Curation of Traditional Chinese Text Data from the Web. In Proceedings of the 36th Conference on Computational Linguistics and Speech Processing (ROCLING 2024), pages 129–136, Taipei City, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
Cite (Informal):
FineWeb-zhtw: Scalable Curation of Traditional Chinese Text Data from the Web (Lin et al., ROCLING 2024)
PDF:
https://aclanthology.org/2024.rocling-1.16.pdf