Reducing tokenizer’s tokens per word ratio in Financial domain with T-MuFin BERT Tokenizer

Braulio Blanco Lambruschini, Patricia Becerra-Sanchez, Mats Brorsson, Maciej Zurad


Anthology ID:
2023.finnlp-1.9
Volume:
Proceedings of the Fifth Workshop on Financial Technology and Natural Language Processing and the Second Multimodal AI For Financial Forecasting
Month:
20 August
Year:
2023
Address:
Macao
Editors:
Chung-Chi Chen, Hiroya Takamura, Puneet Mathur, Ramit Sawhney, Hen-Hsen Huang, Hsin-Hsi Chen
Venues:
FinNLP | WS
Pages:
94–103
URL:
https://aclanthology.org/2023.finnlp-1.9
Cite (ACL):
Braulio Blanco Lambruschini, Patricia Becerra-Sanchez, Mats Brorsson, and Maciej Zurad. 2023. Reducing tokenizer’s tokens per word ratio in Financial domain with T-MuFin BERT Tokenizer. In Proceedings of the Fifth Workshop on Financial Technology and Natural Language Processing and the Second Multimodal AI For Financial Forecasting, pages 94–103, Macao.
Cite (Informal):
Reducing tokenizer’s tokens per word ratio in Financial domain with T-MuFin BERT Tokenizer (Lambruschini et al., FinNLP-WS 2023)
PDF:
https://aclanthology.org/2023.finnlp-1.9.pdf