Cross-lingual and Cross-domain Transfer Learning for Automatic Term Extraction from Low Resource Data

Amir Hazem; Mérième Bouhandi; Florian Boudin; Béatrice Daille

Abstract

Automatic Term Extraction (ATE) is a key component of domain knowledge understanding and an important basis for further natural language processing applications. Despite steady improvements, ATE still yields weak results, a problem exacerbated by the small training sets inherent to specialized-domain corpora. Recently, transformer-based deep neural models, such as BERT, have proven effective in many downstream NLP tasks. However, no systematic evaluation of these models for ATE has been conducted so far. In this paper, we run an extensive study on fine-tuning pre-trained BERT models for ATE. We propose strategies that empirically show BERT's effectiveness when using cross-lingual and cross-domain transfer learning to extract single-word and multi-word terms. Experiments were conducted on four specialized domains in three languages. The results suggest that BERT can capture cross-domain and cross-lingual terminologically marked contexts shared by terms, opening a new design pattern for ATE.

Anthology ID: 2022.lrec-1.68
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 648–662
URL: https://aclanthology.org/2022.lrec-1.68/
Cite (ACL): Amir Hazem, Merieme Bouhandi, Florian Boudin, and Beatrice Daille. 2022. Cross-lingual and Cross-domain Transfer Learning for Automatic Term Extraction from Low Resource Data. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 648–662, Marseille, France. European Language Resources Association.
Cite (Informal): Cross-lingual and Cross-domain Transfer Learning for Automatic Term Extraction from Low Resource Data (Hazem et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.68.pdf
