Exploring Data-Centric Strategies for French Patent Classification: A Baseline and Comparisons

You Zuo, Benoît Sagot, Kim Gerdes, Houda Mouzoun, Samir Ghamri Doudane


Abstract
This paper proposes a novel, data-centric approach to French patent classification. We compare different approaches for the two deepest levels of the International Patent Classification (IPC) hierarchy: the group and the subgroup. Our experiments show that while simple ensemble strategies work for shallower levels, deeper levels require more sophisticated techniques such as data augmentation, clustering, and negative sampling. These results highlight the importance of language-specific features and data-centric strategies for accurate and reliable French patent classification, and they offer insights and solutions for researchers and practitioners in the field.
Anthology ID:
2023.jeptalnrecital-long.27
Volume:
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : travaux de recherche originaux -- articles longs
Month:
June
Year:
2023
Address:
Paris, France
Editors:
Christophe Servan, Anne Vilnat
Venue:
JEP/TALN/RECITAL
Publisher:
ATALA
Pages:
349–365
URL:
https://aclanthology.org/2023.jeptalnrecital-long.27
Cite (ACL):
You Zuo, Benoît Sagot, Kim Gerdes, Houda Mouzoun, and Samir Ghamri Doudane. 2023. Exploring Data-Centric Strategies for French Patent Classification: A Baseline and Comparisons. In Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : travaux de recherche originaux -- articles longs, pages 349–365, Paris, France. ATALA.
Cite (Informal):
Exploring Data-Centric Strategies for French Patent Classification: A Baseline and Comparisons (Zuo et al., JEP/TALN/RECITAL 2023)
PDF:
https://aclanthology.org/2023.jeptalnrecital-long.27.pdf