Three Real-World Datasets and Neural Computational Models for Classification Tasks in Patent Landscaping

Subhash Pujari; Jannik Strötgen; Mark Giereth; Michael Gertz; Annemarie Friedrich

doi:10.18653/v1/2022.emnlp-main.791

Abstract

Patent Landscaping, one of the central tasks of intellectual property management, includes selecting and grouping patents according to user-defined technical or application-oriented criteria. While recent transformer-based models have been shown to be effective for classifying patents into taxonomies such as CPC or IPC, there is still little research on how to support real-world Patent Landscape Studies (PLSs) using natural language processing methods. With this paper, we release three labeled datasets for PLS-oriented classification tasks covering two diverse domains. We provide a qualitative analysis and report detailed corpus statistics. Most research on neural models for patents has been restricted to leveraging titles and abstracts. We compare strong neural and non-neural baselines, proposing a novel model that takes into account textual information from the patents’ full texts as well as embeddings created based on the patents’ CPC labels. We find that for PLS-oriented classification tasks, going beyond title and abstract is crucial, CPC labels are an effective source of information, and combining all features yields the best results.
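The abstract describes combining textual features with embeddings derived from a patent's CPC labels. The snippet below is a minimal sketch of one generic way to do such feature combination (late fusion by concatenation), assuming a precomputed text vector and a lookup table of CPC code embeddings. It is not the authors' architecture: the function names, the averaging of CPC embeddings, the logistic output layer, the example codes, and all numbers are hypothetical placeholders chosen for illustration.

```python
import math
from typing import Dict, List, Sequence


def mean_cpc_embedding(
    cpc_codes: Sequence[str], table: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the embeddings of a patent's CPC codes (hypothetical lookup table).

    Codes missing from the table are skipped; a zero vector is returned
    if none of the codes are known.
    """
    known = [table[code] for code in cpc_codes if code in table]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def combine_features(text_vec: Sequence[float], cpc_vec: Sequence[float]) -> List[float]:
    """Late fusion: concatenate the text vector and the CPC vector."""
    return list(text_vec) + list(cpc_vec)


def relevance_probability(
    features: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Score the fused features with a logistic (sigmoid) output layer."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(f * w for f, w in zip(features, weights)) + bias
    # Numerically stable sigmoid: avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical 2-dimensional CPC embeddings for illustrative CPC-style codes.
    table = {"H01M10/0525": [1.0, 0.0], "H01M4/505": [0.0, 1.0]}
    # Placeholder for the output of some text encoder over the patent's full text.
    text_vec = [0.2, -0.1, 0.4]
    cpc_vec = mean_cpc_embedding(["H01M10/0525", "H01M4/505", "UNKNOWN"], table, dim=2)
    features = combine_features(text_vec, cpc_vec)
    print(features)
    print(relevance_probability(features, weights=[1.0] * len(features), bias=0.0))
```

Concatenation keeps the two information sources separate until the output layer, so a learned classifier can weight textual and CPC-based evidence independently; in practice the text vector and CPC embeddings would come from trained models rather than hand-written values.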

Anthology ID: 2022.emnlp-main.791
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 11498–11513
URL: https://aclanthology.org/2022.emnlp-main.791/
DOI: 10.18653/v1/2022.emnlp-main.791
Cite (ACL): Subhash Pujari, Jannik Strötgen, Mark Giereth, Michael Gertz, and Annemarie Friedrich. 2022. Three Real-World Datasets and Neural Computational Models for Classification Tasks in Patent Landscaping. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11498–11513, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Three Real-World Datasets and Neural Computational Models for Classification Tasks in Patent Landscaping (Pujari et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.791.pdf
Video: https://aclanthology.org/2022.emnlp-main.791.mp4
