A Resource-light Approach to Phrase Extraction for English and German Documents from the Patent Domain and User Generated Content

Julia Maria Schulz; Daniela Becks; Christa Womser-Hacker; Thomas Mandl

Abstract

In order to extract meaningful phrases from corpora (e.g. in an information retrieval context), intensive knowledge of the domain in question and the respective documents is generally needed. When moving to a new domain or language, the underlying knowledge bases and models need to be adapted, which is often time-consuming and labor-intensive. This paper addresses the described challenge of phrase extraction from documents in different domains and languages and proposes an approach that does not use comprehensive lexica and can therefore be easily transferred to new domains and languages. The effectiveness of the proposed approach is evaluated on user generated content and documents from the patent domain in English and German.

Anthology ID: L12-1249
Volume: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month: May
Year: 2012
Address: Istanbul, Turkey
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 538–543
URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/466_Paper.pdf
Cite (ACL): Julia Maria Schulz, Daniela Becks, Christa Womser-Hacker, and Thomas Mandl. 2012. A Resource-light Approach to Phrase Extraction for English and German Documents from the Patent Domain and User Generated Content. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 538–543, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal): A Resource-light Approach to Phrase Extraction for English and German Documents from the Patent Domain and User Generated Content (Schulz et al., LREC 2012)
PDF: http://www.lrec-conf.org/proceedings/lrec2012/pdf/466_Paper.pdf
