Extracting semantic relations from Portuguese corpora using lexical-syntactic patterns

Raquel Amaro


Abstract
The growing investment in automatic extraction procedures, together with the need for extensive resources, makes semi-automatic construction a new, viable and efficient strategy for developing language resources, combining accuracy, size, coverage and applicability. These assumptions motivated the work described in this paper, which aims to establish and use lexical-syntactic patterns for extracting semantic relations for Portuguese from corpora, as part of a larger ongoing project for the semi-automatic extension of WordNet.PT. In total, 26 lexical-syntactic patterns were established, covering hypernymy/hyponymy and holonymy/meronymy relations between nominal items, and over 34 000 contexts were manually analyzed to evaluate the productivity of each pattern. The set of patterns and respective examples are given, as well as data concerning the extraction of relations (right hits, wrong hits and related hits) and the total number of occurrences of each pattern in CPRC. Although language-dependent, and thus of obvious interest for the development of lexical resources for Portuguese, the results presented in this paper are also expected to be helpful as a basis for the establishment of patterns for related languages such as Spanish, Catalan, French or Italian.
Anthology ID:
L14-1690
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3001–3005
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/900_Paper.pdf
Cite (ACL):
Raquel Amaro. 2014. Extracting semantic relations from Portuguese corpora using lexical-syntactic patterns. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3001–3005, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Extracting semantic relations from Portuguese corpora using lexical-syntactic patterns (Amaro, LREC 2014)