Linguistic Resources for Phrasal Verb Identification

Peter Machonis


Abstract
This paper shows how a Lexicon-Grammar dictionary of English phrasal verbs (PV) can be transformed into an electronic dictionary, and how, with the help of multiple grammars, dictionaries, and filters within the linguistic development environment NooJ, PV can be accurately identified in large corpora. The NooJ program is an alternative to the statistical methods commonly used in NLP: all PV are listed in a dictionary and then located by means of a PV grammar in both continuous and discontinuous format. Results are then refined with a series of dictionaries, disambiguating grammars, and other linguistic resources. The main advantage of such a program is that every PV listed in the dictionary can be identified in any corpus. The only drawback is that PV not listed in the dictionary (e.g., archaic forms, recent neologisms) are not identified; however, new PV can easily be added to the electronic dictionary, which is freely available to all.
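The dictionary-plus-grammar idea can be illustrated with a minimal Python sketch. It is not NooJ and does not use the paper's dictionary: the toy verb-particle entries, the small inflection table, the `MAX_GAP` window, and the helpers `find_pvs` and `add_pv` are all hypothetical. It shows only the core principle that a listed verb followed by one of its listed particles, either adjacent (continuous) or separated by a few words (discontinuous), is reported as a PV candidate.

```python
import re
from dataclasses import dataclass

# Hypothetical toy PV dictionary (verb lemma -> particles). Illustrative only,
# NOT the Lexicon-Grammar dictionary described in the paper.
PV_DICT = {
    "figure": {"out"},
    "give": {"up"},
    "look": {"up"},
    "turn": {"off"},
}

# Hypothetical inflection table mapping surface forms to verb lemmas.
LEMMAS = {
    "figure": "figure", "figured": "figure", "figures": "figure",
    "give": "give", "gave": "give", "given": "give", "gives": "give",
    "look": "look", "looked": "look", "looks": "look",
    "turn": "turn", "turned": "turn", "turns": "turn",
}

MAX_GAP = 4  # assumed maximum number of words between verb and particle


@dataclass
class Match:
    verb: str            # verb lemma
    particle: str        # particle found
    start: int           # token index of the verb
    end: int             # token index of the particle
    discontinuous: bool  # True if words intervene between verb and particle


def tokenize(text):
    """Lowercase word tokenizer (no punctuation tokens)."""
    return re.findall(r"[A-Za-z']+", text.lower())


def find_pvs(text, pv_dict=PV_DICT, max_gap=MAX_GAP):
    """Return PV candidates whose verb and particle are both in pv_dict."""
    tokens = tokenize(text)
    matches = []
    for i, tok in enumerate(tokens):
        lemma = LEMMAS.get(tok)
        if lemma not in pv_dict:
            continue
        # Look right for a listed particle, allowing up to max_gap words between.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if tokens[j] in pv_dict[lemma]:
                matches.append(Match(lemma, tokens[j], i, j, j > i + 1))
                break
    return matches


def add_pv(verb, particle, forms=()):
    """Register a new PV (e.g. a neologism) and its inflected forms."""
    PV_DICT.setdefault(verb, set()).add(particle)
    for form in (verb, *forms):
        LEMMAS[form] = verb
```

For example, `find_pvs("She figured the answer out.")` reports "figure out" as a discontinuous match, and `add_pv("chill", "out", ("chilled", "chills"))` makes "We chilled out." recognizable. Because matching is purely dictionary-driven, a sketch like this overgenerates on literal uses such as "looked up the chimney"; the paper refines such results with further dictionaries and disambiguating grammars, which are omitted here.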
Anthology ID: W18-3804
Volume: Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing
Month: August
Year: 2018
Address: Santa Fe, New Mexico, USA
Venues: COLING | LR4NLP | WS
Publisher: Association for Computational Linguistics
Pages: 18–27
URL: https://aclanthology.org/W18-3804
Cite (ACL): Peter Machonis. 2018. Linguistic Resources for Phrasal Verb Identification. In Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing, pages 18–27, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal): Linguistic Resources for Phrasal Verb Identification (Machonis, 2018)
PDF: https://aclanthology.org/W18-3804.pdf