The Company They Keep: Extracting Japanese Neologisms Using Language Patterns

James Breen, Timothy Baldwin, Francis Bond


Abstract
We describe an investigation into identifying and extracting unrecorded potential lexical items in Japanese text by detecting passages that contain selected language patterns typically associated with such items. We identified a set of suitable patterns, then tested them on two large collections of text drawn from the WWW and Twitter. Samples of the extracted items were evaluated, and the results indicate that the approach has considerable potential for identifying terms for later lexicographic analysis.
Anthology ID:
2018.gwc-1.19
Volume:
Proceedings of the 9th Global Wordnet Conference
Month:
January
Year:
2018
Address:
Nanyang Technological University (NTU), Singapore
Venue:
GWC
Publisher:
Global Wordnet Association
Pages:
163–171
URL:
https://aclanthology.org/2018.gwc-1.19
Cite (ACL):
James Breen, Timothy Baldwin, and Francis Bond. 2018. The Company They Keep: Extracting Japanese Neologisms Using Language Patterns. In Proceedings of the 9th Global Wordnet Conference, pages 163–171, Nanyang Technological University (NTU), Singapore. Global Wordnet Association.
Cite (Informal):
The Company They Keep: Extracting Japanese Neologisms Using Language Patterns (Breen et al., GWC 2018)
PDF:
https://aclanthology.org/2018.gwc-1.19.pdf