Text Tokenization for Knowledge-free Automatic Extraction of Lexical Similarities

Aristomenis Thanopoulos, Nikos Fakotakis, George Kokkinakis


Abstract
Previous studies on the automatic extraction of lexical similarities have treated the word as the semantic unit of text. However, the theory of contextual lexical semantics implies that larger segments of text, namely non-compositional multiwords, are better suited to this role. We tested this notion experimentally by applying automatic collocation extraction to identify and merge such multiwords before the similarity estimation step. Using an automatic WordNet-based comparative evaluation scheme together with a manual evaluation procedure, we find that the extracted similarity relations improve.
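
The tokenization step described in the abstract (detect collocations, then merge them into single tokens before estimating similarity) can be illustrated with a minimal sketch. The abstract does not say which collocation measure or thresholds the authors used, so everything here is an assumption made for illustration: pointwise mutual information (PMI) over adjacent word pairs, the `min_count` and `min_pmi` cut-offs, the underscore joiner, and the function names `find_collocations` and `merge_collocations`.

```python
import math
from collections import Counter


def find_collocations(tokens, min_count=2, min_pmi=1.0):
    """Return adjacent word pairs that look like collocations.

    Assumption: PMI is used as the association measure; the paper's
    actual measure is not given in the abstract.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)  # used to normalize both counts (an approximation)
    collocations = set()
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI is unreliable
        pmi = math.log2(count * n / (unigrams[w1] * unigrams[w2]))
        if pmi >= min_pmi:
            collocations.add((w1, w2))
    return collocations


def merge_collocations(tokens, collocations):
    """Greedily merge each detected pair into one token, left to right."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in collocations:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # both words are consumed by the merged token
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = ["new", "york", "a", "b", "new", "york", "c", "d"]
pairs = find_collocations(tokens)
retokenized = merge_collocations(tokens, pairs)
```

A similarity estimator run on `retokenized` would then treat `new_york` as a single unit with its own context distribution, rather than mixing the contexts of `new` and `york`. This sketch handles two-word units only and does not test for non-compositionality, which the abstract names as the defining property of the multiwords of interest.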
Anthology ID:
2003.jeptalnrecital-poster.17
Volume:
Actes de la 10ème conférence sur le Traitement Automatique des Langues Naturelles. Posters
Month:
June
Year:
2003
Address:
Batz-sur-Mer, France
Venue:
JEP/TALN/RECITAL
Publisher:
ATALA
Pages:
397–402
URL:
https://aclanthology.org/2003.jeptalnrecital-poster.17
PDF:
https://aclanthology.org/2003.jeptalnrecital-poster.17.pdf