From Chinese Word Segmentation to Extraction of Constructions: Two Sides of the Same Algorithmic Coin

Jean-Pierre Colson


Abstract
This paper presents the results of two experiments carried out within the framework of computational construction grammar. Starting from the constructionist view that language consists only of constructions, including lexical ones, we tested the validity of a clustering algorithm primarily designed for multiword expression (MWE) extraction, the cpr-score (Colson, 2017), on Chinese word segmentation (CWS). Our results indicate a striking recall rate of 75 percent without any special adaptation to Chinese or to the lexicon, which confirms that there is some similarity between MWE extraction and CWS. Our second experiment also suggests that the same methodology might be used for extracting more schematic or abstract constructions, thereby providing evidence for the statistical foundation of construction grammar.
Anthology ID:
W18-4907
Volume:
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Agata Savary, Carlos Ramisch, Jena D. Hwang, Nathan Schneider, Melanie Andresen, Sameer Pradhan, Miriam R. L. Petruck
Venues:
LAW | MWE
SIGs:
SIGLEX | SIGANN
Publisher:
Association for Computational Linguistics
Pages:
41–50
URL:
https://aclanthology.org/W18-4907
Cite (ACL):
Jean-Pierre Colson. 2018. From Chinese Word Segmentation to Extraction of Constructions: Two Sides of the Same Algorithmic Coin. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), pages 41–50, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
From Chinese Word Segmentation to Extraction of Constructions: Two Sides of the Same Algorithmic Coin (Colson, LAW-MWE 2018)
PDF:
https://aclanthology.org/W18-4907.pdf