Embedding Open-domain Common-sense Knowledge from Text

Travis Goodwin, Sanda Harabagiu


Abstract
Our ability to understand language often relies on common-sense knowledge: background information the speaker can assume is known by the reader. Similarly, our comprehension of the language used in complex domains relies on access to domain-specific knowledge. Capturing common-sense and domain-specific knowledge can be achieved by taking advantage of recent advances in open information extraction (IE) techniques and, more importantly, of knowledge embeddings, which are multi-dimensional representations of concepts and relations. Building a knowledge graph for representing common-sense knowledge, in which concepts discerned from noun phrases are cast as vertices and lexicalized relations are cast as edges, leads to learning the embeddings of common-sense knowledge accounting for semantic compositionality as well as implied knowledge. Common-sense knowledge is acquired from a vast collection of blogs and books as well as from WordNet. Similarly, medical knowledge is learned from two large sets of electronic health records. The evaluation results of these two forms of knowledge are promising: the same knowledge acquisition methodology based on learning knowledge embeddings works well both for common-sense knowledge and for medical knowledge. Interestingly, the common-sense knowledge that we have acquired was evaluated as being less neutral than the medical knowledge, as it often reflected the opinion of the knowledge utterer. In addition, the acquired medical knowledge was evaluated as more plausible than the common-sense knowledge, reflecting the complexity of acquiring common-sense knowledge due to the pragmatics and economicity of language.
Anthology ID:
L16-1732
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4621–4628
URL:
https://aclanthology.org/L16-1732
Cite (ACL):
Travis Goodwin and Sanda Harabagiu. 2016. Embedding Open-domain Common-sense Knowledge from Text. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4621–4628, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Embedding Open-domain Common-sense Knowledge from Text (Goodwin & Harabagiu, LREC 2016)
PDF:
https://aclanthology.org/L16-1732.pdf
Data
DBpedia