Cross-lingual Alignment of Knowledge Graph Triples with Sentences

Swayatta Daw, Shivprasad Sagare, Tushar Abhishek, Vikram Pudi, Vasudeva Varma


Abstract
Pairing natural language sentences with knowledge graph triples is essential for many downstream tasks, such as data-to-text generation, fact extraction from sentences (semantic parsing), and knowledge graph completion. Most existing methods solve these downstream tasks with end-to-end neural approaches that require a large amount of well-aligned training data, which is difficult and expensive to acquire. Recently, various unsupervised techniques have been proposed to ease this alignment step by automatically pairing structured data (knowledge graph triples) with textual data. However, these approaches are not well suited to low-resource languages, which pose two major challenges: (1) the unavailability of paired triples and native text with the same content distribution and (2) limited Natural Language Processing (NLP) resources. In this paper, we address the unsupervised pairing of knowledge graph triples with sentences for low-resource languages, taking Hindi as the low-resource language. We propose cross-lingual pairing of English triples with Hindi sentences to mitigate the lack of content overlap. We propose two novel approaches: NER-based filtering with Semantic Similarity and Key-phrase Extraction with Relevance Ranking. We use our best method to create a collection of 29224 well-aligned pairs of English triples and Hindi sentences. Additionally, we have curated a golden test set of 350 human-annotated instances for evaluation. We make the code and dataset publicly available.
Anthology ID:
2021.icon-main.77
Volume:
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2021
Address:
National Institute of Technology Silchar, Silchar, India
Editors:
Sivaji Bandyopadhyay, Sobha Lalitha Devi, Pushpak Bhattacharyya
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
629–637
URL:
https://aclanthology.org/2021.icon-main.77
Cite (ACL):
Swayatta Daw, Shivprasad Sagare, Tushar Abhishek, Vikram Pudi, and Vasudeva Varma. 2021. Cross-lingual Alignment of Knowledge Graph Triples with Sentences. In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 629–637, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI).
Cite (Informal):
Cross-lingual Alignment of Knowledge Graph Triples with Sentences (Daw et al., ICON 2021)
PDF:
https://aclanthology.org/2021.icon-main.77.pdf