Fine-grained Coordinated Cross-lingual Text Stream Alignment for Endless Language Knowledge Acquisition

Tao Ge, Qing Dou, Heng Ji, Lei Cui, Baobao Chang, Zhifang Sui, Furu Wei, Ming Zhou


Abstract
This paper proposes to study fine-grained coordinated cross-lingual text stream alignment through a novel information network decipherment paradigm. We use Burst Information Networks as media to represent text streams and present a simple yet effective network decipherment algorithm that uses diverse clues to decipher the networks for accurate text stream alignment. Experiments on Chinese-English news streams show that our approach not only outperforms previous approaches on bilingual lexicon extraction from coordinated text streams but also harvests high-quality alignments from large amounts of streaming data for endless language knowledge mining, making it a promising new paradigm for automatic language knowledge acquisition.
Anthology ID: D18-1271
Volume: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month: October-November
Year: 2018
Address: Brussels, Belgium
Editors: Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue: EMNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 2496–2506
URL: https://aclanthology.org/D18-1271/
DOI: 10.18653/v1/D18-1271
Cite (ACL): Tao Ge, Qing Dou, Heng Ji, Lei Cui, Baobao Chang, Zhifang Sui, Furu Wei, and Ming Zhou. 2018. Fine-grained Coordinated Cross-lingual Text Stream Alignment for Endless Language Knowledge Acquisition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2496–2506, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Fine-grained Coordinated Cross-lingual Text Stream Alignment for Endless Language Knowledge Acquisition (Ge et al., EMNLP 2018)
PDF: https://aclanthology.org/D18-1271.pdf
Attachment: D18-1271.Attachment.zip
Video: https://aclanthology.org/D18-1271.mp4