Learning a Grammar Inducer from Massive Uncurated Instructional Videos

Songyang Zhang; Linfeng Song; Lifeng Jin; Haitao Mi; Kun Xu; Dong Yu; Jiebo Luo

doi:10.18653/v1/2022.emnlp-main.16

Abstract

Video-aided grammar induction aims to leverage video information to find more accurate syntactic grammars for the accompanying text. While previous work focuses on building systems that induce grammars on text that is well aligned with video content, we investigate the scenario in which text and video are only in loose correspondence. Such data can be found in abundance online, and the weak correspondence is similar to the indeterminacy problem studied in language acquisition. Furthermore, we build a new model that can better learn video-span correlation without the manually designed features adopted by previous work. Experiments show that our model, trained only on large-scale YouTube data with no text-video alignment, achieves strong and robust performance across three unseen datasets, despite domain shift and noisy label issues. Furthermore, our model yields higher F1 scores than the previous state-of-the-art systems trained on in-domain data.

Anthology ID: 2022.emnlp-main.16
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 233–247
URL: https://aclanthology.org/2022.emnlp-main.16
DOI: 10.18653/v1/2022.emnlp-main.16
Cite (ACL): Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, and Jiebo Luo. 2022. Learning a Grammar Inducer from Massive Uncurated Instructional Videos. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 233–247, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Learning a Grammar Inducer from Massive Uncurated Instructional Videos (Zhang et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.16.pdf
