TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages

Zihan Zhao, Lu Chen, Ruisheng Cao, Hongshen Xu, Xingyu Chen, Kai Yu


Abstract
Recently, the structural reading comprehension (SRC) task on web pages has attracted increasing research interest. Although previous SRC work has leveraged extra information such as HTML tags or XPaths, the informative topology of web pages has not been effectively exploited. In this work, we propose a Topological Information Enhanced model (TIE), which transforms the token-level task into a tag-level task by introducing a two-stage process (i.e., node locating and answer refining). Building on this, TIE integrates a Graph Attention Network (GAT) and a Pre-trained Language Model (PLM) to leverage the topological information of both logical and spatial structures. Experimental results demonstrate that our model outperforms strong baselines and achieves state-of-the-art performance on the web-based SRC benchmark WebSRC at the time of writing. The code of TIE will be publicly available at https://github.com/X-LANCE/TIE.
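The two-stage process named in the abstract (node locating, then answer refining) can be illustrated with a minimal sketch. This is not the released TIE code: the `TagNode` class, the function names, and all scores below are hypothetical, and the scores are hard-coded where TIE would obtain them from its GAT and PLM components. The sketch also assumes that answer refining means restricting token-level span selection to the tokens covered by the located tag.

```python
from dataclasses import dataclass


@dataclass
class TagNode:
    """Hypothetical tag node: tag name plus the token range its subtree covers."""
    tag: str
    start: int  # index of the first covered token
    end: int    # index one past the last covered token


def locate_node(node_scores):
    """Stage 1 (node locating): index of the highest-scoring tag."""
    return max(range(len(node_scores)), key=lambda i: node_scores[i])


def refine_answer(node, start_scores, end_scores):
    """Stage 2 (answer refining): best (start, end) span inside the located tag."""
    best_score, best_span = float("-inf"), None
    for s in range(node.start, node.end):
        for e in range(s, node.end):
            score = start_scores[s] + end_scores[e]
            if score > best_score:
                best_score, best_span = score, (s, e)
    return best_span


def answer(tokens, nodes, node_scores, start_scores, end_scores):
    """Run both stages and return the answer text."""
    node = nodes[locate_node(node_scores)]
    s, e = refine_answer(node, start_scores, end_scores)
    return " ".join(tokens[s:e + 1])


# Toy page: a <div> holding two <span> children (all values are made up).
tokens = ["Price", ":", "$", "42", "Color", ":", "red"]
nodes = [TagNode("div", 0, 7), TagNode("span", 0, 4), TagNode("span", 4, 7)]
node_scores = [0.1, 0.2, 0.7]                       # stand-in for tag-level scores
start_scores = [0.9, 0.0, 0.0, 0.8, 0.1, 0.1, 0.6]  # stand-in for token-level scores
end_scores = [0.0, 0.0, 0.0, 0.9, 0.1, 0.1, 0.7]

result = answer(tokens, nodes, node_scores, start_scores, end_scores)
print(result)  # prints "red"
```

In this toy input, an unconstrained token-level search would pick the span "Price : $ 42" because its start and end scores sum highest; locating the tag first confines the search to the second `<span>`, which is the intuition behind moving from a token-level to a tag-level task.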
Anthology ID:
2022.naacl-main.132
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1808–1821
URL:
https://aclanthology.org/2022.naacl-main.132
DOI:
10.18653/v1/2022.naacl-main.132
Cite (ACL):
Zihan Zhao, Lu Chen, Ruisheng Cao, Hongshen Xu, Xingyu Chen, and Kai Yu. 2022. TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1808–1821, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages (Zhao et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.132.pdf
Video:
https://aclanthology.org/2022.naacl-main.132.mp4
Code:
x-lance/tie