Haiyang Yu
2026
Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning
Haiyang Yu | Yuchuan Wu | Fan Shi | Jinghui Lu | Ke Niu | Xiaodong Ge | Minghan Zhuo | Jingqun Tang | Bin Li
Findings of the Association for Computational Linguistics: ACL 2026
Chinese ancient documents, invaluable carriers of millennia of Chinese history and culture, hold rich knowledge across diverse fields but remain difficult to digitize and understand: traditional methods only scan page images, while current Vision-Language Models (VLMs) struggle with the documents' visual and linguistic complexity. Existing document benchmarks focus on English printed texts or simplified Chinese, leaving a gap for evaluating VLMs on ancient Chinese documents. To address this, we present AncientDoc, the first benchmark for Chinese ancient documents, designed to assess VLMs from OCR to knowledge reasoning. AncientDoc includes five tasks (page-level OCR, vernacular translation, reasoning-based QA, knowledge-based QA, linguistic variant QA) and covers 14 document types, over 100 books, and about 3,000 pages. Based on AncientDoc, we evaluate mainstream VLMs using multiple metrics, supplemented by a human-aligned large language model for scoring.
2025
A Bounding Box is Worth One Token - Interleaving Layout and Text in a Large Language Model for Document Understanding
Jinghui Lu | Haiyang Yu | Yanjie Wang | Yongjie Ye | Jingqun Tang | Ziwei Yang | Binghong Wu | Qi Liu | Hao Feng | Han Wang | Hao Liu | Can Huang
Findings of the Association for Computational Linguistics: ACL 2025
Recently, many studies have demonstrated that combining only OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks. However, existing methods that integrate spatial layouts with text have limitations, such as producing overly long text sequences or failing to fully leverage the autoregressive traits of LLMs. In this work, we introduce Interleaving Layout and Text in a Large Language Model (LayTextLLM) for document understanding. LayTextLLM projects each bounding box to a single embedding and interleaves it with text, efficiently avoiding long sequence issues while leveraging autoregressive traits of LLMs. LayTextLLM not only streamlines the interaction of layout and textual data but also shows enhanced performance in KIE and VQA. Comprehensive benchmark evaluations reveal significant improvements of LayTextLLM, with a 15.2% increase on KIE tasks and 10.7% on VQA tasks compared to previous SOTA OCR-based LLMs. All resources are available at URL masked for anonymous review.
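The interleaving idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `project_box` and `interleave`, the embedding size, and the randomly initialised linear projection (standing in for whatever learned projector the paper uses) are all assumptions made for illustration. The point it shows is that each bounding box costs exactly one slot in the input sequence, placed directly before the text tokens it encloses.

```python
import random

def project_box(box, weights, bias):
    """Map one (x1, y1, x2, y2) box to a single d-dimensional vector.

    Hypothetical linear projection; a trained model would learn the weights.
    """
    return [sum(w * c for w, c in zip(row, box)) + b
            for row, b in zip(weights, bias)]

def interleave(boxes, segment_embeddings, weights, bias):
    """Build [box_1, tokens of segment 1, box_2, tokens of segment 2, ...]."""
    sequence = []
    for box, tokens in zip(boxes, segment_embeddings):
        sequence.append(project_box(box, weights, bias))  # one slot per box
        sequence.extend(tokens)                           # then its text tokens
    return sequence

dim = 8                      # assumed embedding size, for illustration only
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(dim)]
bias = [0.0] * dim

# Two OCR text segments with normalised box coordinates (made-up values).
boxes = [(0.10, 0.05, 0.40, 0.10), (0.10, 0.12, 0.90, 0.18)]
# Placeholder token embeddings: 3 tokens in the first segment, 5 in the second.
segment_embeddings = [
    [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)],
    [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)],
]

sequence = interleave(boxes, segment_embeddings, weights, bias)
print(len(sequence))  # 2 box slots + 8 text tokens = 10
```

Under these assumptions the layout adds only one position per text segment. Writing the four coordinates out as text would instead add several tokens per box, which is the long-sequence problem the abstract describes.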