Yajing Yang
2024
DataTales: A Benchmark for Real-World Intelligent Data Narration
Yajing Yang | Qian Liu | Min-Yen Kan
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We introduce DataTales, a novel benchmark designed to assess the proficiency of language models in data narration, a task crucial for transforming complex tabular data into accessible narratives. Existing benchmarks often fall short in capturing the requisite analytical complexity for practical applications. DataTales addresses this gap by offering 4.9k financial reports paired with corresponding market data, showcasing the demand for models to create clear narratives and analyze large datasets while understanding specialized terminology in the field. Our findings highlight the significant challenge that language models face in achieving the necessary precision and analytical depth for proficient data narration, suggesting promising avenues for future model development and evaluation methodologies.
2022
Lightweight Contextual Logical Structure Recovery
Po-Wei Huang | Abhinav Ramesh Kashyap | Yanxia Qin | Yajing Yang | Min-Yen Kan
Proceedings of the Third Workshop on Scholarly Document Processing
Logical structure recovery in scientific articles associates text with a semantic section of the article. Although previous work has disregarded the surrounding context of a line, we model this important information by employing line-level attention on top of a transformer-based scientific document processing pipeline. With the addition of loss function engineering and data augmentation techniques with semi-supervised learning, our method improves classification performance by 10% compared to a recent state-of-the-art model. Our parsimonious, text-only method achieves performance comparable to that of other works that use rich document features such as font and spatial position, while using less data, resulting in a lightweight training pipeline.