Qisheng Hu
2024
MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems
Kaixin Li
|
Yuchen Tian
|
Qisheng Hu
|
Ziyang Luo
|
Zhiyong Huang
|
Jing Ma
Findings of the Association for Computational Linguistics: EMNLP 2024
Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models have demonstrated remarkable abilities in visual reasoning and mathematical tasks, there is little work on investigating whether these models can effectively interpret visual elements for code generation. To this end, we present MMCode, the first multi-modal coding dataset for evaluating algorithmic problem-solving skills in visually rich contexts. MMCode contains 3,548 questions and 6,620 images collected from real-world programming challenges harvested from 10 code competition websites, presenting significant challenges due to the extreme demand for reasoning abilities. Our experiment results show that current state-of-the-art models struggle to solve these problems. The results highlight the lack of powerful vision-code models, and we hope MMCode can serve as an inspiration for future works in this domain. The data and code are publicly available.
From Moments to Milestones: Incremental Timeline Summarization Leveraging Large Language Models
Qisheng Hu
|
Geonsik Moon
|
Hwee Tou Ng
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Timeline summarization (TLS) is essential for distilling coherent narratives from a vast collection of texts, tracing the progression of events and topics over time. Prior research typically focuses on either event or topic timeline summarization, neglecting the potential synergy of these two forms. In this study, we bridge this gap by introducing a novel approach that leverages large language models (LLMs) for generating both event and topic timelines. Our approach diverges from conventional TLS by prioritizing event detection, leveraging LLMs as pseudo-oracles for incremental event clustering and the construction of timelines from a text stream. As a result, it produces a more interpretable pipeline. Empirical evaluation across four TLS benchmarks reveals that our approach outperforms the best prior published approaches, highlighting the potential of LLMs in timeline summarization for real-world applications.
Search
Co-authors
- Kaixin Li 1
- Yuchen Tian 1
- Ziyang Luo 1
- Zhiyong Huang 1
- Jing Ma 1
- show all...