Mengsong Wu
Also published as: MengSong Wu
2025
NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models
Han Han | Tong Zhu | Xiang Zhang | MengSong Wu | Xiong Hao | Wenliang Chen
Proceedings of the 31st International Conference on Computational Linguistics
Large language models (LLMs) combined with tool learning have achieved impressive results in real-world applications. During tool learning, LLMs may call multiple tools in nested orders, where a latter tool call may take a former tool's response as its input parameters. However, nested tool learning capabilities are still under-explored, since existing benchmarks lack relevant data instances. To address this problem, we introduce NesTools to bridge the current gap in comprehensive nested tool learning evaluations. NesTools comprises a novel automatic data generation method to construct large-scale nested tool calls with different nesting structures. With manual review and refinement, the dataset is of high quality and closely aligned with real-world scenarios. Therefore, NesTools can serve as a new benchmark to evaluate the nested tool learning abilities of LLMs. We conduct extensive experiments on 22 LLMs and provide in-depth analyses with NesTools, which show that current LLMs still struggle with the complex nested tool learning task.
2023
Mirror: A Universal Framework for Various Information Extraction Tasks
Tong Zhu | Junfei Ren | Zijian Yu | Mengsong Wu | Guoliang Zhang | Xiaoye Qu | Wenliang Chen | Zhefeng Wang | Baoxing Huai | Min Zhang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Sharing knowledge between information extraction (IE) tasks has always been a challenge due to diverse data formats and task variations. This divergence leads to information waste and increases the difficulty of building complex applications in real scenarios. Recent studies often formulate IE tasks as a triplet extraction problem. However, such a paradigm does not support multi-span and n-ary extraction, leading to weak versatility. To this end, we reorganize IE problems into unified multi-slot tuples and propose a universal framework for various IE tasks, namely Mirror. Specifically, we recast existing IE tasks as a multi-span cyclic graph extraction problem and devise a non-autoregressive graph decoding algorithm to extract all spans in a single step. Notably, this graph structure is highly versatile, supporting not only complex IE tasks but also machine reading comprehension and classification tasks. We manually construct a corpus containing 57 datasets for model pretraining, and conduct experiments on 30 datasets across 8 downstream tasks. The experimental results demonstrate that our model has decent compatibility and outperforms or reaches competitive performance with SOTA systems under few-shot and zero-shot settings. The code, model weights, and pretraining corpus are available at https://github.com/Spico197/Mirror.
Co-authors
- Wenliang Chen (陈文亮) 2
- Tong Zhu (朱桐) 2
- Han Han 1
- Xiong Hao 1
- Baoxing Huai 1