Zheng Hu
2025
CycleOIE: A Low-Resource Training Framework For Open Information Extraction
Zhihong Jin | Chunhong Zhang | Zheng Hu | Jibin Yu | Ruiqi Ma | Qingyun Chen | Xiaohao Liao | Yanxing Zhang
Proceedings of the 31st International Conference on Computational Linguistics
Open Information Extraction (OpenIE) aims to extract structured information in the form of triples from unstructured text, serving as a foundation for various downstream NLP tasks. Despite the success of neural OpenIE models, their dependence on large-scale annotated datasets poses a challenge, particularly in low-resource settings. In this paper, we introduce a novel approach to the low-resource OpenIE task with two key innovations: (1) we improve the quality of training data by curating small-scale, high-quality datasets annotated by a large language model (GPT-3.5), leveraging OpenIE principles and few-shot examples to form the LSOIE-g principles and LSOIE-g examples datasets; (2) we propose CycleOIE, a training framework that maximizes data efficiency through a cycle-consistency mechanism, enabling the model to learn effectively from minimal data. Experimental results show that CycleOIE, trained on only 2k+ instances, achieves results comparable to models trained on over 90k instances. Extensive experiments further validate our contributions, demonstrating the superior performance of CycleOIE and the curated LSOIE-g datasets in low-resource OpenIE and revealing the internal mechanisms of CycleOIE.
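The abstract names a cycle-consistency mechanism (sentence → triples → reconstructed sentence) but gives no implementation details. The sketch below, in plain PyTorch, illustrates one plausible form of such a training step on toy data; the model classes, sizes, loss weighting, and the way the cycle signal is propagated are assumptions for illustration, not the paper's actual CycleOIE implementation.

```python
# A minimal sketch of a cycle-consistency training step for low-resource OpenIE.
# Assumptions (not from the paper): both directions are tiny GRU seq2seq models
# over a shared toy vocabulary; the real method's architectures and losses differ.
import torch
import torch.nn as nn

VOCAB, HID = 100, 64  # toy vocabulary size and hidden width

class Seq2Seq(nn.Module):
    """A toy encoder-decoder standing in for either direction of the cycle."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HID)
        self.enc = nn.GRU(HID, HID, batch_first=True)
        self.dec = nn.GRU(HID, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src, tgt):
        _, h = self.enc(self.emb(src))            # encode source tokens
        dec_out, _ = self.dec(self.emb(tgt), h)   # teacher-forced decoding
        return self.out(dec_out)                  # logits over the vocabulary

extractor = Seq2Seq()  # sentence -> linearized triples
generator = Seq2Seq()  # linearized triples -> sentence
opt = torch.optim.Adam(
    list(extractor.parameters()) + list(generator.parameters()), lr=1e-3)
ce = nn.CrossEntropyLoss()

def cycle_step(sent, gold_triples):
    """One step: supervised extraction loss plus a cycle reconstruction loss."""
    # Supervised loss on the small annotated set (e.g. an LSOIE-g dataset).
    logits = extractor(sent, gold_triples[:, :-1])
    sup = ce(logits.reshape(-1, VOCAB), gold_triples[:, 1:].reshape(-1))
    # Cycle loss: greedily decoded triples must reconstruct the sentence.
    # Hard argmax blocks gradients to the extractor here, so this variant
    # only trains the generator on the cycle; the actual method may relay
    # the cycle signal differently (e.g. via soft decoding).
    pred = logits.argmax(-1)
    recon = generator(pred, sent[:, :-1])
    cyc = ce(recon.reshape(-1, VOCAB), sent[:, 1:].reshape(-1))
    loss = sup + cyc
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

sent = torch.randint(1, VOCAB, (2, 12))     # toy batch: 2 sentences, 12 tokens
triples = torch.randint(1, VOCAB, (2, 12))  # toy linearized gold triples
print(cycle_step(sent, triples))
```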
2021
More than Text: Multi-modal Chinese Word Segmentation
Dong Zhang | Zheng Hu | Shoushan Li | Hanqian Wu | Qiaoming Zhu | Guodong Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Chinese word segmentation (CWS) is an important foundational task in natural language processing. Previous work focuses only on the textual modality, yet utterances often come with audio and video (e.g., news broadcasts and face-to-face dialogues), where textual, acoustic, and visual modalities naturally co-occur. To this end, we combine multiple modalities (mainly the transcribed text and the actual voice information) to perform CWS. In this paper, we annotate a new CWS dataset containing text and audio. Moreover, we propose a time-dependent multi-modal interactive model based on the Transformer framework to integrate multi-modal information for word sequence labeling. Experimental results on three different training sets show the effectiveness of our approach in fusing text and audio.
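The abstract describes a time-dependent multi-modal interactive Transformer for word sequence labeling without architectural specifics. The following sketch shows one plausible way to fuse character and audio representations with cross-attention for BMES tagging; the class name, dimensions, and single-layer fusion are illustrative assumptions, not the paper's actual model.

```python
# A minimal sketch of fusing text and audio for CWS sequence labeling.
# Assumptions (not from the paper): audio arrives as pre-extracted frame-level
# feature vectors (not necessarily character-aligned); fusion is one
# cross-attention layer; the actual interactive model is more elaborate.
import torch
import torch.nn as nn

class MultiModalTagger(nn.Module):
    def __init__(self, vocab=5000, d=128, audio_dim=40, n_tags=4):  # B/M/E/S
        super().__init__()
        self.char_emb = nn.Embedding(vocab, d)
        self.audio_proj = nn.Linear(audio_dim, d)  # map audio frames into d
        self.encoder = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.cross = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.tagger = nn.Linear(2 * d, n_tags)

    def forward(self, chars, audio):
        txt = self.encoder(self.char_emb(chars))  # contextual character states
        aud = self.audio_proj(audio)              # projected audio frames
        # Each character attends over the audio frames
        # (text as queries, audio as keys and values).
        fused, _ = self.cross(txt, aud, aud)
        # Concatenate textual and audio-informed views, then tag per character.
        return self.tagger(torch.cat([txt, fused], dim=-1))

model = MultiModalTagger()
chars = torch.randint(0, 5000, (2, 20))  # toy batch: 2 sentences, 20 chars
audio = torch.randn(2, 50, 40)           # 50 audio frames of 40-dim features
logits = model(chars, audio)             # (2, 20, 4) BMES tag scores
print(logits.shape)
```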