Yixuan Ma


2024

Causal-Guided Active Learning for Debiasing Large Language Models
Zhouhao Sun | Li Du | Xiao Ding | Yixuan Ma | Yang Zhao | Kaitao Qiu | Ting Liu | Bing Qin
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Although current generative large language models (LLMs) achieve promising performance, recent analyses show that they may still capture dataset biases and exploit them during generation, making LLMs less generalizable and potentially harmful. However, owing to the diversity of dataset biases and the over-optimization problem, previous prior-knowledge-based and fine-tuning-based debiasing methods may not be suitable for current LLMs. To address this issue, we explore combining active learning with causal mechanisms and propose a causal-guided active learning (CAL) framework, which uses the LLMs themselves to automatically and autonomously identify informative biased samples and induce bias patterns. A cost-effective and efficient in-context-learning-based method is then employed to prevent LLMs from utilizing dataset biases during generation. Experimental results show that CAL can effectively recognize typical biased instances and induce various bias patterns for debiasing LLMs.

2022

Research on the Construction of a Chinese Patent Key Information Corpus (中文专利关键信息语料库的构建研究)
Wenting Zhang (张文婷) | Meihan Zhao (赵美含) | Yixuan Ma (马翊轩) | Wenrui Wang (王文瑞) | Yuzhe Liu (刘宇哲) | Muyun Yang (杨沐昀)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

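Patent documents are an important type of technical literature and a key focus of efforts to build a strong intellectual property nation. Existing patent corpora concentrate mostly on information retrieval, machine translation, and text classification, and they lack finer-grained annotation, which is insufficient to support the research and development of newer AI technologies such as question answering and reading comprehension. Motivated by the needs of intelligent patent analysis, this paper proposes annotating invention patents from three perspectives (the problem solved, the technical means, and the effect) and constructs a Chinese patent key information corpus of 313 patents. Using named entity recognition to identify and verify the corpus's key information shows that recognizing patent key information is a coarser-grained information extraction problem, distinct from domain-specific named entity recognition.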