Ma Longxuan


2023

Exploring Accurate and Generic Simile Knowledge from Pre-trained Language Models
Zhou Shuhan | Ma Longxuan | Shao Yanqiu
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

“A simile is an important linguistic phenomenon in daily communication and an important task in natural language processing (NLP). In recent years, pre-trained language models (PLMs) have achieved great success in NLP since they learn generic knowledge from a large corpus. However, PLMs still have hallucination problems that they could generate unrealistic or context-unrelated information. In this paper, we aim to explore more accurate simile knowledge from PLMs. To this end, we first fine-tune a single model to perform three main simile tasks (recognition, interpretation, and generation). In this way, the model gains a better understanding of the simile knowledge. However, this understanding may be limited by the distribution of the training data. To explore more generic simile knowledge from PLMs, we further add semantic dependency features in three tasks. The semantic dependency feature serves as a global signal and helps the model learn simile knowledge that can be applied to unseen domains. We test with seen and unseen domains after training. Automatic evaluations demonstrate that our method helps the PLMs to explore more accurate and generic simile knowledge for downstream tasks. Our method of exploring more accurate knowledge is not only useful for simile study but also useful for other NLP tasks leveraging knowledge from PLMs. Our code and data will be released on GitHub.”

Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
Zhuang Ziyu | Chen Qiguang | Ma Longxuan | Li Mingda | Han Yi | Qian Yushan | Bai Haopeng | Zhang Weinan | Ting Liu
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 2: Frontier Forum)

“From pre-trained language model (PLM) to large language model (LLM), the field of natural language processing (NLP) has witnessed steep performance gains and wide practical uses. The evaluation of a research field guides its direction of improvement. However, LLMs are extremely hard to thoroughly evaluate for two reasons. First of all, traditional NLP tasks become inadequate due to the excellent performance of LLM. Secondly, existing evaluation tasks are difficult to keep up with the wide range of applications in real-world scenarios. To tackle these problems, existing works proposed various benchmarks to better evaluate LLMs. To clarify the numerous evaluation tasks in both academia and industry, we investigate multiple papers concerning LLM evaluations. We summarize 4 core competencies of LLM, including reasoning, knowledge, reliability, and safety. For every competency, we introduce its definition, corresponding benchmarks, and metrics. Under this competency architecture, similar tasks are combined to reflect corresponding ability, while new tasks can also be easily added into the system. Finally, we give our suggestions on the future direction of LLM’s evaluation.”