Ziyi Huang
2022
Unraveling the Mystery of Artifacts in Machine Generated Text
Jiashu Pu | Ziyi Huang | Yadong Xi | Guandan Chen | Weijie Chen | Rongsheng Zhang
Proceedings of the Thirteenth Language Resources and Evaluation Conference
As neural Text Generation Models (TGMs) have become increasingly capable of generating text indistinguishable from human-written text, the misuse of text generation technologies can have serious ramifications. Although a neural classifier often achieves high detection accuracy, the reasons for this are not well studied. Most previous work studies the impact of model structure and decoding strategy on ease of detection, but little work has analyzed the forms of artifacts left by the TGM. We propose to systematically study the forms and scopes of artifacts by corrupting texts, replacing them with linguistic or statistical features, and applying the interpretable method of Integrated Gradients. Comprehensive experiments show that artifacts a) primarily relate to token co-occurrence, b) feature more heavily at the head of the vocabulary, c) appear more in content words than in stopwords, d) are sometimes detrimental in the form of number of token occurrences, e) are less likely to exist in high-level semantics or syntax, and f) manifest in low concreteness values for higher-order n-grams.
2021
基于序列到序列的中文AMR解析(Chinese AMR Parsing based on Sequence-to-Sequence Modeling)
Ziyi Huang (黄子怡) | Junhui Li (李军辉) | Zhengxian Gong (贡正仙)
Proceedings of the 20th Chinese National Conference on Computational Linguistics
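Abstract Meaning Representation (AMR) abstracts the semantic features of a given text into a single-rooted directed acyclic graph. AMR parsing is the task of producing the corresponding AMR graph from input text. Compared with English AMR, research on Chinese AMR started later, so there has been relatively little work on AMR parsing for Chinese. Using the publicly available Chinese AMR corpus CAMR1.0, this paper studies Chinese AMR parsing with a sequence-to-sequence approach. Specifically, we first build a sequence-to-sequence AMR parsing system suited to Chinese on top of the Transformer model; we then explore and compare the application of different pre-trained models to Chinese AMR parsing. On this corpus, our best Chinese AMR parsing method achieves a Smatch F1 score of 70.29. This paper is the first to report experimental results on this dataset.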
Co-authors
- Jiashu Pu 1
- Yadong Xi 1
- Guandan Chen 1
- Weijie Chen 1
- Rongsheng Zhang 1