Zheng Ye


Towards Imperceptible Document Manipulations against Neural Ranking Models
Xuanang Chen | Ben He | Zheng Ye | Le Sun | Yingfei Sun
Findings of the Association for Computational Linguistics: ACL 2023

Adversarial attacks have gained traction as a way to identify vulnerabilities in neural ranking models (NRMs), but current attack methods often introduce noticeable errors. Moreover, current methods rely heavily on a well-imitated surrogate NRM to guarantee the attack effect, making them difficult to use in practice. This paper proposes a framework called Imperceptible DocumEnt Manipulation (IDEM) to produce adversarial documents that are less noticeable to both algorithms and humans. IDEM instructs a well-established generative language model like BART to generate error-free connection sentences, and employs a separate position-wise merging strategy to balance the relevance and coherence of the perturbed text. Evaluation results on the MS MARCO benchmark demonstrate that IDEM outperforms strong baselines while preserving the fluency and correctness of the target documents. Furthermore, the separation of adversarial text generation from the surrogate NRM makes IDEM more robust and less affected by the quality of the surrogate NRM.


Machine Reading Comprehension Data Augmentation for Sentence Selection Based on Similarity (基于相似度进行句子选择的机器阅读理解数据增强)
Shuang Nie (聂双) | Zheng Ye (叶正) | Jun Qin (覃俊) | Jing Liu (刘晶)
Proceedings of the 21st Chinese National Conference on Computational Linguistics



Towards Quantifiable Dialogue Coherence Evaluation
Zheng Ye | Liucun Lu | Lishan Huang | Liang Lin | Xiaodan Liang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Automatic dialogue coherence evaluation has attracted increasing attention and is crucial for developing promising dialogue systems. However, existing metrics have two major limitations: (a) they are mostly trained in a simplified two-level setting (coherent vs. incoherent), while humans give Likert-type multi-level coherence scores, dubbed “quantifiable”; (b) their predicted coherence scores cannot align with the actual human rating standards due to the absence of human guidance during training. To address these limitations, we propose Quantifiable Dialogue Coherence Evaluation (QuantiDCE), a novel framework aiming to train a quantifiable dialogue coherence metric that can reflect the actual human rating standards. Specifically, QuantiDCE includes two training stages: Multi-Level Ranking (MLR) pre-training and Knowledge Distillation (KD) fine-tuning. During MLR pre-training, a new MLR loss is proposed to enable the model to learn a coarse judgement of coherence degrees. Then, during KD fine-tuning, the pretrained model is further fine-tuned to learn the actual human rating standards with only a small amount of human-annotated data. To promote generalizability even with limited fine-tuning data, a novel KD regularization is introduced to retain the knowledge learned at the pre-training stage. Experimental results show that the model trained by QuantiDCE correlates more strongly with human judgements than other state-of-the-art metrics.


GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems
Lishan Huang | Zheng Ye | Jinghui Qin | Liang Lin | Xiaodan Liang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Automatically evaluating dialogue coherence is a challenging but highly demanded capability for developing high-quality open-domain dialogue systems. However, current evaluation metrics consider only surface features or utterance-level semantics, without explicitly considering the fine-grained topic transition dynamics of dialogue flows. Here, we first consider that the graph structure formed by the topics in a dialogue can accurately depict the underlying communication logic, which is a more natural way to produce persuasive metrics. Capitalizing on the topic-level dialogue graph, we propose a new evaluation metric GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation. Specifically, GRADE incorporates both coarse-grained utterance-level contextualized representations and fine-grained topic-level graph representations to evaluate dialogue coherence. The graph representations are obtained by reasoning over topic-level dialogue graphs enhanced with the evidence from a commonsense graph, including k-hop neighboring representations and hop-attention weights. Experimental results show that GRADE significantly outperforms other state-of-the-art metrics on measuring diverse dialogue models in terms of the Pearson and Spearman correlations with human judgments. In addition, we release a new large-scale human evaluation benchmark to facilitate future research on automatic metrics.


Natural Language Comprehension with the EpiReader
Adam Trischler | Zheng Ye | Xingdi Yuan | Philip Bachman | Alessandro Sordoni | Kaheer Suleman
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

A Parallel-Hierarchical Model for Machine Comprehension on Sparse Data
Adam Trischler | Zheng Ye | Xingdi Yuan | Jing He | Philip Bachman
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)