We propose a family of metrics to assess language generation derived from population estimation methods widely used in ecology. More specifically, we use mark-recapture and maximum-likelihood methods that have been applied over the past several decades to estimate the size of closed populations in the wild. We propose three novel metrics: MEPetersen and MECAPTURE, which retrieve a single-valued assessment, and MESchnabel which returns a double-valued metric to assess the evaluation set in terms of quality and diversity, separately. In synthetic experiments, our family of methods is sensitive to drops in quality and diversity. Moreover, our methods show a higher correlation to human evaluation than existing metrics on several challenging tasks, namely unconditional language generation, machine translation, and text summarization.
Language models trained with Maximum Likelihood Estimation (MLE) have been considered as a mainstream solution in Natural Language Generation (NLG) for years. Recently, various approaches with Generative Adversarial Nets (GANs) have also been proposed. While offering exciting new prospects, GANs in NLG by far are nevertheless reportedly suffering from training instability and mode collapse, and therefore outperformed by conventional MLE models. In this work, we propose techniques for improving GANs in NLG, namely Best Student Forcing (BSF), a novel yet simple adversarial training mechanism in which generated sequences of high quality are selected as temporary ground-truth to further train the generator. We also use an ensemble of discriminators to increase training stability and sample diversity. Evaluation shows that the combination of BSF and multiple discriminators consistently performs better than previous GAN approaches over various metrics, and outperforms a baseline MLE in terms of Fr ́ech ́et Distance, a recently proposed metric capturing both sample quality and diversity.
In this paper, we propose an alternative evaluating metric for word analogy questions (A to B is as C to D) in word vector evaluation. Different from the traditional method which predicts the fourth word by the given three, we measure the similarity directly on the “relations” of two pairs of given words, just as shifting the relation vectors into a new analogy space. Cosine and Euclidean distances are then calculated as measurements. Observation and experiments shows the proposed analogy space evaluation could offer a more comprehensive evaluating result on word vectors with word analogy questions. Meanwhile, computational complexity are remarkably reduced by avoiding traversing the vocabulary.
In this paper we propose an approach to predict punctuation marks for unsegmented speech transcript. The approach is purely lexical, with pre-trained Word Vectors as the only input. A training model of Deep Neural Network (DNN) or Convolutional Neural Network (CNN) is applied to classify whether a punctuation mark should be inserted after the third word of a 5-words sequence and which kind of punctuation mark the inserted one should be. TED talks within IWSLT dataset are used in both training and evaluation phases. The proposed approach shows its effectiveness by achieving better result than the state-of-the-art lexical solution which works with same type of data, especially when predicting puncuation position only.