Yueen Ma
2024
VOLTA: Improving Generative Diversity by Variational Mutual Information Maximizing Autoencoder
Yueen Ma | DaFeng Chi | Jingjing Li | Kai Song | Yuzheng Zhuang | Irwin King
Findings of the Association for Computational Linguistics: NAACL 2024
The natural language generation domain has witnessed great success thanks to Transformer models. Although they have achieved state-of-the-art generative quality, they often neglect generative diversity. Prior attempts to tackle this issue suffer from either low model capacity or over-complicated architectures. Some recent methods employ the VAE framework to enhance diversity, but their latent variables fully depend on the input context, restricting exploration of the latent space. In this paper, we introduce VOLTA, a framework that elevates generative diversity by bridging Transformers with VAEs via a more effective cross-attention-based connection, departing from conventional embedding concatenation or summation. Additionally, we propose integrating InfoGAN-style latent codes to enable input-independent variability, further diversifying the generation. Moreover, our framework accommodates discrete inputs alongside its existing support for continuous inputs. We perform comprehensive experiments with two types of Transformers on six datasets spanning three different NLG tasks, showing that our approach significantly improves generative diversity while maintaining generative quality.
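The abstract sketches the architecture only at a high level. As a rough illustration (not the authors' released code), the PyTorch snippet below shows the two ideas it names: a VAE latent vector, concatenated with InfoGAN-style codes sampled independently of the input, is projected into a few "latent memory" tokens that a Transformer decoder reads through cross-attention instead of embedding concatenation or summation. All module names, dimensions, and the toy usage are illustrative assumptions.

```python
# Minimal sketch, assuming details not in the abstract; not the VOLTA code.
import torch
import torch.nn as nn

class LatentCrossAttentionDecoder(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, latent_dim=64,
                 code_dim=6, n_latent_tokens=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Project the concatenated [z; c] vector into a short sequence of
        # "latent memory" tokens for the decoder to cross-attend to.
        self.to_memory = nn.Linear(latent_dim + code_dim,
                                   n_latent_tokens * d_model)
        self.n_latent_tokens, self.d_model = n_latent_tokens, d_model
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, mu, logvar, codes):
        # Standard VAE reparameterization trick.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        memory = self.to_memory(torch.cat([z, codes], dim=-1))
        memory = memory.view(-1, self.n_latent_tokens, self.d_model)
        x = self.embed(tokens)                        # (B, T, d_model)
        T = tokens.size(1)
        # Causal mask keeps generation autoregressive.
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.decoder(x, memory, tgt_mask=causal)  # cross-attends to latents
        return self.lm_head(h)

# Toy usage: batch of 2 sequences, a flat posterior, and random latent codes.
model = LatentCrossAttentionDecoder()
tokens = torch.randint(0, 1000, (2, 8))
mu, logvar = torch.zeros(2, 64), torch.zeros(2, 64)
codes = torch.randn(2, 6)   # InfoGAN-style codes, independent of the input
logits = model(tokens, mu, logvar, codes)
print(logits.shape)         # torch.Size([2, 8, 1000])
```

Routing the latents through cross-attention, as in this sketch, lets every decoder layer consult them at every generation step, rather than diluting them into the token embeddings once at the input.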
2022
NLP in Human Rights Research: Extracting Knowledge Graphs about Police and Army Units and Their Commanders
Daniel Bauer | Tom Longley | Yueen Ma | Tony Wilson
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022
In this paper, we explore the use of an NLP system to assist the work of Security Force Monitor (SFM). SFM creates data about the organizational structure, command personnel, and operations of police, army, and other security forces, which assists human rights researchers, journalists, and litigators in their work to identify and bring to account specific units and personnel alleged to have committed abuses of human rights and international criminal law. This paper presents an NLP system that extracts from English-language news reports the names of security force units and the biographical details of their personnel, and infers the formal relationships between them. The system's code and training dataset are published alongside this paper. We find that the experimental NLP system performs the task at a fair-to-good level. Its performance is sufficient to justify further development into a live workflow, which will show whether that performance translates into savings in time and resources that would make it an effective technical intervention.
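The abstract describes the system's task rather than its implementation. As a naive, hypothetical stand-in (not the published SFM system), the sketch below uses off-the-shelf spaCy NER to tag persons and organizations in a news sentence and pairs co-occurring mentions as candidate command relations; the label choices and the co-occurrence heuristic are assumptions for illustration only.

```python
# Illustrative sketch only; not the SFM system described above.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def candidate_relations(text):
    """Yield (person, org) pairs that co-occur in the same sentence."""
    doc = nlp(text)
    for sent in doc.sents:
        persons = [e.text for e in sent.ents if e.label_ == "PERSON"]
        orgs = [e.text for e in sent.ents if e.label_ == "ORG"]
        for person in persons:
            for org in orgs:
                yield (person, org)  # candidate "commands / serves in" edge

example = ("General Jane Doe took command of the 3rd Infantry Brigade "
           "in 2019.")
for person, unit in candidate_relations(example):
    print(person, "->", unit)
```

A real system would replace the co-occurrence heuristic with a trained relation classifier and domain-specific entity types for security force units, which is the harder problem the paper addresses.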
Bridging the Gap between Recognition-level Pre-training and Commonsensical Vision-language Tasks
Yue Wan | Yueen Ma | Haoxuan You | Zhecan Wang | Shih-Fu Chang
Proceedings of the First Workshop on Commonsense Representation and Reasoning (CSRR 2022)
Large-scale visual-linguistic pre-training aims to capture the generic representations from multimodal features, which are essential for downstream vision-language tasks. Existing methods mostly focus on learning the semantic connections between visual objects and linguistic content, which tend to be recognition-level information and may not be sufficient for commonsensical reasoning tasks like VCR. In this paper, we propose a novel commonsensical vision-language pre-training framework to bridge the gap. We first augment the conventional image-caption pre-training datasets with commonsense inferences from a visual-linguistic GPT-2. To pre-train models on image, caption and commonsense inferences together, we propose two new tasks: masked commonsense modeling (MCM) and commonsense type prediction (CTP). To reduce the shortcut effect between captions and commonsense inferences, we further introduce the domain-wise adaptive masking that dynamically adjusts the masking ratio. Experimental results on downstream tasks, VCR and VQA, show the improvement of our pre-training strategy over previous methods. Human evaluation also validates the relevance, informativeness, and diversity of the generated commonsense inferences. Overall, we demonstrate the potential of incorporating commonsense knowledge into the conventional recognition-level visual-linguistic pre-training.
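The abstract does not spell out how domain-wise adaptive masking adjusts its ratio, so the sketch below is one plausible instantiation under stated assumptions: caption tokens and commonsense-inference tokens receive separate masking ratios, and the ratio of whichever domain currently has the lower loss is raised to blunt the shortcut effect. All function names, bounds, and the update rule are illustrative.

```python
# Minimal sketch of per-domain adaptive masking; not the paper's exact rule.
import torch

def adaptive_mask(token_ids, domain_ids, ratios, mask_id=103):
    """Mask tokens with a per-domain probability.

    token_ids:  (B, T) input ids
    domain_ids: (B, T) 0 = caption token, 1 = commonsense-inference token
    ratios:     shape-(2,) tensor with the current masking ratio per domain
    """
    probs = ratios[domain_ids]                     # (B, T) per-token ratio
    mask = torch.bernoulli(probs).bool()
    # -100 marks unmasked positions to be ignored by the MLM loss.
    labels = torch.where(mask, token_ids, torch.full_like(token_ids, -100))
    masked = torch.where(mask, torch.full_like(token_ids, mask_id), token_ids)
    return masked, labels

def update_ratios(ratios, domain_losses, step=0.01, low=0.10, high=0.50):
    # Raise the ratio of the domain the model finds easier (lower loss),
    # forcing it to rely less on that domain as a shortcut.
    easier = domain_losses.argmin()
    ratios = ratios.clone()
    ratios[easier] += step
    return ratios.clamp(low, high)

# Toy usage with random ids and a fixed initial 15% ratio for both domains.
tokens = torch.randint(1000, 2000, (2, 10))
domains = torch.randint(0, 2, (2, 10))
ratios = torch.tensor([0.15, 0.15])
masked, labels = adaptive_mask(tokens, domains, ratios)
ratios = update_ratios(ratios, torch.tensor([1.2, 2.3]))
print(masked.shape, ratios)
```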