Yimeng Gu


2022

MMVAE at SemEval-2022 Task 5: A Multi-modal Multi-task VAE on Misogynous Meme Detection
Yimeng Gu | Ignacio Castro | Gareth Tyson
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Memes have become common in day-to-day communication on social media platforms. They appear amusing, evocative, and attractive to audiences. However, some memes containing malicious content can be harmful to the targeted group and arouse public anger in the long run. In this paper, we study misogynous meme detection, a shared task in SemEval 2022: Multimedia Automatic Misogyny Identification (MAMI). The challenge of misogynous meme detection is to co-represent multi-modal features. To tackle this challenge, we propose a Multi-modal Multi-task Variational AutoEncoder (MMVAE) that learns an effective co-representation of visual and textual features in the latent space, determines whether a meme contains misogynous information, and identifies its fine-grained categories. Our model achieves F1 scores of 0.723 on sub-task A and 0.634 on sub-task B. We carry out comprehensive experiments on our model’s architecture and show that our approach significantly outperforms several strong uni-modal and multi-modal approaches. Our code is released on GitHub.

2021

Automating Claim Construction in Patent Applications: The CMUmine Dataset
Ozan Tonguz | Yiwei Qin | Yimeng Gu | Hyun Hannah Moon
Proceedings of the Natural Legal Language Processing Workshop 2021

Intellectual Property (IP) in the form of issued patents is a critical and very desirable element of innovation in high-tech. In this position paper, we explore the possibility of automating the legal task of Claim Construction in patent applications via Natural Language Processing (NLP) and Machine Learning (ML). To this end, we first create a large dataset known as CMUmine™ and then demonstrate that, using NLP and ML techniques, Claim Construction in patent applications, a crucial legal task currently performed by IP attorneys, can be automated. To the best of our knowledge, this is the first public patent application dataset. Our results look very promising for automating the patent application process.