Yew-Soon Ong


2024

pdf bib
Video-Text Prompting for Weakly Supervised Spatio-Temporal Video Grounding
Heng Zhao | Zhao Yinjie | Bihan Wen | Yew-Soon Ong | Joey Tianyi Zhou
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Weakly-supervised Spatio-Temporal Video Grounding(STVG) aims to localize target object tube given a text query, without densely annotated training data. Existing methods extract each candidate tube feature independently by cropping objects from video frame feature, discarding all contextual information such as position change and inter-entity relationship. In this paper, we propose Video-Text Prompting(VTP) to construct candidate feature. Instead of cropping tube region from feature map, we draw visual markers(e.g. red circle) over objects tubes as video prompts; corresponding text prompt(e.g. in red circle) is also inserted after the subject word of query text to highlight its presence. Nevertheless, each candidate feature may look similar without cropping. To address this, we further propose Contrastive VTP(CVTP) by introducing negative contrastive samples whose candidate object is erased instead of being highlighted; by comparing the difference between VTP candidate and the contrastive sample, the gap of matching score between correct candidate and the rest is enlarged. Extensive experiments and ablations are conducted on several STVG datasets and our results surpass existing weakly-supervised methods by a great margin, demonstrating the effectiveness of our proposed methods.

2020

pdf bib
Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder
Alvin Chan | Yi Tay | Yew-Soon Ong | Aston Zhang
Findings of the Association for Computational Linguistics: EMNLP 2020

This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems. More concretely, we present a ‘backdoor poisoning’ attack on NLP models. Our poisoning attack utilizes conditional adversarially regularized autoencoder (CARA) to generate poisoned training samples by poison injection in latent space. Just by adding 1% poisoned data, our experiments show that a victim BERT finetuned classifier’s predictions can be steered to the poison target class with success rates of >80% when the input hypothesis is injected with the poison signature, demonstrating that NLI and text classification systems face a huge security risk.