### CODE for ARR Nov Submission 845: *CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment*


* **OVERVIEW:** This README and the accompanying code are provided only for reproducibility assessment during anonymous review. Full details will be provided in the public version.



* **Environment Setup**
	```
	bash setup.sh
	```


* **Visual Question Answering**
	```
	# Template Generation: 
		# T5: 
		    python vqa_template_T5.py
		# Parsing: 
		    python vqa_template_parsing.py

	# Answer Selection:
		python vqa_answer_selector_T5.py --T5

	# Few-shot CLIP Training:
		python vqa_clip_training.py --clip RN50x16 \
		--prompt_type T5_large_PROMPT \
		--number --yesno --other \
		--way_type question \
		--n_examples 4 --lr 2e-5 \
		--batch_size 8


	# Zero-shot Inference: 
		python vqa_clip_inference.py --clip ViT-B/16 \
		--yesno --number --other \
		--prompt_type T5_large_PROMPT \
		--portion 1 --partition 1 --mid 1 \
		--checkpoint DATA/MODELS/model_epoch_

	# Few-shot Inference:
		python vqa_clip_inference.py --clip RN50x16 \
		--yesno --number --other \
		--few-shot --prompt_type GT_T5_large_PROMPT \
		--portion 1 --partition 1 --mid 1 \
		--checkpoint DATA/MODELS/model_epoch_

	```
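To compare several few-shot settings, the training command above can be wrapped in a small loop. The following is a minimal sketch, not part of the submitted code: the shot counts in `SHOTS` and the `DRY_RUN` switch are illustrative assumptions, and every other flag is copied unchanged from the training command. Whether the learning rate and batch size suit other shot counts is not stated here, so treat them as starting points.

```shell
#!/usr/bin/env bash
# Sketch: sweep the number of few-shot examples for vqa_clip_training.py.
# SHOTS and DRY_RUN are hypothetical helpers, not options of the scripts.
set -euo pipefail

SHOTS=(1 2 4)             # illustrative sweep values (assumption)
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command, 0 = also run it

build_cmd() {
  # $1: value passed to --n_examples; all other flags mirror the README
  echo "python vqa_clip_training.py --clip RN50x16" \
       "--prompt_type T5_large_PROMPT" \
       "--number --yesno --other" \
       "--way_type question" \
       "--n_examples $1 --lr 2e-5" \
       "--batch_size 8"
}

for n in "${SHOTS[@]}"; do
  cmd="$(build_cmd "$n")"
  echo "$cmd"
  if [ "$DRY_RUN" = "0" ]; then
    eval "$cmd"
  fi
done
```

By default the script only prints the commands, so the sweep can be inspected first. Setting `DRY_RUN=0` executes each training run in sequence.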
	
* **Visual Entailment**

	```
	# Language to Vision Transferring:
		python snlive_clip_transfering.py --identifier 1024-128-3-lang2vision \
		--lr 3e-6 --size1 1024 --size2 128 \
		--batch_size 64 --clip RN50x16
	# Evaluation (set --identifier to the saved checkpoint file to evaluate):
		python snlive_clip_evaluation.py --identifier 1024_128_5e-6_bs128_dp0_lang2vion.pt \
		--clip RN50x16 --clip_size 768 --batch_size 128
	```
