This study investigates how fake news use the thumbnail image for a news article. We aim at capturing the degree of semantic incongruity between news text and image by using the pretrained CLIP representation. Motivated by the stylistic distinctiveness in fake news text, we examine whether fake news tends to use an irrelevant image to the news content. Results show that fake news tends to have a high degree of semantic incongruity than general news. We further attempt to detect such image-text incongruity by training classification models on a newly generated dataset. A manual evaluation suggests our method can find news articles of which the thumbnail image is semantically irrelevant to news text with an accuracy of 0.8. We also release a new dataset of image and news text pairs with the incongruity label, facilitating future studies on the direction.