Abstract

The Yes/No QA task (Clark et al., 2019) consists of “Yes” or “No” questions about a given context. In realistic scenarios, however, the context does not always provide enough information to answer the question. For example, given the context “She married a lawyer from New York.”, we cannot tell whether the answer to the question “Did she marry in New York?” is “Yes” or “No”. In this paper, we extend the Yes/No QA task by adding questions whose answer is “I don’t know” (IDK), and show that the resulting 3-label task is considerably more difficult than the original 2-label task. For this purpose, we (i) enrich the BoolQ dataset (Clark et al., 2019) with unanswerable questions and (ii) create out-of-domain test sets for the Yes/No/IDK QA task. We also study the contribution of training on other Natural Language Understanding tasks, focusing in particular on Extractive QA (Rajpurkar et al., 2018) and Recognizing Textual Entailment (RTE; Dagan et al., 2013), and use the new data to analyze the differences between the 2-label and 3-label settings.