Dmytro Okhonko


UniK-QA: Unified Representations of Structured and Unstructured Knowledge for Open-Domain Question Answering
Barlas Oguz | Xilun Chen | Vladimir Karpukhin | Stan Peshterliev | Dmytro Okhonko | Michael Schlichtkrull | Sonal Gupta | Yashar Mehdad | Scott Yih
Findings of the Association for Computational Linguistics: NAACL 2022

We study open-domain question answering with structured, unstructured and semi-structured knowledge sources, including text, tables, lists and knowledge bases. Departing from prior work, we propose a unifying approach that homogenizes all sources by reducing them to text and applies the retriever-reader model, which has so far been limited to text sources only. Our approach improves results on knowledge-base QA tasks by 11 points over the latest graph-based methods. More importantly, we demonstrate that our unified knowledge (UniK-QA) model is a simple yet effective way to combine heterogeneous sources of knowledge, advancing the state-of-the-art results on two popular question answering benchmarks, NaturalQuestions and WebQuestions, by 3.5 and 2.6 points, respectively. The code of UniK-QA is available at:
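To illustrate what "reducing them to text" can look like, here is a minimal sketch that linearizes knowledge-base triples and table rows into plain-text passages a text retriever could index. The function names and the exact text templates ("subject relation object." and "column is value") are hypothetical assumptions for illustration, not the format the UniK-QA code necessarily uses.

```python
from typing import List, Tuple

# A knowledge-base fact as (subject, relation, object).
Triple = Tuple[str, str, str]


def linearize_triples(triples: List[Triple]) -> str:
    """Join KB triples into one text passage (hypothetical template).

    Each triple becomes "subject relation object." where a dotted relation id
    such as "people.person.place_of_birth" is shortened to "place of birth".
    """
    sentences = []
    for subj, rel, obj in triples:
        # Keep only the last segment of the relation id and turn underscores into spaces.
        rel_text = rel.split(".")[-1].replace("_", " ")
        sentences.append(f"{subj} {rel_text} {obj}.")
    return " ".join(sentences)


def linearize_table(title: str, header: List[str], rows: List[List[str]]) -> List[str]:
    """Turn each table row into its own passage of "column is value" clauses."""
    passages = []
    for row in rows:
        clauses = [f"{col} is {val}" for col, val in zip(header, row)]
        passages.append(f"{title}: " + ", ".join(clauses) + ".")
    return passages


if __name__ == "__main__":
    print(linearize_triples([("Barack Obama", "people.person.place_of_birth", "Honolulu")]))
    print(linearize_table("Planets", ["name", "moons"], [["Mars", "2"]]))
```

Once every source is a passage of text, a single dense retriever and reader can be applied to all of them uniformly, which is the design choice the abstract describes.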

CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training
Patrick Huber | Armen Aghajanyan | Barlas Oguz | Dmytro Okhonko | Scott Yih | Sonal Gupta | Xilun Chen
Findings of the Association for Computational Linguistics: NAACL 2022

We propose a novel open-domain question answering dataset based on the Common Crawl project. With an unprecedented number of around 130 million multilingual question-answer pairs (including about 60 million English data points), we use our large-scale, natural, diverse and high-quality corpus to pre-train popular language models in-domain for the task of question answering. In our experiments, we find that our Common Crawl Question Answering dataset (CCQA) achieves promising results in zero-shot, low-resource and fine-tuned settings across multiple tasks, models and benchmarks.


VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu | Gargi Ghosh | Po-Yao Huang | Dmytro Okhonko | Armen Aghajanyan | Florian Metze | Luke Zettlemoyer | Christoph Feichtenhofer
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks. VideoCLIP trains a transformer for video and text by contrasting temporally overlapping positive video-text pairs with hard negatives from nearest neighbor retrieval. Our experiments on a diverse series of downstream tasks, including sequence-level text-video retrieval, VideoQA, token-level action localization, and action segmentation reveal state-of-the-art performance, surpassing prior work, and in some cases even outperforming supervised approaches. Code is made available at
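The contrastive objective can be sketched as a generic symmetric InfoNCE loss over a batch of paired video and text embeddings: each video should score highest against its own text and vice versa. This is an assumed, simplified form written in pure Python; the function names and the `temperature` parameter are illustrative, and it does not implement VideoCLIP's temporally overlapping pair sampling or the nearest-neighbor retrieval it uses to assemble hard-negative batches.

```python
import math
from typing import List

Vector = List[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _log_softmax_at(scores: List[float], idx: int) -> float:
    """log(softmax(scores)[idx]), computed stably via the log-sum-exp trick."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[idx] - lse


def symmetric_info_nce(video: List[Vector], text: List[Vector], temperature: float = 1.0) -> float:
    """Sum of video-to-text and text-to-video InfoNCE losses over a batch.

    Row i of `video` is assumed to pair with row i of `text`; every other row
    in the batch acts as a negative. (Hard negatives would come from how the
    batch is built, which is outside this sketch.)
    """
    n = len(video)
    # Similarity matrix: sim[i][j] = <video_i, text_j> / temperature.
    sim = [[_dot(v, t) / temperature for t in text] for v in video]
    # Video -> text: for each video, classify its paired text among all texts.
    v2t = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text -> video: for each text, classify its paired video among all videos.
    t2v = -sum(_log_softmax_at([sim[j][i] for j in range(n)], i) for i in range(n)) / n
    return v2t + t2v


if __name__ == "__main__":
    aligned = symmetric_info_nce([[1, 0], [0, 1]], [[1, 0], [0, 1]], temperature=0.1)
    swapped = symmetric_info_nce([[1, 0], [0, 1]], [[0, 1], [1, 0]], temperature=0.1)
    print(aligned, swapped)  # aligned pairs give a much lower loss than mismatched ones
```

When all embeddings are identical the loss equals 2 log n (chance level in both directions), which is a handy sanity check.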


Fairseq S2T: Fast Speech-to-Text Modeling with Fairseq
Changhan Wang | Yun Tang | Xutai Ma | Anne Wu | Dmytro Okhonko | Juan Pino
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq’s careful design for scalability and extensibility. We provide end-to-end workflows covering data pre-processing, model training, and offline (online) inference. We implement state-of-the-art RNN-based as well as Transformer-based models and open-source detailed training recipes. Fairseq’s machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. Fairseq S2T is available at