Todor Mihaylov


2020

EXAMS: A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering
Momchil Hardalov | Todor Mihaylov | Dimitrina Zlatkova | Yoan Dinkov | Ivan Koychev | Preslav Nakov
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose EXAMS, a new benchmark dataset for cross-lingual and multilingual question answering for high school examinations. We collected more than 24,000 high-quality high school exam questions in 16 languages, covering 8 language families and 24 school subjects from Natural Sciences and Social Sciences, among others. EXAMS offers a unique fine-grained evaluation framework across multiple languages and subjects, which allows precise analysis and comparison of the proposed models. We perform various experiments with existing top-performing multilingual pre-trained models and show that EXAMS offers multiple challenges that require multilingual knowledge and reasoning in multiple domains. We hope that EXAMS will enable researchers to explore challenging reasoning and knowledge transfer methods and pre-trained models for school question answering in various languages, which has not been possible until now. The data, code, pre-trained models, and evaluation are available at http://github.com/mhardalov/exams-qa.

2019

Discourse-Aware Semantic Self-Attention for Narrative Reading Comprehension
Todor Mihaylov | Anette Frank
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In this work, we propose to use linguistic annotations as a basis for a Discourse-Aware Semantic Self-Attention encoder that we employ for reading comprehension on narrative texts. We extract relations between discourse units, events, and their arguments as well as coreferring mentions, using available annotation tools. Our empirical evaluation shows that the investigated structures improve the overall performance (up to +3.4 Rouge-L), especially intra-sentential and cross-sentential discourse relations, sentence-internal semantic role relations, and long-distance coreference relations. We show that dedicating self-attention heads to intra-sentential relations and relations connecting neighboring sentences is beneficial for finding answers to questions in longer contexts. Our findings encourage the use of discourse-semantic annotations to enhance the generalization capacity of self-attention models for reading comprehension.

2018

Knowledgeable Reader: Enhancing Cloze-Style Reading Comprehension with External Commonsense Knowledge
Todor Mihaylov | Anette Frank
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce a neural reading comprehension model that integrates external commonsense knowledge, encoded as a key-value memory, in a cloze-style setting. Instead of relying only on document-to-question interaction or discrete features as in prior work, our model attends to relevant external knowledge and combines this knowledge with the context representation before inferring the answer. This allows the model to attract and imply knowledge from an external knowledge source that is not explicitly stated in the text, but that is relevant for inferring the answer. Our model improves results over a very strong baseline on a hard Common Nouns dataset, making it a strong competitor of much more complex models. By including knowledge explicitly, our model can also provide evidence about the background knowledge used in the RC process.
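The idea of attending to a key-value memory and merging the result with the context representation can be illustrated with a minimal sketch. It is not the paper's model: the dot-product scoring, the additive combination, the toy vectors, and the function names `attend_memory` and `enrich_context` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_memory(context, keys, values):
    """Score each memory key against the context and return
    (attention weights, weighted sum of the memory values)."""
    scores = [sum(c * k for c, k in zip(context, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    knowledge = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(dim)
    ]
    return weights, knowledge


def enrich_context(context, keys, values):
    """Combine context and retrieved knowledge.
    Assumption: plain addition; a trained model would learn this step."""
    weights, knowledge = attend_memory(context, keys, values)
    enriched = [c + k for c, k in zip(context, knowledge)]
    return enriched, weights


# Toy memory with two facts (hypothetical 2-dimensional vectors).
context = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[0.5, 0.5], [-0.5, 0.5]]
enriched, weights = enrich_context(context, keys, values)
```

The attention weights show which memory entries contributed most, which is the kind of evidence about background knowledge that the abstract mentions.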

Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
Todor Mihaylov | Peter Clark | Tushar Khot | Ashish Sabharwal
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present a new kind of question answering dataset, OpenBookQA, modeled after open book exams for assessing human understanding of a subject. The open book that comes with our questions is a set of 1326 elementary level science facts. Roughly 6000 questions probe an understanding of these facts and their application to novel situations. This requires combining an open book fact (e.g., metals conduct electricity) with broad common knowledge (e.g., a suit of armor is made of metal) obtained from other sources. While existing QA datasets over documents or knowledge bases, being generally self-contained, focus on linguistic understanding, OpenBookQA probes a deeper understanding of both the topic—in the context of common knowledge—and the language it is expressed in. Human performance on OpenBookQA is close to 92%, but many state-of-the-art pre-trained QA methods perform surprisingly poorly, worse than several simple neural baselines we develop. Our oracle experiments designed to circumvent the knowledge retrieval bottleneck demonstrate the value of both the open book and additional facts. We leave it as a challenge to solve the retrieval problem in this multi-hop setting and to close the large gap to human performance.

2017

Story Cloze Ending Selection Baselines and Data Examination
Todor Mihaylov | Anette Frank
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

This paper describes two supervised baseline systems for the Story Cloze Test Shared Task (Mostafazadeh et al., 2016a). We first build a classifier using features based on word embeddings and semantic similarity computation. We further implement a neural LSTM system with different encoding strategies that try to model the relation between the story and the provided endings. Our experiments show that a model using representation features based on average word embedding vectors over the given story words and the candidate ending sentence words, combined with similarity features between the story and candidate ending representations, performed better than the neural models. Our best model achieves an accuracy of 72.42, ranking 3rd in the official evaluation.
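The feature construction described above (averaged word embeddings for the story and for a candidate ending, plus a similarity feature between the two) can be sketched as follows. This is a simplified illustration rather than the authors' system: the toy embedding table, the choice of cosine similarity, and the function names are assumptions, and the sketch stops at the feature vector that a classifier would consume.

```python
import math


def average_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens; zero vector if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def ending_features(story_tokens, ending_tokens, embeddings, dim):
    """Concatenate story vector, ending vector, and their similarity."""
    story_vec = average_vector(story_tokens, embeddings, dim)
    ending_vec = average_vector(ending_tokens, embeddings, dim)
    return story_vec + ending_vec + [cosine(story_vec, ending_vec)]


# Hypothetical 2-dimensional embeddings, for illustration only.
toy_embeddings = {
    "rain": [1.0, 0.0],
    "umbrella": [0.8, 0.2],
    "pizza": [0.0, 1.0],
}
related = ending_features(["rain"], ["umbrella"], toy_embeddings, 2)
unrelated = ending_features(["rain"], ["pizza"], toy_embeddings, 2)
```

In this toy setup the last feature is higher for the related ending than for the unrelated one, which is the signal a downstream classifier could use to choose between the two candidate endings.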

2016

Hunting for Troll Comments in News Community Forums
Todor Mihaylov | Preslav Nakov
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Discourse Relation Sense Classification Using Cross-argument Semantic Similarity Based on Word Embeddings
Todor Mihaylov | Anette Frank
Proceedings of the CoNLL-16 shared task

SUper Team at SemEval-2016 Task 3: Building a Feature-Rich System for Community Question Answering
Tsvetomila Mihaylova | Pepa Gencheva | Martin Boyanov | Ivana Yovcheva | Todor Mihaylov | Momchil Hardalov | Yasen Kiprov | Daniel Balchev | Ivan Koychev | Preslav Nakov | Ivelina Nikolova | Galia Angelova
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

SemanticZ at SemEval-2016 Task 3: Ranking Relevant Answers in Community Question Answering Using Semantic Similarity Based on Fine-tuned Word Embeddings
Todor Mihaylov | Preslav Nakov
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

Exposing Paid Opinion Manipulation Trolls
Todor Mihaylov | Ivan Koychev | Georgi Georgiev | Preslav Nakov
Proceedings of the International Conference Recent Advances in Natural Language Processing

Finding Opinion Manipulation Trolls in News Community Forums
Todor Mihaylov | Georgi Georgiev | Preslav Nakov
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

2014

SU-FMI: System Description for SemEval-2014 Task 9 on Sentiment Analysis in Twitter
Boris Velichkov | Borislav Kapukaranov | Ivan Grozev | Jeni Karanesheva | Todor Mihaylov | Yasen Kiprov | Preslav Nakov | Ivan Koychev | Georgi Georgiev
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)