A mostly unlexicalized model for recognizing textual entailment

Mithun Paul; Rebecca Sharp; Mihai Surdeanu

doi:10.18653/v1/W18-5528

Abstract

Many approaches to automatically recognizing entailment relations have employed classifiers over hand-engineered lexicalized features, or deep learning models that implicitly capture lexicalization through word embeddings. This reliance on lexicalization may complicate the adaptation of these tools between domains. For example, such a system trained in the news domain may learn that a sentence like “Palestinians recognize Texas as part of Mexico” tends to be unsupported, but this fact (and its corresponding lexicalized cues) has no value in, say, a scientific domain. To mitigate this dependence on lexicalized information, in this paper we propose a model that reads two sentences, from any given domain, to determine entailment without using lexicalized features. Instead, our model relies on features that are either unlexicalized or domain-independent, such as the proportion of negated verbs, antonyms, or noun overlap. In its current implementation, this model does not perform well on the FEVER dataset, for two reasons. First, for the information retrieval portion of the task we used the provided baseline system, since this was not the aim of our project. Second, this is work in progress: we are still identifying more features and gradually increasing the accuracy of our model. In the end, we hope to build a generic end-to-end classifier that can be used in a domain outside the one in which it was trained, with little or no retraining.
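
To make the idea of an unlexicalized feature concrete, the following is a minimal sketch of two of the feature types the abstract names, noun overlap and proportion of negated verbs. The abstract does not give the authors' actual feature definitions, so everything here is an assumption for illustration: the input format (tokens already paired with coarse part-of-speech tags from some tagger), the `NEGATIONS` cue list, and the function names are all hypothetical. The antonym feature is omitted because it would need an external lexical resource.

```python
from typing import List, Tuple

# Assumed input format: (lowercased word, coarse POS tag) pairs from any tagger.
Token = Tuple[str, str]

# Hypothetical, illustrative list of negation cues (not from the paper).
NEGATIONS = {"not", "n't", "never", "no"}


def noun_overlap(claim: List[Token], evidence: List[Token]) -> float:
    """Fraction of distinct claim nouns that also appear as nouns in the evidence."""
    claim_nouns = {word for word, pos in claim if pos == "NOUN"}
    evidence_nouns = {word for word, pos in evidence if pos == "NOUN"}
    if not claim_nouns:
        return 0.0
    return len(claim_nouns & evidence_nouns) / len(claim_nouns)


def negated_verb_proportion(sentence: List[Token]) -> float:
    """Fraction of verbs that are immediately preceded by a negation cue."""
    verb_positions = [i for i, (_, pos) in enumerate(sentence) if pos == "VERB"]
    if not verb_positions:
        return 0.0
    negated = sum(
        1 for i in verb_positions if i > 0 and sentence[i - 1][0] in NEGATIONS
    )
    return negated / len(verb_positions)


def feature_vector(claim: List[Token], evidence: List[Token]) -> List[float]:
    """Numeric features only: no word identity is passed on to the classifier."""
    return [
        noun_overlap(claim, evidence),
        negated_verb_proportion(claim),
        negated_verb_proportion(evidence),
    ]
```

The words themselves are compared only to compute ratios; the classifier would see the resulting numbers, not the vocabulary, which is what would let such features carry over between domains.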

Anthology ID: W18-5528
Volume: Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)
Month: November
Year: 2018
Address: Brussels, Belgium
Editors: James Thorne, Andreas Vlachos, Oana Cocarascu, Christos Christodoulopoulos, Arpit Mittal
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 166–171
URL: https://aclanthology.org/W18-5528/
DOI: 10.18653/v1/W18-5528
Cite (ACL): Mithun Paul, Rebecca Sharp, and Mihai Surdeanu. 2018. A mostly unlexicalized model for recognizing textual entailment. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), pages 166–171, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): A mostly unlexicalized model for recognizing textual entailment (Paul et al., EMNLP 2018)
PDF: https://aclanthology.org/W18-5528.pdf
