BioNLI: Generating a Biomedical NLI Dataset Using Lexico-semantic Constraints for Adversarial Examples

Mohaddeseh Bastan, Mihai Surdeanu, Niranjan Balasubramanian


Abstract
Natural language inference (NLI) is critical in many domains requiring complex decision-making, such as the biomedical domain. We introduce a novel semi-supervised procedure that bootstraps biomedical NLI datasets from positive entailment examples present in abstracts of biomedical publications. We focus on challenging texts where the hypothesis includes mechanistic information such as biochemical interactions between two entities. A key contribution of this work is automating the creation of negative examples that are informative without being simplistic. We generate a range of negative examples using nine strategies that manipulate the structure of the underlying mechanisms both with rules, e.g., flip the roles of the entities in the interaction, and, more importantly, by imposing the perturbed conditions as logical constraints in a neuro-logical decoding system (CITATION).We use this procedure to create a novel dataset for NLI in the biomedical domain, called . The accuracy of neural classifiers on this dataset is in the mid 70s F1, which indicates that this NLI task remains to be solved. Critically, we observe that the performance on the different classes of negative examples varies widely, from 97% F1 on the simple negative examples that change the role of the entities in the hypothesis, to barely better than chance on the negative examples generated using neuro-logic decoding.
Anthology ID:
2022.findings-emnlp.374
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
5093–5104
Language:
URL:
https://aclanthology.org/2022.findings-emnlp.374
DOI:
10.18653/v1/2022.findings-emnlp.374
Bibkey:
Cite (ACL):
Mohaddeseh Bastan, Mihai Surdeanu, and Niranjan Balasubramanian. 2022. BioNLI: Generating a Biomedical NLI Dataset Using Lexico-semantic Constraints for Adversarial Examples. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5093–5104, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
BioNLI: Generating a Biomedical NLI Dataset Using Lexico-semantic Constraints for Adversarial Examples (Bastan et al., Findings 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.findings-emnlp.374.pdf
Video:
 https://aclanthology.org/2022.findings-emnlp.374.mp4