Proofbusters at SemEval-2026 Task 11: Neuro-Symbolic Syllogistic Reasoning via LLM-Guided Structure Extraction and Deterministic Validation

Mohamed Ayman; Khaled Marzouk; Abdallah Mashaly; Ahmed Heriez

Proofbusters at SemEval-2026 Task 11: Neuro-Symbolic Syllogistic Reasoning via LLM-Guided Structure Extraction and Deterministic Validation

Mohamed Ayman, Khaled Marzouk, Abdallah Mashaly, Ahmed Heriez

Abstract

This paper presents the **Proofbusters** system for SemEval-2026 Task 11 (English syllogism validity classification). The task evaluates whether language models can perform *formal* syllogistic reasoning independent of semantic content—i.e., without being swayed by *belief bias* (judging arguments by plausibility or world knowledge instead of logical validity).The main idea is **symbolic abstraction**: before predicting validity, each syllogism is converted into a content-invariant logical form so the model reasons over structure rather than over concrete terms. Inspired by Euler’s abstraction in the Königsberg bridges problem (stripping away geography to reveal pure structure), the paper explores three abstraction strategies of increasing formal rigor:1. **Template abstraction** — Replace categorical terms with generic placeholders (e.g., x, y, z); keep syntax and quantifiers. Serves as a baseline (82.20% accuracy).2. **Symbolic OOP abstraction** — Map entities and relations into an object-oriented constraint graph with explicit tracking of supersets, disjoint sets, etc. (88.84% with Qwen-7B).3. **Set-theoretic abstraction** — Translate premises and conclusion into formal set notation (e.g., \(A \subseteq B\), \(A \cap B = \emptyset\)) and enforce *existential import* (\(A, B, C \neq \emptyset\)) to align with Aristotelian logic. The solver never sees the original natural-language terms.The system uses a **two-stage pipeline**: a **Formulation** stage (natural language → symbolic representation) and a **Solver** stage (validity judgment from symbols only). The set-theoretic variant, using Gemini Flash 2.5 for formulation and Gemini Pro 2.5 for solving, achieves **98.95% accuracy** with **2.13** total content effect (TCE) and an **overall score of 46.23**, substantially outperforming both task baselines and the other abstraction variants.The **conclusion** is that belief bias in LLMs is tied to semantic surface form: *explicit abstraction into mathematical set notation* sharply reduces plausibility-driven errors. Robust logical reasoning likely requires **architectural separation** between semantic parsing and formal inference, rather than prompt engineering alone. Remaining challenges include formulation errors (e.g., quantifier misclassification), multi-step constraint composition, and negation–inclusion interactions. Future work may combine the abstraction pipeline with formally verified theorem provers and extend it to multilingual or more complex multi-premise reasoning.

Anthology ID:: 2026.semeval-1.182
Volume:: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month:: July
Year:: 2026
Address:: San Diego, California, USA
Editors:: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues:: SemEval | WS
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 1407–1415
Language:
URL:: https://aclanthology.org/2026.semeval-1.182/
DOI:
Bibkey:
Cite (ACL):: Mohamed Ayman, Khaled Marzouk, Abdallah Mashaly, and Ahmed Heriez. 2026. Proofbusters at SemEval-2026 Task 11: Neuro-Symbolic Syllogistic Reasoning via LLM-Guided Structure Extraction and Deterministic Validation. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 1407–1415, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):: Proofbusters at SemEval-2026 Task 11: Neuro-Symbolic Syllogistic Reasoning via LLM-Guided Structure Extraction and Deterministic Validation (Ayman et al., SemEval 2026)
Copy Citation:
PDF:: https://aclanthology.org/2026.semeval-1.182.pdf
Supplementarymaterial:: 2026.semeval-1.182.SupplementaryMaterial.zip

PDF Cite Search Supplementarymaterial Fix data