TUW-Inf at GermEval2021: Rule-based and Hybrid Methods for Detecting Toxic, Engaging, and Fact-Claiming Comments
Kinga Gémes | Gábor Recski
Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments
This paper describes our methods submitted for the GermEval 2021 shared task on identifying toxic, engaging and fact-claiming comments in social media texts (Risch et al., 2021). We explore simple strategies for semi-automatic generation of rule-based systems with high precision and low recall, and use them to achieve slight overall improvements over a standard BERT-based classifier.
BMEAUT at SemEval-2020 Task 2: Lexical Entailment with Semantic Graphs
Ádám Kovács | Kinga Gémes | Andras Kornai | Gábor Recski
Proceedings of the Fourteenth Workshop on Semantic Evaluation
In this paper we present a novel rule-based, language independent method for determining lexical entailment relations using semantic representations built from Wiktionary definitions. Combined with a simple WordNet-based method our system achieves top scores on the English and Italian datasets of the Semeval-2020 task “Predicting Multilingual and Cross-lingual (graded) Lexical Entailment” (Glavaš et al., 2020). A detailed error analysis of our output uncovers future di- rections for improving both the semantic parsing method and the inference process on semantic graphs.
BME-TUW at SR’20: Lexical grammar induction for surface realization
Gábor Recski | Ádám Kovács | Kinga Gémes | Judit Ács | Andras Kornai
Proceedings of the Third Workshop on Multilingual Surface Realisation
We present a system for mapping Universal Dependency structures to raw text which learns to restore word order by training an Interpreted Regular Tree Grammar (IRTG) that establishes a mapping between string and graph operations. The reinflection step is handled by a standard sequence-to-sequence architecture with a biLSTM encoder and an LSTM decoder with attention. We modify our 2019 system (Kovács et al., 2019) with a new grammar induction mechanism that allows IRTG rules to operate on lemmata in addition to part-of-speech tags and ensures that each word and its dependents are reordered using the most specific set of learned patterns. We also introduce a hierarchical approach to word order restoration that independently determines the word order of each clause in a sentence before arranging them with respect to the main clause, thereby improving overall readability and also making the IRTG parsing task tractable. We participated in the 2020 Surface Realization Shared task, subtrack T1a (shallow, closed). Human evaluation shows we achieve significant improvements on two of the three out-of-domain datasets compared to the 2019 system we modified. Both components of our system are available on GitHub under an MIT license.