András Kornai

Also published as: Andras Kornai

2021

Evaluating Transferability of BERT Models on Uralic Languages
Judit Ács | Dániel Lévai | Andras Kornai
Proceedings of the Seventh International Workshop on Computational Linguistics of Uralic Languages

Subword Pooling Makes a Difference
Judit Ács | Ákos Kádár | Andras Kornai
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Contextual word-representations became a standard in modern natural language processing systems. These models use subword tokenization to handle large vocabularies and unknown words. Word-level usage of such systems requires a way of pooling multiple subwords that correspond to a single word. In this paper we investigate how the choice of subword pooling affects the downstream performance on three tasks: morphological probing, POS tagging and NER, in 9 typologically diverse languages. We compare these in two massively multilingual models, mBERT and XLM-RoBERTa. For morphological tasks, the widely used ‘choose the first subword’ is the worst strategy and the best results are obtained by using attention over the subwords. For POS tagging both of these strategies perform poorly and the best choice is to use a small LSTM over the subwords. The same strategy works best for NER and we show that mBERT is better than XLM-RoBERTa in all 9 languages. We publicly release all code, data and the full result tables at https://github.com/juditacs/subword-choice .
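
To make the pooling problem in the abstract above concrete, here is a minimal sketch of non-parametric subword pooling in Python. It is not the authors' implementation (that is in the repository linked in the abstract). The function name `pool_subwords`, its parameters, and the `last` and `mean` strategies are assumptions made for illustration; only the `first` strategy is named in the abstract. The attention and LSTM poolers the abstract reports as best are trained components and are not shown here.

```python
from typing import List, Sequence

Vector = List[float]


def pool_subwords(
    subword_vectors: Sequence[Sequence[float]],
    word_lengths: Sequence[int],
    strategy: str = "first",
) -> List[Vector]:
    """Collapse subword vectors into one vector per word.

    subword_vectors: one vector per subword, in sentence order.
    word_lengths: number of subwords each word was split into.
    strategy: "first", "last", or "mean" (hypothetical option names).
    """
    if any(n < 1 for n in word_lengths):
        raise ValueError("every word needs at least one subword")
    if sum(word_lengths) != len(subword_vectors):
        raise ValueError("word_lengths must cover all subword vectors")

    pooled: List[Vector] = []
    start = 0
    for n in word_lengths:
        pieces = subword_vectors[start:start + n]
        start += n
        if strategy == "first":
            # Represent the word by its first subword only.
            pooled.append(list(pieces[0]))
        elif strategy == "last":
            # Represent the word by its last subword only.
            pooled.append(list(pieces[-1]))
        elif strategy == "mean":
            # Average the subword vectors dimension by dimension.
            dim = len(pieces[0])
            pooled.append([sum(p[d] for p in pieces) / n for d in range(dim)])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return pooled


# Toy input: a two-subword word followed by a one-subword word.
vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
lengths = [2, 1]
first_pooled = pool_subwords(vectors, lengths, "first")
mean_pooled = pool_subwords(vectors, lengths, "mean")
```

In a real pipeline the vectors would come from a model such as mBERT or XLM-RoBERTa and the word lengths from its tokenizer; both are replaced by toy values here.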

2020

Better Together: Modern Methods Plus Traditional Thinking in NP Alignment
Ádám Kovács | Judit Ács | Andras Kornai | Gábor Recski
Proceedings of the Twelfth Language Resources and Evaluation Conference

We study a typical intermediary task to Machine Translation, the alignment of NPs in the bitext. After arguing that the task remains relevant even in an end-to-end paradigm, we present simple, dictionary- and word vector-based baselines and a BERT-based system. Our results make clear that even state of the art systems relying on the best end-to-end methods can be improved by bringing in old-fashioned methods such as stopword removal, lemmatization, and dictionaries.

BMEAUT at SemEval-2020 Task 2: Lexical Entailment with Semantic Graphs
Ádám Kovács | Kinga Gémes | Andras Kornai | Gábor Recski
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper we present a novel rule-based, language independent method for determining lexical entailment relations using semantic representations built from Wiktionary definitions. Combined with a simple WordNet-based method our system achieves top scores on the English and Italian datasets of the Semeval-2020 task “Predicting Multilingual and Cross-lingual (graded) Lexical Entailment” (Glavaš et al., 2020). A detailed error analysis of our output uncovers future directions for improving both the semantic parsing method and the inference process on semantic graphs.

BME-TUW at SR’20: Lexical grammar induction for surface realization
Gábor Recski | Ádám Kovács | Kinga Gémes | Judit Ács | Andras Kornai
Proceedings of the Third Workshop on Multilingual Surface Realisation

We present a system for mapping Universal Dependency structures to raw text which learns to restore word order by training an Interpreted Regular Tree Grammar (IRTG) that establishes a mapping between string and graph operations. The reinflection step is handled by a standard sequence-to-sequence architecture with a biLSTM encoder and an LSTM decoder with attention. We modify our 2019 system (Kovács et al., 2019) with a new grammar induction mechanism that allows IRTG rules to operate on lemmata in addition to part-of-speech tags and ensures that each word and its dependents are reordered using the most specific set of learned patterns. We also introduce a hierarchical approach to word order restoration that independently determines the word order of each clause in a sentence before arranging them with respect to the main clause, thereby improving overall readability and also making the IRTG parsing task tractable. We participated in the 2020 Surface Realization Shared task, subtrack T1a (shallow, closed). Human evaluation shows we achieve significant improvements on two of the three out-of-domain datasets compared to the 2019 system we modified. Both components of our system are available on GitHub under an MIT license.
