Natalia Skachkova


2024

pdf bib
Multilingual coreference resolution as text generation
Natalia Skachkova
Proceedings of The Seventh Workshop on Computational Models of Reference, Anaphora and Coreference

pdf bib
MMAR: Multilingual and Multimodal Anaphora Resolution in Instructional Videos
Cennet Oguz | Pascal Denis | Simon Ostermann | Emmanuel Vincent | Natalia Skachkova | Josef Van Genabith
Findings of the Association for Computational Linguistics: EMNLP 2024

Multilingual anaphora resolution identifies referring expressions and implicit arguments in texts and links to antecedents that cover several languages. In the most challenging setting, cross-lingual anaphora resolution, training data, and test data are in different languages. As knowledge needs to be transferred across languages, this task is challenging, both in the multilingual and cross-lingual setting. We hypothesize that one way to alleviate some of the difficulty of the task is to include multimodal information in the form of images (i.e. frames extracted from instructional videos). Such visual inputs are by nature language agnostic, therefore cross- and multilingual anaphora resolution should benefit from visual information. In this paper, we provide the first multilingual and multimodal dataset annotated with anaphoric relations and present experimental results for end-to-end multimodal and multilingual anaphora resolution. Given gold mentions, multimodal features improve anaphora resolution results by ~10 % for unseen languages.

2023

pdf bib
Multilingual coreference resolution: Adapt and Generate
Natalia Skachkova | Tatiana Anikina | Anna Mokhova
Proceedings of the CRAC 2023 Shared Task on Multilingual Coreference Resolution

The paper presents two multilingual coreference resolution systems submitted for the CRAC Shared Task 2023. The DFKI-Adapt system achieves 61.86 F1 score on the shared task test data, outperforming the official baseline by 4.9 F1 points. This system uses a combination of different features and training settings, including character embeddings, adapter modules, joint pre-training and loss-based re-training. We provide evaluation for each of the settings on 12 different datasets and compare the results. The other submission DFKI-MPrompt uses a novel approach that involves prompting for mention generation. Although the scores achieved by this model are lower compared to the baseline, the method shows a new way of approaching the coreference task and provides good results with just five epochs of training.

2022

pdf bib
Anaphora Resolution in Dialogue: System Description (CODI-CRAC 2022 Shared Task)
Tatiana Anikina | Natalia Skachkova | Joseph Renner | Priyansh Trivedi
Proceedings of the CODI-CRAC 2022 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

We describe three models submitted for the CODI-CRAC 2022 shared task. To perform identity anaphora resolution, we test several combinations of the incremental clustering approach based on the Workspace Coreference System (WCS) with other coreference models. The best result is achieved by adding the “cluster merging” version of the coref-hoi model, which brings up to 10.33% improvement1 over vanilla WCS clustering. Discourse deixis resolution is implemented as multi-task learning: we combine the learning objective of coref-hoi with anaphor type classification. We adapt the higher-order resolution model introduced in Joshi et al. (2019) for bridging resolution given gold mentions and anaphors.

2021

pdf bib
Anaphora Resolution in Dialogue: Description of the DFKI-TalkingRobots System for the CODI-CRAC 2021 Shared-Task
Tatiana Anikina | Cennet Oguz | Natalia Skachkova | Siyu Tao | Sharmila Upadhyaya | Ivana Kruijff-Korbayova
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

We describe the system developed by the DFKI-TalkingRobots Team for the CODI-CRAC 2021 Shared-Task on anaphora resolution in dialogue. Our system consists of three subsystems: (1) the Workspace Coreference System (WCS) incrementally clusters mentions using semantic similarity based on embeddings combined with lexical feature heuristics; (2) the Mention-to-Mention (M2M) coreference resolution system pairs same entity mentions; (3) the Discourse Deixis Resolution (DDR) system employs a Siamese Network to detect discourse anaphor-antecedent pairs. WCS achieved F1-score of 55.6% averaged across the evaluation test sets, M2M achieved 57.2% and DDR achieved 21.5%.

pdf bib
Anaphora Resolution in Dialogue: Cross-Team Analysis of the DFKI-TalkingRobots Team Submissions for the CODI-CRAC 2021 Shared-Task
Natalia Skachkova | Cennet Oguz | Tatiana Anikina | Siyu Tao | Sharmila Upadhyaya | Ivana Kruijff-Korbayova
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

We compare our team’s systems to others submitted for the CODI-CRAC 2021 Shared-Task on anaphora resolution in dialogue. We analyse the architectures and performance, report some problematic cases in gold annotations, and suggest possible improvements of the systems, their evaluation, data annotation, and the organization of the shared task.

pdf bib
Automatic Assignment of Semantic Frames in Disaster Response Team Communication Dialogues
Natalia Skachkova | Ivana Kruijff-Korbayova
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

We investigate frame semantics as a meaning representation framework for team communication in a disaster response scenario. We focus on the automatic frame assignment and retrain PAFIBERT, which is one of the state-of-the-art frame classifiers, on English and German disaster response team communication data, obtaining accuracy around 90%. We examine the performance of both models and discuss their adjustments, such as sampling of additional training instances from an unrelated domain and adding extra lexical and discourse features to input token representations. We show that sampling has some positive effect on the German frame classifier, discuss an unexpected impact of extra features on the models’ behaviour and perform a careful error analysis.

2020

pdf bib
Reference in Team Communication for Robot-Assisted Disaster Response: An Initial Analysis
Natalia Skachkova | Ivana Kruijff-Korbayova
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference

We analyze reference phenomena in a corpus of robot-assisted disaster response team communication. The annotation scheme we designed for this purpose distinguishes different types of entities, roles, reference units and relations. We focus particularly on mission-relevant objects, locations and actors and also annotate a rich set of reference links, including co-reference and various other kinds of relations. We explain the categories used in our annotation, present their distribution in the corpus and discuss challenging cases.

2018

pdf bib
Closing Brackets with Recurrent Neural Networks
Natalia Skachkova | Thomas Trost | Dietrich Klakow
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Many natural and formal languages contain words or symbols that require a matching counterpart for making an expression well-formed. The combination of opening and closing brackets is a typical example of such a construction. Due to their commonness, the ability to follow such rules is important for language modeling. Currently, recurrent neural networks (RNNs) are extensively used for this task. We investigate whether they are capable of learning the rules of opening and closing brackets by applying them to synthetic Dyck languages that consist of different types of brackets. We provide an analysis of the statistical properties of these languages as a baseline and show strengths and limits of Elman-RNNs, GRUs and LSTMs in experiments on random samples of these languages. In terms of perplexity and prediction accuracy, the RNNs get close to the theoretical baseline in most cases.

2017

pdf bib
Multilingual Ontologies for the Representation and Processing of Folktales
Thierry Declerck | Anastasija Aman | Martin Banzer | Dominik Macháček | Lisa Schäfer | Natalia Skachkova
Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe

We describe work done in the field of folkloristics and consisting in creating ontologies based on well-established studies proposed by “classical” folklorists. This work is supporting the availability of a huge amount of digital and structured knowledge on folktales to digital humanists. The ontological encoding of past and current motif-indexation and classification systems for folktales was in the first step limited to English language data. This led us to focus on making those newly generated formal knowledge sources available in a few more languages, like German, Russian and Bulgarian. We stress the importance of achieving this multilingual extension of our ontologies at a larger scale, in order for example to support the automated analysis and classification of such narratives in a large variety of languages, as those are getting more and more accessible on the Web.