Event Detection (ED) is the task of identifying and classifying trigger words of event mentions in text. Despite considerable research efforts in recent years for English text, the task of ED in other languages has been significantly less explored. Switching to non-English languages, important research questions for ED include how well existing ED models perform on different languages, how challenging ED is in other languages, and how well ED knowledge and annotation can be transferred across languages. To answer those questions, it is crucial to obtain multilingual ED datasets that provide consistent event annotation for multiple languages. There exist some multilingual ED datasets; however, they tend to cover a handful of languages and mainly focus on popular ones. Many languages are not covered in existing multilingual ED datasets. In addition, the current datasets are often small and not accessible to the public. To overcome those shortcomings, we introduce a new large-scale multilingual dataset for ED (called MINION) that consistently annotates events for 8 different languages; 5 of them have not been supported by existing multilingual datasets. We also perform extensive experiments and analysis to demonstrate the challenges and transferability of ED across languages in MINION that in all call for more research effort in this area. We will release the dataset to promote future research on multilingual ED.
Event trigger detection, entity mention recognition, event argument extraction, and relation extraction are the four important tasks in information extraction that have been performed jointly (Joint Information Extraction - JointIE) to avoid error propagation and leverage dependencies between the task instances (i.e., event triggers, entity mentions, relations, and event arguments). However, previous JointIE models often assume heuristic manually-designed dependency between the task instances and mean-field factorization for the joint distribution of instance labels, thus unable to capture optimal dependencies among instances and labels to improve representation learning and IE performance. To overcome these limitations, we propose to induce a dependency graph among task instances from data to boost representation learning. To better capture dependencies between instance labels, we propose to directly estimate their joint distribution via Conditional Random Fields. Noise Contrastive Estimation is introduced to address the maximization of the intractable joint likelihood for model training. Finally, to improve the decoding with greedy or beam search in prior work, we present Simulated Annealing to better find the globally optimal assignment for instance labels at decoding time. Experimental results show that our proposed model outperforms previous models on multiple IE tasks across 5 datasets and 2 languages.
In this work, we focus on Cross-Lingual Event Detection where a model is trained on data from a source language but its performance is evaluated on data from a second, target, language. Most recent works in this area have harnessed the language-invariant qualities displayed by pre-trained Multi-lingual Language Models. Their performance, however, reveals there is room for improvement as the cross-lingual setting entails particular challenges. We employ Adversarial Language Adaptation to train a Language Discriminator to discern between the source and target languages using unlabeled data. The discriminator is trained in an adversarial manner so that the encoder learns to produce refined, language-invariant representations that lead to improved performance. More importantly, we optimize the adversarial training process by only presenting the discriminator with the most informative samples. We base our intuition about what makes a sample informative on two disparate metrics: sample similarity and event presence. Thus, we propose leveraging Optimal Transport as a solution to naturally combine these two distinct information sources into the selection process. Extensive experiments on 8 different language pairs, using 4 languages from unrelated families, show the flexibility and effectiveness of our model that achieves state-of-the-art results.
This paper presents FAMIE, a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction. FAMIE is designed to address a fundamental problem in existing AL frameworks where annotators need to wait for a long time between annotation batches due to the time-consuming nature of model training and data selection at each AL iteration. This hinders the engagement, productivity, and efficiency of annotators. Based on the idea of using a small proxy network for fast data selection, we introduce a novel knowledge distillation mechanism to synchronize the proxy network with the main large model (i.e., BERT-based) to ensure the appropriateness of the selected annotation examples for the main model. Our AL framework can support multiple languages. The experiments demonstrate the advantages of FAMIE in terms of competitive performance and time efficiency for sequence labeling with AL. We publicly release our code (https://github.com/nlp-uoregon/famie) and demo website (http://nlp.uoregon.edu:9000/). A demo video for FAMIE is provided at: https://youtu.be/I2i8n_jAyrY
Machine translation (MT) is an important task in natural language processing, which aims to translate a sentence in a source language to another sentence with the same/similar semantics in a target language. Despite the huge effort on building MT systems for different language pairs, most previous work focuses on formal-language settings, where text to be translated come from written sources such as books and news articles. As a result, such MT systems could fail to translate livestreaming video transcripts, where text is often shorter and might be grammatically incorrect. To overcome this issue, we introduce a novel MT corpus - BehanceMT for livestreaming video transcript translation. Our corpus contains parallel transcripts for 3 language pairs, where English is the source language and Spanish, Chinese, and Arabic are the target languages. Experimental results show that finetuning a pretrained MT model on BehanceMT significantly improves the performance of the model in translating video transcripts across 3 language pairs. In addition, the finetuned MT model outperforms GoogleTranslate in 2 out of 3 language pairs, further demonstrating the usefulness of our proposed dataset for video transcript translation. BehanceMT will be publicly released upon the acceptance of the paper.
Event Argument Extraction (EAE) is one of the sub-tasks of event extraction, aiming to recognize the role of each entity mention toward a specific event trigger. Despite the success of prior works in sentence-level EAE, the document-level setting is less explored. In particular, whereas syntactic structures of sentences have been shown to be effective for sentence-level EAE, prior document-level EAE models totally ignore syntactic structures for documents. Hence, in this work, we study the importance of syntactic structures in document-level EAE. Specifically, we propose to employ Optimal Transport (OT) to induce structures of documents based on sentence-level syntactic structures and tailored to EAE task. Furthermore, we propose a novel regularization technique to explicitly constrain the contributions of unrelated context words in the final prediction for EAE. We perform extensive experiments on the benchmark document-level EAE dataset RAMS that leads to the state-of-the-art performance. Moreover, our experiments on the ACE 2005 dataset reveals the effectiveness of the proposed model in the sentence-level EAE by establishing new state-of-the-art results.
Extracting entities, events, event arguments, and relations (i.e., task instances) from text represents four main challenging tasks in information extraction (IE), which have been solved jointly (JointIE) to boost the overall performance for IE. As such, previous work often leverages two types of dependencies between the tasks, i.e., cross-instance and cross-type dependencies representing relatedness between task instances and correlations between information types of the tasks. However, the cross-task dependencies in prior work are not optimal as they are only designed manually according to some task heuristics. To address this issue, we propose a novel model for JointIE that aims to learn cross-task dependencies from data. In particular, we treat each task instance as a node in a dependency graph where edges between the instances are inferred through information from different layers of a pretrained language model (e.g., BERT). Furthermore, we utilize the Chow-Liu algorithm to learn a dependency tree between information types for JointIE by seeking to approximate the joint distribution of the types from data. Finally, the Chow-Liu dependency tree is used to generate cross-type patterns, serving as anchor knowledge to guide the learning of representations and dependencies between instances for JointIE. Experimental results show that our proposed model significantly outperforms strong JointIE baselines over four datasets with different languages.
Event Causality Identification (ECI) is the task of detecting causal relations between events mentioned in the text. Although this task has been extensively studied for English materials, it is under-explored for many other languages. A major reason for this issue is the lack of multilingual datasets that provide consistent annotations for event causality relations in multiple non-English languages. To address this issue, we introduce a new multilingual dataset for ECI, called MECI. The dataset employs consistent annotation guidelines for five typologically different languages, i.e., English, Danish, Spanish, Turkish, and Urdu. Our dataset thus enable a new research direction on cross-lingual transfer learning for ECI. Our extensive experiments demonstrate high quality for MECI that can provide ample research challenges and directions for future research. We will publicly release MECI to promote research on multilingual ECI.
Fine-grained temporal relation extraction (FineTempRel) aims to recognize the durations and timeline of event mentions in text. A missing part in the current deep learning models for FineTempRel is their failure to exploit the syntactic structures of the input sentences to enrich the representation vectors. In this work, we propose to fill this gap by introducing novel methods to integrate the syntactic structures into the deep learning models for FineTempRel. The proposed model focuses on two types of syntactic information from the dependency trees, i.e., the syntax-based importance scores for representation learning of the words and the syntactic connections to identify important context words for the event mentions. We also present two novel techniques to facilitate the knowledge transfer between the subtasks of FineTempRel, leading to a novel model with the state-of-the-art performance for this task.
The task of Event Detection (ED) in Information Extraction aims to recognize and classify trigger words of events in text. The recent progress has featured advanced transformer-based language models (e.g., BERT) as a critical component in state-of-the-art models for ED. However, the length limit for input texts is a barrier for such ED models as they cannot encode long-range document-level context that has been shown to be beneficial for ED. To address this issue, we propose a novel method to model document-level context for ED that dynamically selects relevant sentences in the document for the event prediction of the target sentence. The target sentence will be then augmented with the selected sentences and consumed entirely by transformer-based language models for improved representation learning for ED. To this end, the REINFORCE algorithm is employed to train the relevant sentence selection for ED. Several information types are then introduced to form the reward function for the training process, including ED performance, sentence similarity, and discourse relations. Our extensive experiments on multiple benchmark datasets reveal the effectiveness of the proposed model, leading to new state-of-the-art performance.
Previous work on crosslingual Relation and Event Extraction (REE) suffers from the monolingual bias issue due to the training of models on only the source language data. An approach to overcome this issue is to use unlabeled data in the target language to aid the alignment of crosslingual representations, i.e., via fooling a language discriminator. However, as this approach does not condition on class information, a target language example of a class could be incorrectly aligned to a source language example of a different class. To address this issue, we propose a novel crosslingual alignment method that leverages class information of REE tasks for representation learning. In particular, we propose to learn two versions of representation vectors for each class in an REE task based on either source or target language examples. Representation vectors for corresponding classes will then be aligned to achieve class-aware alignment for crosslingual representations. In addition, we propose to further align representation vectors for language-universal word categories (i.e., parts of speech and dependency relations). As such, a novel filtering mechanism is presented to facilitate the learning of word category representations from contextualized representations on input texts based on adversarial learning. We conduct extensive crosslingual experiments with English, Chinese, and Arabic over REE tasks. The results demonstrate the benefits of the proposed method that significantly advances the state-of-the-art performance in these settings.
We study the problem of Cross-lingual Event Argument Extraction (CEAE). The task aims to predict argument roles of entity mentions for events in text, whose language is different from the language that a predictive model has been trained on. Previous work on CEAE has shown the cross-lingual benefits of universal dependency trees in capturing shared syntactic structures of sentences across languages. In particular, this work exploits the existence of the syntactic connections between the words in the dependency trees as the anchor knowledge to transfer the representation learning across languages for CEAE models (i.e., via graph convolutional neural networks – GCNs). In this paper, we introduce two novel sources of language-independent information for CEAE models based on the semantic similarity and the universal dependency relations of the word pairs in different languages. We propose to use the two sources of information to produce shared sentence structures to bridge the gap between languages and improve the cross-lingual performance of the CEAE models. Extensive experiments are conducted with Arabic, Chinese, and English to demonstrate the effectiveness of the proposed method for CEAE.
We introduce Trankit, a light-weight Transformer-based Toolkit for multilingual Natural Language Processing (NLP). It provides a trainable pipeline for fundamental NLP tasks over 100 languages, and 90 pretrained pipelines for 56 languages. Built on a state-of-the-art pretrained language model, Trankit significantly outperforms prior multilingual NLP pipelines over sentence segmentation, part-of-speech tagging, morphological feature tagging, and dependency parsing while maintaining competitive performance for tokenization, multi-word token expansion, and lemmatization over 90 Universal Dependencies treebanks. Despite the use of a large pretrained transformer, our toolkit is still efficient in memory usage and speed. This is achieved by our novel plug-and-play mechanism with Adapters where a multilingual pretrained transformer is shared across pipelines for different languages. Our toolkit along with pretrained models and code are publicly available at: https://github.com/nlp-uoregon/trankit. A demo website for our toolkit is also available at: http://nlp.uoregon.edu/trankit. Finally, we create a demo video for Trankit at: https://youtu.be/q0KGP3zGjGc.
We study a new problem of cross-lingual transfer learning for event coreference resolution (ECR) where models trained on data from a source language are adapted for evaluations in different target languages. We introduce the first baseline model for this task based on XLM-RoBERTa, a state-of-the-art multilingual pre-trained language model. We also explore language adversarial neural networks (LANN) that present language discriminators to distinguish texts from the source and target languages to improve the language generalization for ECR. In addition, we introduce two novel mechanisms to further enhance the general representation learning of LANN, featuring: (i) multi-view alignment to penalize cross coreference-label alignment of examples in the source and target languages, and (ii) optimal transport to select close examples in the source and target languages to provide better training signals for the language discriminators. Finally, we perform extensive experiments for cross-lingual ECR from English to Spanish and Chinese to demonstrate the effectiveness of the proposed methods.
Existing works on information extraction (IE) have mainly solved the four main tasks separately (entity mention recognition, relation extraction, event trigger detection, and argument extraction), thus failing to benefit from inter-dependencies between tasks. This paper presents a novel deep learning model to simultaneously solve the four tasks of IE in a single model (called FourIE). Compared to few prior work on jointly performing four IE tasks, FourIE features two novel contributions to capture inter-dependencies between tasks. First, at the representation level, we introduce an interaction graph between instances of the four tasks that is used to enrich the prediction representation for one instance with those from related instances of other tasks. Second, at the label level, we propose a dependency graph for the information types in the four IE tasks that captures the connections between the types expressed in an input sentence. A new regularization mechanism is introduced to enforce the consistency between the golden and predicted type dependency graphs to improve representation learning. We show that the proposed model achieves the state-of-the-art performance for joint IE on both monolingual and multilingual learning settings with three different languages.