Task-oriented dialogue systems help a user achieve a particular goal by parsing user requests to execute a particular action. These systems typically require copious amounts of training data to effectively understand the user intent and its corresponding slots. Acquiring large training corpora requires significant manual annotation effort, rendering their construction infeasible for low-resource languages. In this paper, we present a two-step approach for automatically constructing task-oriented dialogue data in such languages by making use of annotated data from high-resource languages. First, we use a machine translation (MT) system to translate the utterance and slot information to the target language. Second, we use token prefix matching and mBERT-based semantic matching to align the slot tokens to the corresponding tokens in the utterance. We hand-curate a new test dataset in two low-resource Dravidian languages and show the significance and impact of our training dataset construction using a state-of-the-art mBERT model, achieving a Slot F1 of 81.51 (Kannada) and 78.82 (Tamil) on our test sets.
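As a rough illustration of the token prefix matching step, the sketch below aligns a translated slot value to a contiguous window of utterance tokens by shared leading characters, which tolerates case suffixes attached to a word in the utterance. This is only one plausible reading of the idea, not the method used in the paper: the function names, the `min_prefix` threshold, the contiguous-window assumption, and the romanized example tokens are all hypothetical. Slots that find no match are left untagged here; in the described approach they would presumably be handled by the mBERT-based semantic matching.

```python
from typing import Dict, List, Optional, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def align_slot(utterance: List[str], slot_value: List[str],
               min_prefix: int = 3) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices, end exclusive, of the window that
    best matches the slot value by shared prefixes, or None if none matches.
    `min_prefix` is an assumed threshold, not a value from the paper."""
    k = len(slot_value)
    best_span, best_score = None, 0
    for start in range(len(utterance) - k + 1):
        window = utterance[start:start + k]
        lens = [common_prefix_len(u.lower(), s.lower())
                for u, s in zip(window, slot_value)]
        # Each slot token must share a long enough prefix with its counterpart
        # (short slot tokens only need to match in full).
        if all(n >= min(min_prefix, len(s)) for n, s in zip(lens, slot_value)):
            score = sum(lens)
            if score > best_score:
                best_span, best_score = (start, start + k), score
    return best_span


def bio_tags(utterance: List[str], slots: Dict[str, str],
             min_prefix: int = 3) -> List[str]:
    """Project slot values onto the utterance as BIO tags."""
    tags = ["O"] * len(utterance)
    for name, value in slots.items():
        span = align_slot(utterance, value.split(), min_prefix)
        if span is None:
            continue  # assumed fallback point for semantic matching
        start, end = span
        tags[start] = f"B-{name}"
        for i in range(start + 1, end):
            tags[i] = f"I-{name}"
    return tags


# Illustrative romanized tokens (hypothetical, not taken from the dataset).
utterance = ["naalai", "chennaikku", "vimaanam", "pathivu", "sei"]
slots = {"destination": "chennai", "date": "naalai"}
tags = bio_tags(utterance, slots)
```

Here the slot value `chennai` matches the inflected utterance token `chennaikku` through their shared seven-character prefix, which exact string matching would miss.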
Online platforms and communities establish their own norms that govern what behavior is acceptable within the community. Substantial effort in NLP has focused on identifying unacceptable behaviors and, recently, on forecasting them before they occur. However, these efforts have largely focused on toxicity as the sole form of community norm violation. This focus overlooks the much larger set of rules that moderators enforce. Here, we introduce a new dataset focusing on a more complete spectrum of community norms and their violations in the local conversational and global community contexts. We introduce a series of models that use this data to develop context- and community-sensitive norm violation detection, showing that incorporating conversational context and community information yields high performance.
Automatic identification of cause-effect relationships from data is a challenging but important problem in artificial intelligence. Identifying semantic relationships has become increasingly important for multiple downstream applications such as Question Answering, Information Retrieval and Event Prediction. In this work, we tackle the problem of causal relationship extraction from financial news using the FinCausal 2020 dataset. We address two tasks: (1) detecting the presence of causal relationships and (2) extracting the segments corresponding to cause and effect from news snippets. We propose Transformer-based sequence and token classification models with post-processing rules, which achieve F1 scores of 96.12 and 79.60 on Tasks 1 and 2, respectively.
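To make the second task concrete, the sketch below shows how per-token labels from a token classifier might be turned into one cause segment and one effect segment. The post-processing rules used in the paper are not described here, so the rule shown (keep only the longest contiguous run for each label) is an assumption chosen for illustration, as are the label names `C`, `E` and `O`, the function names, and the made-up example sentence.

```python
from typing import Dict, List, Tuple


def longest_span(labels: List[str], target: str) -> Tuple[int, int]:
    """Return (start, end), end exclusive, of the longest contiguous run of
    `target` in `labels`; (0, 0) if the label never occurs."""
    best = (0, 0)
    start = None
    # A sentinel label closes any run that reaches the end of the sequence.
    for i, lab in enumerate(labels + ["<END>"]):
        if lab == target:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def extract_segments(tokens: List[str], labels: List[str]) -> Dict[str, str]:
    """Assumed post-processing rule: keep one span per role, dropping
    isolated stray predictions."""
    segments = {}
    for name, tag in (("cause", "C"), ("effect", "E")):
        start, end = longest_span(labels, tag)
        segments[name] = " ".join(tokens[start:end])
    return segments


# Made-up example; the final "E" stands in for a stray model prediction.
tokens = "Shares fell 5% after the company cut its forecast .".split()
labels = ["E", "E", "E", "O", "C", "C", "C", "C", "C", "E"]
segments = extract_segments(tokens, labels)
```

With these inputs the stray `E` on the final token is discarded, leaving "the company cut its forecast" as the cause and "Shares fell 5%" as the effect.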
The rise in the usage of social media has placed it in a central position for news dissemination and consumption. This greatly increases the potential for proliferation of rumours and misinformation. In an effort to mitigate the spread of rumours, we tackle the related task of identifying the stance (Support, Deny, Query, Comment) of a social media post. Unlike previous work, we impose inductive biases that capture platform-specific user behavior. These biases, coupled with fine-tuning of BERT on social media data, allow for better language understanding, yielding an F1 score of 58.7 on the SemEval 2019 task on rumour stance detection.