Dan Moldovan

Also published as: D. Moldovan, Dan I. Moldovan

2020

CEREC: A Corpus for Entity Resolution in Email Conversations
Parag Pravin Dakle | Dan Moldovan
Proceedings of the 28th International Conference on Computational Linguistics

We present the first large-scale corpus for entity resolution in email conversations (CEREC). The corpus consists of 6,001 email threads from the Enron Email Corpus containing 36,448 email messages and 38,996 entity coreference chains. The annotation is carried out as a two-step process with minimal manual effort. Experiments are carried out to evaluate different features and the performance of four baselines on the created corpus. For the task of mention identification and coreference resolution, a best performance of 54.1 F1 is reported, highlighting the room for improvement. An in-depth qualitative and quantitative error analysis is presented to understand the limitations of the baselines considered.

A Study on Entity Resolution for Email Conversations
Parag Pravin Dakle | Takshak Desai | Dan Moldovan
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper investigates the problem of entity resolution for email conversations and presents a seed annotated corpus of email threads labeled with entity coreference chains. Characteristics of email threads concerning reference resolution are first discussed, and then the creation of the corpus and annotation steps are explained. Finally, performance of the current state-of-the-art deep learning models on the seed corpus is evaluated and qualitative error analysis on the predictions obtained is presented.

Joint Learning of Syntactic Features Helps Discourse Segmentation
Takshak Desai | Parag Pravin Dakle | Dan Moldovan
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper describes an accurate framework for carrying out multi-lingual discourse segmentation with BERT (Devlin et al., 2019). The model is trained to identify segments by casting the problem as a token classification problem and jointly learning syntactic features like part-of-speech tags and dependency relations. This leads to significant improvements in performance. Experiments are performed in different languages, such as English, Dutch, German, Brazilian Portuguese, and Basque, to highlight the cross-lingual effectiveness of the segmenter. In particular, the model achieves a state-of-the-art F-score of 96.7 for the RST-DT corpus (Carlson et al., 2003), improving on the previous best model by 7.2%. Additionally, a qualitative explanation is provided for how the proposed changes contribute to model performance by analyzing errors made on the test data.

Affect in Tweets: A Transfer Learning Approach
Linrui Zhang | Hsin-Lun Huang | Yang Yu | Dan Moldovan
Proceedings of the Twelfth Language Resources and Evaluation Conference

People convey sentiments and emotions through language. Understanding these affectual states is an essential step towards understanding natural language. In this paper, we propose a transfer-learning based approach to inferring the affectual state of a person from their tweets. As opposed to traditional machine learning models, which require considerable effort in designing task-specific features, our model can be well adapted to the proposed tasks with a very limited amount of fine-tuning, which significantly reduces the manual effort in feature engineering. We aim to show that by leveraging the pre-learned knowledge, transfer learning models can achieve competitive results in the affectual content analysis of tweets, compared to the traditional models. In experiments on SemEval-2018 Task 1: Affect in Tweets, our model ranked 2nd, 4th and 6th place in four of its subtasks, which demonstrates the effectiveness of our idea.
