Dan Moldovan

Also published as: D. Moldovan, Dan I. Moldovan

2020

CEREC: A Corpus for Entity Resolution in Email Conversations
Parag Pravin Dakle | Dan Moldovan
Proceedings of the 28th International Conference on Computational Linguistics

We present the first large-scale corpus for entity resolution in email conversations (CEREC). The corpus consists of 6001 email threads from the Enron Email Corpus containing 36,448 email messages and 38,996 entity coreference chains. The annotation is carried out as a two-step process with minimal manual effort. Experiments are carried out for evaluating different features and performance of four baselines on the created corpus. For the task of mention identification and coreference resolution, a best performance of 54.1 F1 is reported, highlighting the room for improvement. An in-depth qualitative and quantitative error analysis is presented to understand the limitations of the baselines considered.

A Study on Entity Resolution for Email Conversations
Parag Pravin Dakle | Takshak Desai | Dan Moldovan
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper investigates the problem of entity resolution for email conversations and presents a seed annotated corpus of email threads labeled with entity coreference chains. Characteristics of email threads concerning reference resolution are first discussed, and then the creation of the corpus and annotation steps are explained. Finally, performance of the current state-of-the-art deep learning models on the seed corpus is evaluated and qualitative error analysis on the predictions obtained is presented.

Joint Learning of Syntactic Features Helps Discourse Segmentation
Takshak Desai | Parag Pravin Dakle | Dan Moldovan
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper describes an accurate framework for carrying out multi-lingual discourse segmentation with BERT (Devlin et al., 2019). The model is trained to identify segments by casting the problem as a token classification problem and jointly learning syntactic features like part-of-speech tags and dependency relations. This leads to significant improvements in performance. Experiments are performed in different languages, such as English, Dutch, German, Brazilian Portuguese, and Basque, to highlight the cross-lingual effectiveness of the segmenter. In particular, the model achieves a state-of-the-art F-score of 96.7 for the RST-DT corpus (Carlson et al., 2003), improving on the previous best model by 7.2%. Additionally, a qualitative explanation is provided for how proposed changes contribute to model performance by analyzing errors made on the test data.

Affect in Tweets: A Transfer Learning Approach
Linrui Zhang | Hsin-Lun Huang | Yang Yu | Dan Moldovan
Proceedings of the Twelfth Language Resources and Evaluation Conference

People convey sentiments and emotions through language. To understand these affectual states is an essential step towards understanding natural language. In this paper, we propose a transfer-learning based approach to inferring the affectual state of a person from their tweets. As opposed to the traditional machine learning models which require considerable effort in designing task specific features, our model can be well adapted to the proposed tasks with a very limited amount of fine-tuning, which significantly reduces the manual effort in feature engineering. We aim to show that by leveraging the pre-learned knowledge, transfer learning models can achieve competitive results in the affectual content analysis of tweets, compared to the traditional models. As shown by the experiments on SemEval-2018 Task 1: Affect in Tweets, our model ranking 2nd, 4th and 6th place in four of its subtasks proves the effectiveness of our idea.

2018

Chinese Relation Classification using Long Short Term Memory Networks
Linrui Zhang | Dan Moldovan
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Generating Questions for Reading Comprehension using Coherence Relations
Takshak Desai | Parag Dakle | Dan Moldovan
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

In this paper, we propose a technique for generating complex reading comprehension questions from a discourse that are more useful than factual ones derived from assertions. Our system produces a set of general-level questions using coherence relations and a set of well-defined syntactic transformations on the input text. Generated questions evaluate comprehension abilities like a comprehensive analysis of the text and its structure, correct identification of the author’s intent, a thorough evaluation of stated arguments, and a deduction of the high-level semantic relations that hold between text spans. Experiments performed on the RST-DT corpus allow us to conclude that our system possesses a strong aptitude for generating intricate questions. These questions are capable of effectively assessing a student’s interpretation of the text.

Rule-based vs. Neural Net Approaches to Semantic Textual Similarity
Linrui Zhang | Dan Moldovan
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

This paper presents a neural net approach to determine Semantic Textual Similarity (STS) using attention-based bidirectional Long Short-Term Memory Networks (Bi-LSTM). To date, most traditional STS systems have been rule-based, built on top of excessive use of linguistic features and resources. In this paper, we present an end-to-end attention-based Bi-LSTM neural network system that takes only word-level features, without expensive feature engineering work or the usage of external resources. By comparing its performance with traditional rule-based systems on the SemEval-2012 benchmark, we assess the limitations and strengths of neural net systems relative to rule-based systems on Semantic Textual Similarity.

2014

Leveraging Verb-Argument Structures to Infer Semantic Relations
Eduardo Blanco | Dan Moldovan
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

Multilingual eXtended WordNet Knowledge Base: Semantic Parsing and Translation of Glosses
Tatiana Erekhinskaya | Meghana Satpute | Dan Moldovan
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a method to create WordNet-like lexical resources for different languages. Instead of directly translating glosses from one language to another, we first perform semantic parsing of WordNet glosses and then translate the resulting semantic representation. The proposed approach simplifies the machine translation of the glosses. The approach provides a ready-to-use semantic representation of glosses in target languages instead of just plain text.

2013

A Semantically Enhanced Approach to Determine Textual Similarity
Eduardo Blanco | Dan Moldovan
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

A Tool for Extracting Conversational Implicatures
Marta Tatu | Dan Moldovan
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Explicitly conveyed knowledge represents only a portion of the information communicated by a text snippet. Automated mechanisms for deriving explicit information exist; however, the implicit assumptions and default inferences that capture our intuitions about a normal interpretation of a communication remain hidden for automated systems, despite the communication participants' ease of grasping the complete meaning of the communication. In this paper, we describe a reasoning framework for the automatic identification of conversational implicatures conveyed by real-world English and Arabic conversations carried via twitter.com. Our system transforms given utterances into deep semantic logical forms. It produces a variety of axioms that identify lexical connections between concepts, define rules of combining semantic relations, capture common-sense world knowledge, and encode Grice's Conversational Maxims. By exploiting this rich body of knowledge and reasoning within the context of the conversation, our system produces entailments and implicatures conveyed by analyzed utterances with an F-measure of 70.42% for English conversations.

Polaris: Lymba’s Semantic Parser
Dan Moldovan | Eduardo Blanco
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Semantic representation of text is key to text understanding and reasoning. In this paper, we present Polaris, Lymba's semantic parser. Polaris is a supervised semantic parser that, given text, extracts semantic relations. It extracts relations from a wide variety of lexico-syntactic patterns, including verb-argument structures, noun compounds and others. The output can be provided in several formats: XML, RDF triples, logic forms or plain text, facilitating interoperability with other tools. Polaris is implemented using eight separate modules. Each module is explained and a detailed example of processing using a sample sentence is provided. Overall results using a benchmark are discussed. Per-module performance, including errors made and pruned by each module, is also analyzed.

Fine-Grained Focus for Pinpointing Positive Implicit Meaning from Negated Statements
Eduardo Blanco | Dan Moldovan
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

Semantic Representation of Negation Using Focus Detection
Eduardo Blanco | Dan Moldovan
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Unsupervised Learning of Semantic Relation Composition
Eduardo Blanco | Dan Moldovan
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

A Model for Composing Semantic Relations
Eduardo Blanco | Dan Moldovan
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

2010

Composition of Semantic Relations: Model and Applications
Eduardo Blanco | Hakki C. Cankaya | Dan Moldovan
Coling 2010: Posters

Automatic Discovery of Manner Relations and its Applications
Eduardo Blanco | Dan Moldovan
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Inducing Ontologies from Folksonomies using Natural Language Understanding
Marta Tatu | Dan Moldovan
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Folksonomies are unsystematic, unsophisticated collections of keywords associated by social bookmarking users to web content and, despite their inconsistency problems (typographical errors, spelling variations, use of space or punctuation as delimiters, same tag applied in different context, synonymy of concepts, etc.), their popularity is increasing among Web 2.0 application developers. In this paper, in addition to eliminating folksonomic irregularities existing at the lexical, syntactic or semantic understanding levels, we propose an algorithm that automatically builds a semantic representation of the folksonomy by exploiting the tags, their social bookmarking associations (co-occurring tags) and, more importantly, the content of labeled documents. We derive the semantics of each tag, discover semantic links between the folksonomic tags and expose the underlying semantic structure of the folksonomy, thus enabling a number of information discovery and ontology-based reasoning applications.

Feasibility of Automatically Bootstrapping a Persian WordNet
Chris Irwin Davis | Dan Moldovan
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we describe a proof-of-concept for the bootstrapping of a Persian WordNet. This effort was motivated by previous work done at Stanford University on bootstrapping an Arabic WordNet using a parallel corpus and an English WordNet. The principle of that work is based on the premise that paradigmatic relations are by nature deeply semantic, and as such, are likely to remain intact between languages. We performed our task on a Persian-English bilingual corpus of George Orwell's Nineteen Eighty-Four. The corpus was neither aligned nor sense tagged, so it was necessary that these were undertaken first. A combination of manual and semiautomated methods was used to tag and sentence-align the corpus. Actual mapping of English word senses onto Persian was done using automated techniques. Although Persian is written in Arabic script, it is an Indo-European language, while Arabic is a Central Semitic language. Despite their linguistic differences, we endeavor to test the applicability of the Stanford strategy to our task.

Semi-Automatic Domain Ontology Creation from Text Resources
Mithun Balakrishna | Dan Moldovan | Marta Tatu | Marian Olteanu
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Analysts in various domains, especially intelligence and financial, have to constantly extract useful knowledge from large amounts of unstructured or semi-structured data. Keyword-based search, faceted search, question-answering, etc. are some of the automated methodologies that have been used to help analysts in their tasks. General-purpose and domain-specific ontologies have been proposed to help these automated methods in organizing data and providing access to useful information. However, problems in ontology creation and maintenance have resulted in expensive procedures for expanding/maintaining the ontology library available to support the growing and evolving needs of analysts. In this paper, we present a generalized and improved procedure to automatically extract deep semantic information from text resources and rapidly create semantically-rich domain ontologies while keeping the manual intervention to a minimum. We also present evaluation results for the intelligence and financial ontology libraries, semi-automatically created by our proposed methodologies using freely-available textual resources from the Web.

2008

Causal Relation Extraction
Eduardo Blanco | Nuria Castell | Dan Moldovan
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents a supervised method for the detection and extraction of Causal Relations from open domain text. First we give a brief outline of the definition of causation and how it relates to other Semantic Relations, as well as a characterization of their encoding. In this work, we only consider marked and explicit causations. Our approach first identifies the syntactic patterns that may encode a causation, then we use Machine Learning techniques to decide whether or not a pattern instance encodes a causation. We focus on the most productive pattern, a verb phrase followed by a relator and a clause, and its reverse version, a relator followed by a clause and a verb phrase. As relators we consider the words as, after, because and since. We present a set of lexical, syntactic and semantic features for the classification task, their rationale and some examples. The results obtained are discussed and the errors analyzed.

1992

Semantic Network Array Processor as a Massively Parallel Computing Platform for High Performance and Large-Scale Natural Language Processing
Hiroaki Kitano | Dan Moldovan
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics

1991

Toward High Performance Machine Translation: Preliminary Results from Massively Parallel Memory-Based Translation on SNAP
Hiroaki Kitano | Dan Moldovan | Seungho Cha
Proceedings of Machine Translation Summit III: Papers

This paper describes a memory-based machine translation system developed for the Semantic Network Array Processor (SNAP). The goal of our work is to develop a scalable and high-performance memory-based machine translation system which utilizes the high degree of parallelism provided by the SNAP machine. We have implemented an experimental machine translation system DMSNAP as a central part of a real-time speech-to-speech dialogue translation system. It is a SNAP version of the ΦDMDIALOG speech-to-speech translation system. Memory-based natural language processing and a syntactic constraint network model have been incorporated using parallel marker-passing, which is directly supported at the hardware level. Experimental results demonstrate that the parsing of a sentence is done in the order of milliseconds.