Dan Moldovan

Also published as: D. Moldovan, Dan I. Moldovan


2020

CEREC: A Corpus for Entity Resolution in Email Conversations
Parag Pravin Dakle | Dan Moldovan
Proceedings of the 28th International Conference on Computational Linguistics

We present the first large-scale corpus for entity resolution in email conversations (CEREC). The corpus consists of 6,001 email threads from the Enron Email Corpus containing 36,448 email messages and 38,996 entity coreference chains. The annotation is carried out as a two-step process with minimal manual effort. Experiments are carried out to evaluate different features and the performance of four baselines on the created corpus. For the task of mention identification and coreference resolution, a best performance of 54.1 F1 is reported, highlighting the room for improvement. An in-depth qualitative and quantitative error analysis is presented to understand the limitations of the baselines considered.

A Study on Entity Resolution for Email Conversations
Parag Pravin Dakle | Takshak Desai | Dan Moldovan
Proceedings of the 12th Language Resources and Evaluation Conference

This paper investigates the problem of entity resolution for email conversations and presents a seed annotated corpus of email threads labeled with entity coreference chains. Characteristics of email threads concerning reference resolution are first discussed, and then the creation of the corpus and annotation steps are explained. Finally, performance of the current state-of-the-art deep learning models on the seed corpus is evaluated and qualitative error analysis on the predictions obtained is presented.

Joint Learning of Syntactic Features Helps Discourse Segmentation
Takshak Desai | Parag Pravin Dakle | Dan Moldovan
Proceedings of the 12th Language Resources and Evaluation Conference

This paper describes an accurate framework for carrying out multi-lingual discourse segmentation with BERT (Devlin et al., 2019). The model is trained to identify segments by casting the problem as a token classification problem and jointly learning syntactic features like part-of-speech tags and dependency relations. This leads to significant improvements in performance. Experiments are performed in different languages, such as English, Dutch, German, Brazilian Portuguese and Basque, to highlight the cross-lingual effectiveness of the segmenter. In particular, the model achieves a state-of-the-art F-score of 96.7 for the RST-DT corpus (Carlson et al., 2003), improving on the previous best model by 7.2%. Additionally, a qualitative explanation is provided for how the proposed changes contribute to model performance by analyzing errors made on the test data.

Affect in Tweets: A Transfer Learning Approach
Linrui Zhang | Hsin-Lun Huang | Yang Yu | Dan Moldovan
Proceedings of the 12th Language Resources and Evaluation Conference

People convey sentiments and emotions through language. Understanding these affectual states is an essential step towards understanding natural language. In this paper, we propose a transfer-learning-based approach to inferring the affectual state of a person from their tweets. Unlike traditional machine learning models, which require considerable effort in designing task-specific features, our model can be adapted to the proposed tasks with a very limited amount of fine-tuning, which significantly reduces the manual effort in feature engineering. We aim to show that by leveraging pre-learned knowledge, transfer learning models can achieve competitive results in the affectual content analysis of tweets, compared to the traditional models. As shown by the experiments on SemEval-2018 Task 1: Affect in Tweets, our model ranks 2nd, 4th and 6th in four of its subtasks, which supports the effectiveness of our idea.

2018

Chinese Relation Classification using Long Short Term Memory Networks
Linrui Zhang | Dan Moldovan
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Generating Questions for Reading Comprehension using Coherence Relations
Takshak Desai | Parag Dakle | Dan Moldovan
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

In this paper, we propose a technique for generating complex reading comprehension questions from a discourse that are more useful than factual ones derived from assertions. Our system produces a set of general-level questions using coherence relations and a set of well-defined syntactic transformations on the input text. Generated questions evaluate comprehension abilities like a comprehensive analysis of the text and its structure, correct identification of the author’s intent, a thorough evaluation of stated arguments, and a deduction of the high-level semantic relations that hold between text spans. Experiments performed on the RST-DT corpus allow us to conclude that our system possesses a strong aptitude for generating intricate questions. These questions are capable of effectively assessing a student’s interpretation of the text.

Rule-based vs. Neural Net Approaches to Semantic Textual Similarity
Linrui Zhang | Dan Moldovan
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

This paper presents a neural net approach to determining Semantic Textual Similarity (STS) using attention-based bidirectional Long Short-Term Memory Networks (Bi-LSTM). To date, most traditional STS systems have been rule-based and built on extensive use of linguistic features and resources. In this paper, we present an end-to-end attention-based Bi-LSTM neural network system that takes only word-level features, without expensive feature engineering work or the use of external resources. By comparing its performance with traditional rule-based systems on the SemEval-2012 benchmark, we assess the limitations and strengths of neural net systems relative to rule-based systems on Semantic Textual Similarity.

2014

Leveraging Verb-Argument Structures to Infer Semantic Relations
Eduardo Blanco | Dan Moldovan
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

Multilingual eXtended WordNet Knowledge Base: Semantic Parsing and Translation of Glosses
Tatiana Erekhinskaya | Meghana Satpute | Dan Moldovan
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a method to create WordNet-like lexical resources for different languages. Instead of directly translating glosses from one language to another, we first perform semantic parsing of WordNet glosses and then translate the resulting semantic representation. The proposed approach simplifies the machine translation of the glosses. It also provides a ready-to-use semantic representation of glosses in target languages instead of just plain text.

2013

A Semantically Enhanced Approach to Determine Textual Similarity
Eduardo Blanco | Dan Moldovan
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

A Tool for Extracting Conversational Implicatures
Marta Tatu | Dan Moldovan
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Explicitly conveyed knowledge represents only a portion of the information communicated by a text snippet. Automated mechanisms for deriving explicit information exist; however, the implicit assumptions and default inferences that capture our intuitions about a normal interpretation of a communication remain hidden from automated systems, despite the communication participants' ease of grasping the complete meaning of the communication. In this paper, we describe a reasoning framework for the automatic identification of conversational implicatures conveyed by real-world English and Arabic conversations carried out via twitter.com. Our system transforms given utterances into deep semantic logical forms. It produces a variety of axioms that identify lexical connections between concepts, define rules for combining semantic relations, capture common-sense world knowledge, and encode Grice's Conversational Maxims. By exploiting this rich body of knowledge and reasoning within the context of the conversation, our system produces entailments and implicatures conveyed by analyzed utterances with an F-measure of 70.42% for English conversations.

Polaris: Lymba’s Semantic Parser
Dan Moldovan | Eduardo Blanco
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Semantic representation of text is key to text understanding and reasoning. In this paper, we present Polaris, Lymba's semantic parser. Polaris is a supervised semantic parser that, given text, extracts semantic relations. It extracts relations from a wide variety of lexico-syntactic patterns, including verb-argument structures, noun compounds and others. The output can be provided in several formats: XML, RDF triples, logic forms or plain text, facilitating interoperability with other tools. Polaris is implemented using eight separate modules. Each module is explained and a detailed example of processing using a sample sentence is provided. Overall results using a benchmark are discussed. Per-module performance, including errors made and pruned by each module, is also analyzed.

Fine-Grained Focus for Pinpointing Positive Implicit Meaning from Negated Statements
Eduardo Blanco | Dan Moldovan
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

A Model for Composing Semantic Relations
Eduardo Blanco | Dan Moldovan
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

Semantic Representation of Negation Using Focus Detection
Eduardo Blanco | Dan Moldovan
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Unsupervised Learning of Semantic Relation Composition
Eduardo Blanco | Dan Moldovan
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Inducing Ontologies from Folksonomies using Natural Language Understanding
Marta Tatu | Dan Moldovan
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Folksonomies are unsystematic, unsophisticated collections of keywords associated by social bookmarking users with web content and, despite their inconsistency problems (typographical errors, spelling variations, use of space or punctuation as delimiters, same tag applied in different contexts, synonymy of concepts, etc.), their popularity is increasing among Web 2.0 application developers. In this paper, in addition to eliminating folksonomic irregularities existing at the lexical, syntactic or semantic understanding levels, we propose an algorithm that automatically builds a semantic representation of the folksonomy by exploiting the tags, their social bookmarking associations (co-occurring tags) and, more importantly, the content of labeled documents. We derive the semantics of each tag, discover semantic links between the folksonomic tags and expose the underlying semantic structure of the folksonomy, thus enabling a number of information discovery and ontology-based reasoning applications.

Feasibility of Automatically Bootstrapping a Persian WordNet
Chris Irwin Davis | Dan Moldovan
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we describe a proof-of-concept for the bootstrapping of a Persian WordNet. This effort was motivated by previous work done at Stanford University on bootstrapping an Arabic WordNet using a parallel corpus and an English WordNet. That work rests on the premise that paradigmatic relations are by nature deeply semantic and, as such, are likely to remain intact between languages. We performed our task on a Persian-English bilingual corpus of George Orwell’s Nineteen Eighty-Four. The corpus was neither aligned nor sense tagged, so alignment and sense tagging had to be undertaken first. A combination of manual and semiautomated methods was used to tag and sentence-align the corpus. Actual mapping of English word senses onto Persian was done using automated techniques. Although Persian is written in Arabic script, it is an Indo-European language, while Arabic is a Central Semitic language. Despite their linguistic differences, we endeavor to test the applicability of the Stanford strategy to our task.

Semi-Automatic Domain Ontology Creation from Text Resources
Mithun Balakrishna | Dan Moldovan | Marta Tatu | Marian Olteanu
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Analysts in various domains, especially intelligence and financial, have to constantly extract useful knowledge from large amounts of unstructured or semi-structured data. Keyword-based search, faceted search, question-answering, etc. are some of the automated methodologies that have been used to help analysts in their tasks. General-purpose and domain-specific ontologies have been proposed to help these automated methods in organizing data and providing access to useful information. However, problems in ontology creation and maintenance have resulted in expensive procedures for expanding/maintaining the ontology library available to support the growing and evolving needs of analysts. In this paper, we present a generalized and improved procedure to automatically extract deep semantic information from text resources and rapidly create semantically-rich domain ontologies while keeping the manual intervention to a minimum. We also present evaluation results for the intelligence and financial ontology libraries, semi-automatically created by our proposed methodologies using freely-available textual resources from the Web.

Automatic Discovery of Manner Relations and its Applications
Eduardo Blanco | Dan Moldovan
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Composition of Semantic Relations: Model and Applications
Eduardo Blanco | Hakki C. Cankaya | Dan Moldovan
Coling 2010: Posters

2008

Causal Relation Extraction
Eduardo Blanco | Nuria Castell | Dan Moldovan
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents a supervised method for the detection and extraction of Causal Relations from open domain text. First, we give a brief outline of the definition of causation and how it relates to other Semantic Relations, as well as a characterization of their encoding. In this work, we only consider marked and explicit causations. Our approach first identifies the syntactic patterns that may encode a causation, and then uses Machine Learning techniques to decide whether or not a pattern instance encodes a causation. We focus on the most productive pattern, a verb phrase followed by a relator and a clause, and its reverse version, a relator followed by a clause and a verb phrase. As relators we consider the words "as", "after", "because" and "since". We present a set of lexical, syntactic and semantic features for the classification task, their rationale and some examples. The results obtained are discussed and the errors analyzed.

2007

COGEX at RTE 3
Marta Tatu | Dan Moldovan
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

2006

Automatic Discovery of Part-Whole Relations
Roxana Girju | Adriana Badulescu | Dan Moldovan
Computational Linguistics, Volume 32, Number 1, March 2006

Question Answering with Lexical Chains Propagating Verb Arguments
Adrian Novischi | Dan Moldovan
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Speeding Up Full Syntactic Parsing by Leveraging Partial Parsing Decisions
Elliot Glaysher | Dan Moldovan
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

A Logic-Based Semantic Approach to Recognizing Textual Entailment
Marta Tatu | Dan Moldovan
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Phramer - An Open Source Statistical Phrase-Based Translator
Marian Olteanu | Chris Davis | Ionut Volosen | Dan Moldovan
Proceedings on the Workshop on Statistical Machine Translation

Language Models and Reranking for Machine Translation
Marian Olteanu | Pasin Suriyentrakorn | Dan Moldovan
Proceedings on the Workshop on Statistical Machine Translation

2005

PP-attachment Disambiguation using Large Context
Marian Olteanu | Dan Moldovan
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

A Semantic Approach to Recognizing Textual Entailment
Marta Tatu | Dan Moldovan
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

A Semantic Scattering Model for the Automatic Interpretation of Genitives
Dan Moldovan | Adriana Badulescu
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Experiments with Interactive Question-Answering
Sanda Harabagiu | Andrew Hickl | John Lehmann | Dan Moldovan
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

Senseval-3 logic forms: A system and possible improvements
Altaf Mohammed | Dan Moldovan | Paul Parker
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

SVM classification of FrameNet semantic roles
Dan Moldovan | Roxana Gîrju | Marian Olteanu | Ovidiu Fortu
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

LCC’s WSD systems for Senseval-3
Adrian Novischi | Dan Moldovan | Paul Parker | Adriana Bădulescu | Bob Hauser
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Models for the Semantic Classification of Noun Phrases
Dan Moldovan | Adriana Badulescu | Marta Tatu | Daniel Antohe | Roxana Girju
Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004

Support Vector Machines Applied to the Classification of Semantic Relations in Nominalized Noun Phrases
Roxana Girju | Ana-Maria Giuglea | Marian Olteanu | Ovidiu Fortu | Orest Bolohan | Dan Moldovan
Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004

2003

Learning Semantic Constraints for the Automatic Discovery of Part-Whole Relations
Roxana Girju | Adriana Badulescu | Dan Moldovan
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

COGEX: A Logic Prover for Question Answering
Dan Moldovan | Christine Clark | Sanda Harabagiu | Steve Maiorano
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

Discovery of Manner Relations and Their Applicability to Question Answering
Roxana Girju | Manju Putcha | Dan Moldovan
Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering

2002

Performance Issues and Error Analysis in an Open-Domain Question Answering System
Dan Moldovan | Marius Pasca | Sanda Harabagiu | Mihai Surdeanu
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

Lexical Chains for Question Answering
Dan Moldovan | Adrian Novischi
COLING 2002: The 19th International Conference on Computational Linguistics

Open-Domain Voice-Activated Question Answering
Sanda Harabagiu | Dan Moldovan | Joe Picone
COLING 2002: The 19th International Conference on Computational Linguistics

2001

The Role of Lexico-Semantic Feedback in Open-Domain Textual Question-Answering
Sanda Harabagiu | Dan Moldovan | Marius Pasca | Rada Mihalcea | Mihai Surdeanu | Razvan Bunescu | Roxana Girju | Vasile Rus | Paul Morarescu
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

Logic Form Transformation of WordNet and its Applicability to Question Answering
Dan Moldovan | Vasile Rus
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

Pattern Learning and Active Feature Selection for Word Sense Disambiguation
Rada F. Mihalcea | Dan I. Moldovan
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

2000

The Structure and Performance of an Open-Domain Question Answering System
Dan Moldovan | Sanda Harabagiu | Marius Pasca | Rada Mihalcea | Roxana Girju | Richard Goodrum | Vasile Rus
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

Domain-Specific Knowledge Acquisition from Text
Dan Moldovan | Roxana Girju | Vasile Rus
Sixth Applied Natural Language Processing Conference

Semantic Indexing using WordNet Senses
Rada Mihalcea | Dan Moldovan
ACL-2000 Workshop on Recent Advances in Natural Language Processing and Information Retrieval

1999

WordNet 2 - A Morphologically and Semantically Enhanced Resource
Sanda M. Harabagiu | George A. Miller | Dan I. Moldovan
SIGLEX99: Standardizing Lexical Resources

A Method for Word Sense Disambiguation of Unrestricted Text
Rada Mihalcea | Dan I. Moldovan
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1998

Word Sense Disambiguation based on Semantic Density
Rada Mihalcea | Dan I. Moldovan
Usage of WordNet in Natural Language Processing Systems

1993

USC: Description of the SNAP System Used for MUC-5
Dan Moldovan | Seungho Cha | Minhwa Chung | Tony Gallippi | Kenneth J. Hendrickson | Jun-Tae Kim | Changhwa Lin | Chinyew Lin
Fifth Message Understanding Conference (MUC-5): Proceedings of a Conference Held in Baltimore, Maryland, August 25-27, 1993

1992

USC: MUC-4 Test Results and Analysis
D. Moldovan | S. Cha | M. Chung | K. Hendrickson | J. Kim | S. Kowalski
Fourth Message Understanding Conference (MUC-4): Proceedings of a Conference Held in McLean, Virginia, June 16-18, 1992

USC: Description of the SNAP System Used for MUC-4
D. Moldovan | S. Cha | M. Chung | K. Hendrickson | J. Kim | S. Kowalski
Fourth Message Understanding Conference (MUC-4): Proceedings of a Conference Held in McLean, Virginia, June 16-18, 1992

Semantic Network Array Processor as a Massively Parallel Computing Platform for High Performance and Large-Scale Natural Language Processing
Hiroaki Kitano | Dan Moldovan
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics

1991

Toward High Performance Machine Translation: Preliminary Results from Massively Parallel Memory-Based Translation on SNAP
Hiroaki Kitano | Dan Moldovan | Seungho Cha
Proceedings of Machine Translation Summit III: Papers

This paper describes a memory-based machine translation system developed for the Semantic Network Array Processor (SNAP). The goal of our work is to develop a scalable and high-performance memory-based machine translation system which utilizes the high degree of parallelism provided by the SNAP machine. We have implemented an experimental machine translation system, DMSNAP, as a central part of a real-time speech-to-speech dialogue translation system. It is a SNAP version of the ΦDMDIALOG speech-to-speech translation system. Memory-based natural language processing and a syntactic constraint network model have been incorporated using parallel marker-passing, which is directly supported at the hardware level. Experimental results demonstrate that the parsing of a sentence is done in the order of milliseconds.