Mihai Lintean


2015

2014

In this paper we analyze a number of data sets proposed over the last decade or so for the task of paraphrase identification. The goal of the analysis is to identify the advantages as well as the shortcomings of the previously proposed data sets. Based on the analysis, we then make recommendations on how to improve the process of creating and using such data sets when evaluating future approaches to paraphrase identification or to the more general task of semantic similarity. The recommendations are meant to improve our understanding of what a paraphrase is, offer a fairer ground for comparing approaches, increase the diversity of actual linguistic phenomena that future data sets will cover, and offer ways to better understand the contributions of the various modules or approaches proposed for solving the task of paraphrase identification or similar tasks.

2013

2012

The paper provides a detailed account of the First Shared Task Evaluation Challenge on Question Generation that took place in 2010. The campaign included two tasks that take text as input and produce text, i.e. questions, as output: Task A, "Question Generation from Paragraphs," and Task B, "Question Generation from Sentences." Motivation, data sets, evaluation criteria, guidelines for judges, and results are presented for the two tasks. Lessons learned and advice for future Question Generation Shared Task Evaluation Challenges (QG-STEC) are also offered.

2011

2010