Daniele Pighin

2020

pdf bib abs
Stepwise Extractive Summarization and Planning with Structured Transformers
Shashi Narayan | Joshua Maynez | Jakub Adamek | Daniele Pighin | Blaz Bratanic | Ryan McDonald
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose encoder-centric stepwise models for extractive summarization using structured transformers – HiBERT and Extended Transformers. We enable stepwise summarization by injecting the previously generated summary into the structured transformer as an auxiliary sub-structure. Our models are not only efficient in modeling the structure of long inputs, but they also do not rely on task-specific redundancy-aware modeling, making them a general purpose extractive content planner for different tasks. When evaluated on CNN/DailyMail extractive summarization, stepwise models achieve state-of-the-art performance in terms of Rouge without any redundancy aware modeling or sentence filtering. This also holds true for Rotowire table-to-text generation, where our models surpass previously reported metrics for content selection, planning and ordering, highlighting the strength of stepwise modeling. Amongst the two structured transformers we test, stepwise Extended Transformers provides the best performance across both datasets and sets a new standard for these challenges.

2018

pdf bib
Automatic Prediction of Discourse Connectives
Eric Malmi | Daniele Pighin | Sebastian Krause | Mikhail Kozhevnikov
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib abs
Redundancy Localization for the Conversationalization of Unstructured Responses
Sebastian Krause | Mikhail Kozhevnikov | Eric Malmi | Daniele Pighin
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Conversational agents offer users a natural-language interface to accomplish tasks, entertain themselves, or access information. Informational dialogue is particularly challenging in that the agent has to hold a conversation on an open topic, and to achieve a reasonable coverage it generally needs to digest and present unstructured information from textual sources. Making responses based on such sources sound natural and fit appropriately into the conversation context is a topic of ongoing research, one of the key issues of which is preventing the agent’s responses from sounding repetitive. Targeting this issue, we propose a new task, known as redundancy localization, which aims to pinpoint semantic overlap between text passages. To help address it systematically, we formalize the task, prepare a public dataset with fine-grained redundancy labels, and propose a model utilizing a weak training signal defined over the results of a passage-retrieval system on web texts. The proposed model demonstrates superior performance compared to a state-of-the-art entailment model and yields encouraging results when applied to a real-world dialogue.

2016

pdf bib
Learning to Identify Metaphors from a Corpus of Proverbs
Gözde Özbal | Carlo Strapparava | Serra Sinem Tekiroğlu | Daniele Pighin
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib abs
Revisiting Taxonomy Induction over Wikipedia
Amit Gupta | Francesco Piccinno | Mikhail Kozhevnikov | Marius Paşca | Daniele Pighin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Guided by multiple heuristics, a unified taxonomy of entities and categories is distilled from the Wikipedia category network. A comprehensive evaluation, based on the analysis of upward generalization paths, demonstrates that the taxonomy supports generalizations which are more than twice as accurate as the state of the art. The taxonomy is available at http://headstaxonomy.com.

We present a detailed analysis of a graph-based annotation strategy that we employed to annotate a corpus of 11,292 real-world English to Spanish automatic translations with relative (ranking) and absolute (adequate/non-adequate) quality assessments. The proposed approach, inspired by previous work in Interactive Evolutionary Computation and Interactive Genetic Algorithms, results in a simpler and faster annotation process. We empirically compare the method against a traditional, explicit ranking approach, and show that the graph-based strategy: 1) is considerably faster, and 2) produces consistently more reliable annotations.

pdf bib abs
An Analysis (and an Annotated Corpus) of User Responses to Machine Translation Output
Daniele Pighin | Lluís Màrquez | Jonathan May
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present an annotated resource consisting of open-domain translation requests, automatic translations and user-provided corrections collected from casual users of the translation portal http://reverso.net. The layers of annotation provide: 1) quality assessments for 830 correction suggestions for translations into English, at the segment level, and 2) 814 usefulness assessments for English-Spanish and English-French translation suggestions, a suggestion being useful if it contains at least local clues that can be used to improve translation quality. We also discuss the results of our preliminary experiments concerning 1) the development of an automatic filter to separate useful from non-useful feedback, and 2) the incorporation in the machine translation pipeline of bilingual phrases extracted from the suggestions. The annotated data, available for download from ftp://mi.eng.cam.ac.uk/data/faust/LW-UPC-Oct11-FAUST-feedback-annotation.tgz, is released under a Creative Commons license. To our best knowledge, this is the first resource of this kind that has ever been made publicly available.

pdf bib abs
The FAUST Corpus of Adequacy Assessments for Real-World Machine Translation Output
Daniele Pighin | Lluís Màrquez | Lluís Formiga
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present a corpus consisting of 11,292 real-world English to Spanish automatic translations annotated with relative (ranking) and absolute (adequate/non-adequate) quality assessments. The translation requests, collected through the popular translation portal http://reverso.net, provide a most variated sample of real-world machine translation (MT) usage, from complete sentences to units of one or two words, from well-formed to hardly intelligible texts, from technical documents to colloquial and slang snippets. In this paper, we present 1) a preliminary annotation experiment that we carried out to select the most appropriate quality criterion to be used for these data, 2) a graph-based methodology inspired by Interactive Genetic Algorithms to reduce the annotation effort, and 3) the outcomes of the full-scale annotation experiment, which result in a valuable and original resource for the analysis and characterization of MT-output quality.

pdf bib
The UPC Submission to the WMT 2012 Shared Task on Quality Estimation
Daniele Pighin | Meritxell González | Lluís Màrquez
Proceedings of the Seventh Workshop on Statistical Machine Translation

2011

pdf bib
Automatic Projection of Semantic Structures: an Application to Pairwise Translation Ranking
Daniele Pighin | Lluís Màrquez
Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation

2010

pdf bib
On Reverse Feature Engineering of Syntactic Tree Kernels
Daniele Pighin | Alessandro Moschitti
Proceedings of the Fourteenth Conference on Computational Natural Language Learning

2009

pdf bib
Reverse Engineering of Tree Kernel Feature Spaces
Daniele Pighin | Alessandro Moschitti
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Efficient Linearization of Tree Kernel Functions
Daniele Pighin | Alessandro Moschitti
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

pdf bib
New Features for FrameNet - WordNet Mapping
Sara Tonelli | Daniele Pighin
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

2008

pdf bib
Tree Kernels for Semantic Role Labeling
Alessandro Moschitti | Daniele Pighin | Roberto Basili
Computational Linguistics, Volume 34, Number 2, June 2008 - Special Issue on Semantic Role Labeling

pdf bib
Semantic Role Labeling Systems for Arabic using Kernel Methods
Mona Diab | Alessandro Moschitti | Daniele Pighin
Proceedings of ACL-08: HLT

pdf bib abs
Shallow-Syntax Phrase-Based Translation: Joint versus Factored String-to-Chunk Models
Mauro Cettolo | Marcello Federico | Daniele Pighin | Nicola Bertoldi
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

This work extends phrase-based statistical MT (SMT) with shallow syntax dependencies. Two string-to-chunks translation models are proposed: a factored model, which augments phrase-based SMT with layered dependencies, and a joint model, that extends the phrase translation table with microtags, i.e. per-word projections of chunk labels. Both rely on n-gram models of target sequences with different granularity: single words, micro-tags, chunks. In particular, n-grams defined over syntactic chunks should model syntactic constraints coping with word-group movements. Experimental analysis and evaluation conducted on two popular Chinese-English tasks suggest that the shallow-syntax joint-translation model has potential to outperform state-of-the-art phrase-based translation, with a reasonable computational overhead.