Georgiana Ifrim


2022

pdf bib
Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning
Demian Ghalandari | Chris Hokamp | Georgiana Ifrim
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sentence compression reduces the length of text by removing non-essential content while preserving important facts and grammaticality. Unsupervised objective driven methods for sentence compression can be used to create customized models without the need for ground-truth training data, while allowing flexibility in the objective function(s) that are used for learning and inference. Recent unsupervised sentence compression approaches use custom objectives to guide discrete search; however, guided search is expensive at inference time. In this work, we explore the use of reinforcement learning to train effective sentence compression models that are also fast when generating predictions. In particular, we cast the task as binary sequence labelling and fine-tune a pre-trained transformer using a simple policy gradient approach. Our approach outperforms other unsupervised models while also being more efficient at inference time.

2020

pdf bib
A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal
Demian Gholipour Ghalandari | Chris Hokamp | Nghia The Pham | John Glover | Georgiana Ifrim
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Multi-document summarization (MDS) aims to compress the content in large document collections into short summaries and has important applications in story clustering for newsfeeds, presentation of search results, and timeline generation. However, there is a lack of datasets that realistically address such use cases at a scale large enough for training supervised models for this task. This work presents a new dataset for MDS that is large both in the total number of document clusters and in the size of individual clusters. We build this dataset by leveraging the Wikipedia Current Events Portal (WCEP), which provides concise and neutral human-written summaries of news events, with links to external source articles. We also automatically extend these source articles by looking for related articles in the Common Crawl archive. We provide a quantitative analysis of the dataset and empirical results for several state-of-the-art MDS techniques.

pdf bib
Examining the State-of-the-Art in News Timeline Summarization
Demian Gholipour Ghalandari | Georgiana Ifrim
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Previous work on automatic news timeline summarization (TLS) leaves an unclear picture about how this task can generally be approached and how well it is currently solved. This is mostly due to the focus on individual subtasks, such as date selection and date summarization, and to the previous lack of appropriate evaluation metrics for the full TLS task. In this paper, we compare different TLS strategies using appropriate evaluation frameworks, and propose a simple and effective combination of methods that improves over the stateof-the-art on all tested benchmarks. For a more robust evaluation, we also present a new TLS dataset, which is larger and spans longer time periods than previous datasets.

2016

pdf bib
Real-time News Story Detection and Tracking with Hashtags
Gevorg Poghosyan | Georgiana Ifrim
Proceedings of the 2nd Workshop on Computing News Storylines (CNS 2016)

2010

pdf bib
The Bag-of-Opinions Method for Review Rating Prediction from Sparse Text Patterns
Lizhen Qu | Georgiana Ifrim | Gerhard Weikum
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2006

pdf bib
LEILA: Learning to Extract Information by Linguistic Analysis
Fabian M. Suchanek | Georgiana Ifrim | Gerhard Weikum
Proceedings of the 2nd Workshop on Ontology Learning and Population: Bridging the Gap between Text and Knowledge