<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="4500">
    <title>Proceedings of the Workshop on New Frontiers in Summarization</title>
    <editor>Lu Wang</editor>
    <editor>Jackie Chi Kit Cheung</editor>
    <editor>Giuseppe Carenini</editor>
    <editor>Fei Liu</editor>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-45</url>
    <bibtype>book</bibtype>
    <bibkey>FrontiersSummarization:2017</bibkey>
  </paper>

  <paper id="4501">
    <title>Video Highlights Detection and Summarization with Lag-Calibration based on Concept-Emotion Mapping of Crowdsourced Time-Sync Comments</title>
    <author><first>Qing</first><last>Ping</last></author>
    <author><first>Chaomei</first><last>Chen</last></author>
    <booktitle>Proceedings of the Workshop on New Frontiers in Summarization</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;11</pages>
    <url>http://www.aclweb.org/anthology/W17-4501</url>
    <abstract>With the prevalence of video sharing, there are increasing demands for
	automatic video digestion such as highlight detection. Recently, platforms with
	crowdsourced time-sync video comments have emerged worldwide, providing a good
	opportunity for highlight detection. However, this task is non-trivial: (1)
	time-sync comments often lag behind their corresponding shot; (2) time-sync
	comments are semantically sparse and noisy; (3) determining which shots are
	highlights is highly subjective. The present paper aims to tackle these
	challenges by proposing a framework that (1) uses concept-mapped lexical chains
	for lag-calibration; (2) models video highlights based on comment intensity and
	the combination of emotion and concept concentration of each shot; and (3)
	summarizes each detected highlight using improved SumBasic with emotion and concept
	mapping. Experiments on large real-world datasets show that our highlight
	detection method and summarization method both outperform other benchmarks with
	considerable margins.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ping-chen:2017:FrontiersSummarization</bibkey>
  </paper>

  <paper id="4502">
    <title>Multimedia Summary Generation from Online Conversations: Current Approaches and Future Directions</title>
    <author><first>Enamul</first><last>Hoque</last></author>
    <author><first>Giuseppe</first><last>Carenini</last></author>
    <booktitle>Proceedings of the Workshop on New Frontiers in Summarization</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>12&#8211;19</pages>
    <url>http://www.aclweb.org/anthology/W17-4502</url>
    <abstract>With the proliferation of Web-based social media, asynchronous conversations
	have become very common for supporting online communication and collaboration.
	Yet the increasing volume and complexity of conversational data often make
	it very difficult to get insights into the discussions. We consider combining
	textual summaries with visual representations of conversational data as a
	promising way of supporting the user in exploring conversations. In this paper,
	we report our current work on developing visual interfaces that
	present multimedia summaries combining text and visualization for online
	conversations, and how our solutions have been tailored to a variety of domain
	problems. We then discuss the key challenges and opportunities for future work
	in this research space.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hoque-carenini:2017:FrontiersSummarization</bibkey>
  </paper>

  <paper id="4503">
    <title>Low-Resource Neural Headline Generation</title>
    <author><first>Ottokar</first><last>Tilk</last></author>
    <author><first>Tanel</first><last>Alum&#228;e</last></author>
    <booktitle>Proceedings of the Workshop on New Frontiers in Summarization</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>20&#8211;26</pages>
    <url>http://www.aclweb.org/anthology/W17-4503</url>
    <abstract>Recent neural headline generation models
	have shown great results, but are generally
	trained on very large datasets. We focus
	our efforts on improving headline quality
	on smaller datasets by means of pre-training.
	We propose new methods that
	enable pre-training all the parameters of
	the model and utilize all available text, resulting
	in improvements of up to 32.4%
	relative in perplexity and 2.84 points in
	ROUGE.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tilk-alumae:2017:FrontiersSummarization</bibkey>
  </paper>

  <paper id="4504">
    <title>Towards Improving Abstractive Summarization via Entailment Generation</title>
    <author><first>Ramakanth</first><last>Pasunuru</last></author>
    <author><first>Han</first><last>Guo</last></author>
    <author><first>Mohit</first><last>Bansal</last></author>
    <booktitle>Proceedings of the Workshop on New Frontiers in Summarization</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>27&#8211;32</pages>
    <url>http://www.aclweb.org/anthology/W17-4504</url>
    <abstract>Abstractive summarization, the task of rewriting and compressing a document
	into a short summary, has achieved considerable success with neural
	sequence-to-sequence models. However, these models can still benefit from
	stronger natural language inference skills, since a correct summary is
	logically entailed by the input document, i.e., it should not contain any
	contradictory or unrelated information. We incorporate such knowledge into an
	abstractive summarization model via multi-task learning, where we share its
	decoder parameters with those of an entailment generation model. We achieve
	promising initial improvements based on multiple metrics and datasets
	(including a test-only setting). The domain mismatch between the entailment
	(captions) and summarization (news) datasets suggests that the model is
	learning some domain-agnostic inference skills.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>pasunuru-guo-bansal:2017:FrontiersSummarization</bibkey>
  </paper>

  <paper id="4505">
    <title>Coarse-to-Fine Attention Models for Document Summarization</title>
    <author><first>Jeffrey</first><last>Ling</last></author>
    <author><first>Alexander</first><last>Rush</last></author>
    <booktitle>Proceedings of the Workshop on New Frontiers in Summarization</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>33&#8211;42</pages>
    <url>http://www.aclweb.org/anthology/W17-4505</url>
    <abstract>Sequence-to-sequence models with attention have been successful for a variety
	of NLP problems, but their speed does not scale well for tasks with long source
	sequences such as document summarization.
	We propose a novel coarse-to-fine attention model that hierarchically reads a
	document, using coarse attention to select top-level chunks of text and fine
	attention to read the words of the chosen chunks. While the computation for
	training standard attention models scales linearly with source sequence length,
	our method scales with the number of top-level chunks and can handle much
	longer sequences.
	Empirically, we find that while coarse-to-fine attention models lag behind
	state-of-the-art baselines, our method achieves the desired behavior of
	sparsely attending to subsets of the document for generation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ling-rush:2017:FrontiersSummarization</bibkey>
  </paper>

  <paper id="4506">
    <title>Automatic Community Creation for Abstractive Spoken Conversations Summarization</title>
    <author><first>Karan</first><last>Singla</last></author>
    <author><first>Evgeny</first><last>Stepanov</last></author>
    <author><first>Ali Orkan</first><last>Bayer</last></author>
    <author><first>Giuseppe</first><last>Carenini</last></author>
    <author><first>Giuseppe</first><last>Riccardi</last></author>
    <booktitle>Proceedings of the Workshop on New Frontiers in Summarization</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>43&#8211;47</pages>
    <url>http://www.aclweb.org/anthology/W17-4506</url>
    <abstract>Summarization of spoken conversations is a challenging task, since it requires
	deep understanding of dialogs. Abstractive summarization techniques rely on
	linking the summary sentences to sets of original conversation sentences, i.e.
	communities. Unfortunately, such linking information is rarely available or
	requires trained annotators. We propose and experiment with automatic community
	creation using cosine similarity on different levels of representation: raw
	text, WordNet SynSet IDs, and word embeddings. We show that the abstractive
	summarization systems with automatic communities significantly outperform
	previously published results on both English and Italian corpora.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>singla-EtAl:2017:FrontiersSummarization</bibkey>
  </paper>

  <paper id="4507">
    <title>Combining Graph Degeneracy and Submodularity for Unsupervised Extractive Summarization</title>
    <author><first>Antoine</first><last>Tixier</last></author>
    <author><first>Polykarpos</first><last>Meladianos</last></author>
    <author><first>Michalis</first><last>Vazirgiannis</last></author>
    <booktitle>Proceedings of the Workshop on New Frontiers in Summarization</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>48&#8211;58</pages>
    <url>http://www.aclweb.org/anthology/W17-4507</url>
    <abstract>We present a fully unsupervised, extractive text summarization system that
	leverages a submodularity framework introduced by past research. The framework
	allows summaries to be generated in a greedy way while preserving near-optimal
	performance guarantees. Our main contribution is the novel coverage reward term
	of the objective function optimized by the greedy algorithm. This component
	builds on the graph-of-words representation of text and the k-core
	decomposition algorithm to assign meaningful scores to words. We evaluate our
	approach on the AMI and ICSI meeting speech corpora, and on the DUC2001 news
	corpus. We reach state-of-the-art performance on all datasets. Results indicate
	that our method is particularly well-suited to the meeting domain.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tixier-meladianos-vazirgiannis:2017:FrontiersSummarization</bibkey>
  </paper>

  <paper id="4508">
    <title>TL;DR: Mining Reddit to Learn Automatic Summarization</title>
    <author><first>Michael</first><last>V&#246;lske</last></author>
    <author><first>Martin</first><last>Potthast</last></author>
    <author><first>Shahbaz</first><last>Syed</last></author>
    <author><first>Benno</first><last>Stein</last></author>
    <booktitle>Proceedings of the Workshop on New Frontiers in Summarization</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>59&#8211;63</pages>
    <url>http://www.aclweb.org/anthology/W17-4508</url>
    <abstract>Recent advances in automatic text summarization have used deep neural networks
	to generate high-quality abstractive summaries, but the performance of these
	models strongly depends on large amounts of suitable training data. We propose
	a new method for mining social media for author-provided summaries, taking
	advantage of the common practice of appending a &#x201c;TL;DR&#x201d; to long posts. A case
	study using a large Reddit crawl yields the Webis-TLDR-17 dataset,
	complementing existing corpora primarily from the news genre. Our technique is
	likely applicable to other social media sites and general web crawls.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>volske-EtAl:2017:FrontiersSummarization</bibkey>
  </paper>

  <paper id="4509">
    <title>Topic Model Stability for Hierarchical Summarization</title>
    <author><first>John</first><last>Miller</last></author>
    <author><first>Kathleen</first><last>McCoy</last></author>
    <booktitle>Proceedings of the Workshop on New Frontiers in Summarization</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>64&#8211;73</pages>
    <url>http://www.aclweb.org/anthology/W17-4509</url>
    <attachment type="attachment">W17-4509.Attachment.zip</attachment>
    <abstract>We envisioned responsive generic hierarchical text summarization with summaries
	organized by section and paragraph based on hierarchical structure topic
	models. But we had to be sure that topic models were stable for the sampled
	corpora. To that end we developed a methodology for aligning multiple
	hierarchical structure topic models run over the same corpus under similar
	conditions, calculating a representative centroid model, and reporting
	stability of the centroid model. We ran stability experiments for standard
	corpora and a development corpus of Global Warming articles. We found flat and
	hierarchical structures of two levels plus the root offer stable centroid
	models, but hierarchical structures of three levels plus the root did not seem
	stable enough for use in hierarchical summarization.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>miller-mccoy:2017:FrontiersSummarization</bibkey>
  </paper>

  <paper id="4510">
    <title>Learning to Score System Summaries for Better Content Selection Evaluation</title>
    <author><first>Maxime</first><last>Peyrard</last></author>
    <author><first>Teresa</first><last>Botschen</last></author>
    <author><first>Iryna</first><last>Gurevych</last></author>
    <booktitle>Proceedings of the Workshop on New Frontiers in Summarization</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>74&#8211;84</pages>
    <url>http://www.aclweb.org/anthology/W17-4510</url>
    <abstract>The evaluation of summaries is a challenging but crucial task of the
	summarization field. In this work, we propose to learn an automatic scoring
	metric based on the human judgments available as part of classical
	summarization datasets such as TAC-2008 and TAC-2009. Any existing automatic
	scoring metric can be included as a feature; the model learns the combination
	exhibiting the best correlation with human judgments. The reliability of the
	new metric is tested in a further manual evaluation where we ask humans to
	evaluate summaries covering the whole scoring spectrum of the metric. We
	release the trained metric as an open-source tool.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>peyrard-botschen-gurevych:2017:FrontiersSummarization</bibkey>
  </paper>

  <paper id="4511">
    <title>Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization</title>
    <author><first>Demian</first><last>Gholipour Ghalandari</last></author>
    <booktitle>Proceedings of the Workshop on New Frontiers in Summarization</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>85&#8211;90</pages>
    <url>http://www.aclweb.org/anthology/W17-4511</url>
    <abstract>The centroid-based model for extractive document summarization is a simple and
	fast baseline that ranks sentences based on their similarity to a centroid
	vector. In this paper, we apply this ranking to possible summaries instead of
	sentences and use a simple greedy algorithm to find the best summary.
	Furthermore, we show possibilities to scale up to larger input document
	collections by selecting a small number of sentences from each document prior
	to constructing the summary.
	Experiments were done on the DUC2004 dataset for multi-document summarization.
	We observe higher performance than the original model, on par with more
	complex state-of-the-art methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>gholipourghalandari:2017:FrontiersSummarization</bibkey>
  </paper>

  <paper id="4512">
    <title>Reader-Aware Multi-Document Summarization: An Enhanced Model and The First Dataset</title>
    <author><first>Piji</first><last>Li</last></author>
    <author><first>Lidong</first><last>Bing</last></author>
    <author><first>Wai</first><last>Lam</last></author>
    <booktitle>Proceedings of the Workshop on New Frontiers in Summarization</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>91&#8211;99</pages>
    <url>http://www.aclweb.org/anthology/W17-4512</url>
    <abstract>We investigate the problem of reader-aware multi-document summarization
	(RA-MDS) and introduce a new dataset for this problem. To tackle RA-MDS, we
	extend a variational auto-encoder (VAE) based MDS framework by jointly
	considering news documents and reader comments. To evaluate
	summarization performance, we prepare a new dataset. We describe the methods
	for data collection, aspect annotation, and summary writing as well as
	scrutinizing by experts. Experimental results show that reader comments can
	improve the summarization performance, which also demonstrates the usefulness
	of the proposed dataset.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>li-bing-lam:2017:FrontiersSummarization</bibkey>
  </paper>

  <paper id="4513">
    <title>A Pilot Study of Domain Adaptation Effect for Neural Abstractive Summarization</title>
    <author><first>Xinyu</first><last>Hua</last></author>
    <author><first>Lu</first><last>Wang</last></author>
    <booktitle>Proceedings of the Workshop on New Frontiers in Summarization</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>100&#8211;106</pages>
    <url>http://www.aclweb.org/anthology/W17-4513</url>
    <abstract>We study the problem of domain adaptation for neural abstractive summarization.
	We make initial efforts in investigating what information can be transferred to
	a new domain. Experimental results on news stories and opinion articles
	indicate that a neural summarization model benefits from pre-training based on
	extractive summaries. We also find that a combined in-domain and
	out-of-domain setup yields better summaries when in-domain data is
	insufficient. Further analysis shows that the model is capable of selecting
	salient content even when trained on out-of-domain data, but requires in-domain
	data to capture the style of a target domain.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hua-wang:2017:FrontiersSummarization</bibkey>
  </paper>

</volume>