<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="1000">
    <title>Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres</title>
    <editor>George Giannakopoulos</editor>
    <editor>Elena Lloret</editor>
    <editor>John M. Conroy</editor>
    <editor>Josef Steinberger</editor>
    <editor>Marina Litvak</editor>
    <editor>Peter Rankel</editor>
    <editor>Benoit Favre</editor>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-10</url>
    <bibtype>book</bibtype>
    <bibkey>MultiLing2017:2017</bibkey>
  </paper>

  <paper id="1001">
    <title>MultiLing 2017 Overview</title>
    <author><first>George</first><last>Giannakopoulos</last></author>
    <author><first>John</first><last>Conroy</last></author>
    <author><first>Jeff</first><last>Kubina</last></author>
    <author><first>Peter A.</first><last>Rankel</last></author>
    <author><first>Elena</first><last>Lloret</last></author>
    <author><first>Josef</first><last>Steinberger</last></author>
    <author><first>Marina</first><last>Litvak</last></author>
    <author><first>Benoit</first><last>Favre</last></author>
    <booktitle>Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;6</pages>
    <url>http://www.aclweb.org/anthology/W17-1001</url>
    <abstract>In this brief report we present an overview of the MultiLing 2017 effort and
	workshop, as implemented within EACL 2017.
	MultiLing is a community-driven initiative that pushes the state-of-the-art in
	Automatic Summarization by providing data sets and fostering further research
	and development of summarization systems.
	This year the scope of the workshop was widened, bringing together researchers
	who work on summarization across sources, languages and genres. We summarize
	the main tasks planned and implemented this year and the contributions
	received, and we provide insights on next steps.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>giannakopoulos-EtAl:2017:MultiLing2017</bibkey>
  </paper>

  <paper id="1002">
    <title>Decoupling Encoder and Decoder Networks for Abstractive Document Summarization</title>
    <author><first>Ying</first><last>Xu</last></author>
    <author><first>Jey Han</first><last>Lau</last></author>
    <author><first>Timothy</first><last>Baldwin</last></author>
    <author><first>Trevor</first><last>Cohn</last></author>
    <booktitle>Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>7&#8211;11</pages>
    <url>http://www.aclweb.org/anthology/W17-1002</url>
    <abstract>Abstractive document summarization seeks to automatically generate a summary
	for a document, based on some abstract &#x201c;understanding&#x201d; of the original
	document. State-of-the-art techniques traditionally use
	attentive encoder&#8211;decoder architectures.  However, due to the large number of
	parameters in these models, they require large training datasets and long
	training times. In this paper, we propose decoupling the encoder and decoder
	networks, and training them separately.  We encode documents using an
	unsupervised document encoder, and then feed the document vector to a recurrent
	neural network decoder. With this decoupled architecture, we decrease the
	number of parameters in the decoder substantially, and shorten its training
	time.  Experiments show that the decoupled model performs comparably
	to state-of-the-art models on in-domain documents, but less well
	on out-of-domain documents.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>xu-EtAl:2017:MultiLing2017</bibkey>
  </paper>

  <paper id="1003">
    <title>Centroid-based Text Summarization through Compositionality of Word Embeddings</title>
    <author><first>Gaetano</first><last>Rossiello</last></author>
    <author><first>Pierpaolo</first><last>Basile</last></author>
    <author><first>Giovanni</first><last>Semeraro</last></author>
    <booktitle>Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>12&#8211;21</pages>
    <url>http://www.aclweb.org/anthology/W17-1003</url>
    <abstract>Textual similarity is a crucial aspect of many extractive text
	summarization methods. A bag-of-words representation cannot capture
	the semantic relationships between concepts when comparing strongly related
	sentences with no words in common. To overcome this issue, in this paper we
	propose a centroid-based method for text summarization that exploits the
	compositional capabilities of word embeddings. The evaluations on
	multi-document and multilingual datasets prove the effectiveness of the
	continuous vector representation of words compared to the bag-of-words model.
	Despite its simplicity, our method achieves good performance even in comparison
	to more complex deep learning models. Our method is unsupervised and can be
	adopted in other summarization tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rossiello-basile-semeraro:2017:MultiLing2017</bibkey>
  </paper>

  <paper id="1004">
    <title>Query-based summarization using MDL principle</title>
    <author><first>Marina</first><last>Litvak</last></author>
    <author><first>Natalia</first><last>Vanetik</last></author>
    <booktitle>Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>22&#8211;31</pages>
    <url>http://www.aclweb.org/anthology/W17-1004</url>
    <abstract>Query-based text summarization aims to extract from the original text the
	essential information that answers the query. The answer is presented
	in a minimal, often predefined, number of words. In this paper we introduce a
	new unsupervised approach for query-based extractive summarization, based on
	the minimum description length (MDL) principle that employs the Krimp compression
	algorithm (Vreeken et al., 2011). The key idea of our approach is to select
	frequent word sets related to a given query that compress document sentences
	better and therefore describe the document better.
	A summary is extracted by selecting sentences that best cover query-related
	frequent word sets.
	The approach is evaluated on the DUC 2005 and DUC 2006 datasets, which are
	specifically designed for query-based summarization (DUC, 2005, 2006). Its
	results are competitive with the best reported.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>litvak-vanetik:2017:MultiLing2017</bibkey>
  </paper>

  <paper id="1005">
    <title>Word Embedding and Topic Modeling Enhanced Multiple Features for Content Linking and Argument / Sentiment Labeling in Online Forums</title>
    <author><first>Lei</first><last>Li</last></author>
    <author><first>Liyuan</first><last>Mao</last></author>
    <author><first>Moye</first><last>Chen</last></author>
    <booktitle>Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>32&#8211;36</pages>
    <url>http://www.aclweb.org/anthology/W17-1005</url>
    <abstract>In this paper, we adopt multiple grammatical and semantic features for
	content linking and argument/sentiment labeling in online forums. We use two
	different methods for content linking. First, we utilize deep features
	obtained from a word embedding model and compute sentence
	similarity. Second, we use multiple traditional features to locate candidate
	linking sentences, and then adopt a voting method to obtain the final result.
	LDA topic modeling is used to mine latent semantic features, and K-means
	clustering is implemented for argument labeling, while features from sentiment
	dictionaries and rule-based sentiment analysis are integrated for sentiment
	labeling. Experimental results show that our methods are effective.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>li-mao-chen:2017:MultiLing2017</bibkey>
  </paper>

  <paper id="1006">
    <title>Ultra-Concise Multi-genre Summarisation of Web2.0: towards Intelligent Content Generation</title>
    <author><first>Elena</first><last>Lloret</last></author>
    <author><first>Ester</first><last>Boldrini</last></author>
    <author><first>Patricio</first><last>Martinez-Barco</last></author>
    <author><first>Manuel</first><last>Palomar</last></author>
    <booktitle>Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>37&#8211;46</pages>
    <url>http://www.aclweb.org/anthology/W17-1006</url>
    <abstract>The electronic Word of Mouth has become the most powerful communication channel
	thanks to the wide usage of Social Media. Our research proposes an approach
	towards the production of automatic ultra-concise summaries from multiple Web
	2.0 sources. We exploit user-generated content from reviews and microblogs in
	different domains, and compile and analyse four types of ultra-concise
	summaries: a) positive information; b) negative information; c) both; or d)
	objective information. The appropriateness and usefulness of our model are
	demonstrated by its successful results and great potential in real-life
	applications, representing a relevant advancement over state-of-the-art
	approaches.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lloret-EtAl:2017:MultiLing2017</bibkey>
  </paper>

  <paper id="1007">
    <title>Machine Learning Approach to Evaluate MultiLingual Summaries</title>
    <author><first>Samira</first><last>Ellouze</last></author>
    <author><first>Maher</first><last>Jaoua</last></author>
    <author><first>Lamia</first><last>Hadrich Belguith</last></author>
    <booktitle>Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>47&#8211;54</pages>
    <url>http://www.aclweb.org/anthology/W17-1007</url>
    <abstract>The present paper introduces a new MultiLing text summary evaluation method.
	This method relies on a machine learning approach that operates by combining
	multiple features to build models that predict the human score (overall
	responsiveness) of a new summary. We have tried several single and &#x201c;ensemble
	learning&#x201d; classifiers to build the best model. We have evaluated our
	method at the summary level, where we evaluate each text summary separately.
	The correlation between the built models and the human score is better than
	the correlation between the baselines and the manual score.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ellouze-jaoua-hadrichbelguith:2017:MultiLing2017</bibkey>
  </paper>

</volume>

