Hayate Iso


2024

pdf bib
Less is More for Long Document Summary Evaluation by LLMs
Yunshu Wu | Hayate Iso | Pouya Pezeshkpour | Nikita Bhutani | Estevam Hruschka
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

Large Language Models (LLMs) have shown promising performance in summary evaluation tasks, yet they face challenges such as high computational costs and the Lost-in-the-Middle problem where important information in the middle of long documents is often overlooked. To address these issues, this paper introduces a novel approach, Extract-then-Evaluate, which involves extracting key sentences from a long source document and then evaluating the summary by prompting LLMs. The results reveal that the proposed method not only significantly reduces evaluation costs but also exhibits a higher correlation with human evaluations. Furthermore, we provide practical recommendations for optimal document length and sentence extraction methods, contributing to the development of cost-effective yet more accurate methods for LLM-based text generation evaluation.

2023

pdf bib
Zero-shot Triplet Extraction by Template Infilling
Bosung Kim | Hayate Iso | Nikita Bhutani | Estevam Hruschka | Ndapa Nakashole | Tom Mitchell
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

2022

pdf bib
Comparative Opinion Summarization via Collaborative Decoding
Hayate Iso | Xiaolan Wang | Stefanos Angelidis | Yoshihiko Suhara
Findings of the Association for Computational Linguistics: ACL 2022

Opinion summarization focuses on generating summaries that reflect popular subjective information expressed in multiple online reviews. While generated summaries offer general and concise information about a particular hotel or product, the information may be insufficient to help the user compare multiple different choices. Thus, the user may still struggle with the question “Which one should I pick?” In this paper, we propose the comparative opinion summarization task, which aims at generating two contrastive summaries and one common summary from two different candidate sets of reviews. We develop a comparative summarization framework CoCoSum, which consists of two base summarization models that jointly generate contrastive and common summaries. Experimental results on a newly created benchmark CoCoTrip show that CoCoSum can produce higher-quality contrastive and common summaries than state-of-the-art opinion summarization models. The dataset and code are available at https://github.com/megagonlabs/cocosum

2021

pdf bib
Convex Aggregation for Opinion Summarization
Hayate Iso | Xiaolan Wang | Yoshihiko Suhara | Stefanos Angelidis | Wang-Chiew Tan
Findings of the Association for Computational Linguistics: EMNLP 2021

Recent advances in text autoencoders have significantly improved the quality of the latent space, which enables models to generate grammatical and consistent text from aggregated latent vectors. As a successful application of this property, unsupervised opinion summarization models generate a summary by decoding the aggregated latent vectors of inputs. More specifically, they perform the aggregation via simple average. However, little is known about how the vector aggregation step affects the generation quality. In this study, we revisit the commonly used simple average approach by examining the latent space and generated summaries. We found that text autoencoders tend to generate overly generic summaries from simply averaged latent vectors due to an unexpected L2-norm shrinkage in the aggregated latent vectors, which we refer to as summary vector degeneration. To overcome this issue, we develop a framework Coop, which searches input combinations for the latent vector aggregation using input-output word overlap. Experimental results show that Coop successfully alleviates the summary vector degeneration issue and establishes new state-of-the-art performance on two opinion summarization benchmarks. Code is available at https://github.com/megagonlabs/coop.

pdf bib
End-to-end Biomedical Entity Linking with Span-based Dictionary Matching
Shogo Ujiie | Hayate Iso | Shuntaro Yada | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 20th Workshop on Biomedical Language Processing

Disease name recognition and normalization is a fundamental process in biomedical text mining. Recently, neural joint learning of both tasks has been proposed to utilize the mutual benefits. While this approach achieves high performance, disease concepts that do not appear in the training dataset cannot be accurately predicted. This study introduces a novel end-to-end approach that combines span representations with dictionary-matching features to address this problem. Our model handles unseen concepts by referring to a dictionary while maintaining the performance of neural network-based models. Experiments using two major datasaets demonstrate that our model achieved competitive results with strong baselines, especially for unseen concepts during training.

2020

pdf bib
Fact-based Text Editing
Hayate Iso | Chao Qiao | Hang Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose a novel text editing task, referred to as fact-based text editing, in which the goal is to revise a given document to better describe the facts in a knowledge base (e.g., several triples). The task is important in practice because reflecting the truth is a common requirement in text editing. First, we propose a method for automatically generating a dataset for research on fact-based text editing, where each instance consists of a draft text, a revised text, and several facts represented in triples. We apply the method into two public table-to-text datasets, obtaining two new datasets consisting of 233k and 37k instances, respectively. Next, we propose a new neural network architecture for fact-based text editing, called FactEditor, which edits a draft text by referring to given facts using a buffer, a stream, and a memory. A straightforward approach to address the problem would be to employ an encoder-decoder model. Our experimental results on the two datasets show that FactEditor outperforms the encoder-decoder approach in terms of fidelity and fluency. The results also show that FactEditor conducts inference faster than the encoder-decoder approach.

2019

pdf bib
Learning to Select, Track, and Generate for Data-to-Text
Hayate Iso | Yui Uehara | Tatsuya Ishigaki | Hiroshi Noji | Eiji Aramaki | Ichiro Kobayashi | Yusuke Miyao | Naoaki Okazaki | Hiroya Takamura
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a data-to-text generation model with two modules, one for tracking and the other for text generation. Our tracking module selects and keeps track of salient information and memorizes which record has been mentioned. Our generation module generates a summary conditioned on the state of tracking module. Our proposed model is considered to simulate the human-like writing process that gradually selects the information by determining the intermediate variables while writing the summary. In addition, we also explore the effectiveness of the writer information for generations. Experimental results show that our proposed model outperforms existing models in all evaluation metrics even without writer information. Incorporating writer information further improves the performance, contributing to content planning and surface realization.

2017

pdf bib
Multivariate Linear Regression of Symptoms-related Tweets for Infectious Gastroenteritis Scale Estimation
Ryo Takeuchi | Hayate Iso | Kaoru Ito | Shoko Wakamiya | Eiji Aramaki
Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)

To date, various Twitter-based event detection systems have been proposed. Most of their targets, however, share common characteristics. They are seasonal or global events such as earthquakes and flu pandemics. In contrast, this study targets unseasonal and local disease events. Our system investigates the frequencies of disease-related words such as “nausea”,“chill”,and “diarrhea” and estimates the number of patients using regression of these word frequencies. Experiments conducted using Japanese 47 areas from January 2017 to April 2017 revealed that the detection of small and unseasonal event is extremely difficult (overall performance: 0.13). However, we found that the event scale and the detection performance show high correlation in the specified cases (in the phase of patient increasing or decreasing). The results also suggest that when 150 and more patients appear in a high population area, we can expect that our social sensors detect this outbreak. Based on these results, we can infer that social sensors can reliably detect unseasonal and local disease events under certain conditions, just as they can for seasonal or global events.

2016

pdf bib
Forecasting Word Model: Twitter-based Influenza Surveillance and Prediction
Hayate Iso | Shoko Wakamiya | Eiji Aramaki
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Because of the increasing popularity of social media, much information has been shared on the internet, enabling social media users to understand various real world events. Particularly, social media-based infectious disease surveillance has attracted increasing attention. In this work, we specifically examine influenza: a common topic of communication on social media. The fundamental theory of this work is that several words, such as symptom words (fever, headache, etc.), appear in advance of flu epidemic occurrence. Consequently, past word occurrence can contribute to estimation of the number of current patients. To employ such forecasting words, one can first estimate the optimal time lag for each word based on their cross correlation. Then one can build a linear model consisting of word frequencies at different time points for nowcasting and for forecasting influenza epidemics. Experimentally obtained results (using 7.7 million tweets of August 2012 – January 2016), the proposed model achieved the best nowcasting performance to date (correlation ratio 0.93) and practically sufficient forecasting performance (correlation ratio 0.91 in 1-week future prediction, and correlation ratio 0.77 in 3-weeks future prediction). This report is the first of the relevant literature to describe a model enabling prediction of future epidemics using Twitter.