2024
Attribute or Abstain: Large Language Models as Long Document Assistants
Jan Buchmann | Xiao Liu | Iryna Gurevych
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
LLMs can help humans working with long documents, but are known to hallucinate. *Attribution* can increase trust in LLM responses: the LLM provides evidence that supports its response, which enhances verifiability. Existing approaches to attribution have only been evaluated in retrieval-augmented generation (RAG) settings, where the initial retrieval confounds LLM performance. This is crucially different from the long document setting, where retrieval is not needed but could help. Thus, a long-document-specific evaluation of attribution is missing. To fill this gap, we present LAB, a benchmark of 6 diverse long document tasks with attribution, and experiment with different approaches to attribution on 5 LLMs of different sizes. We find that *citation*, i.e., response generation and evidence extraction in one step, performs best for large and fine-tuned models, while additional retrieval can help for small, prompted models. We investigate whether the “Lost in the Middle” phenomenon exists for attribution, but find no evidence for it. We also find that evidence quality can predict response quality on datasets with simple responses, but not on those with complex responses, as models struggle to provide evidence for complex claims. We release code and data for further investigation. [Link](https://github.com/UKPLab/arxiv2024-attribute-or-abstain)
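For intuition on the citation setting (response generation and evidence extraction in one pass), here is a minimal prompt-and-parse sketch; the prompt wording, function names, and output format are illustrative assumptions, not the benchmark's released code:

```python
import re

def build_citation_prompt(question: str, paragraphs: list[str]) -> str:
    # Number each paragraph so the model can cite it by index (assumed format).
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(paragraphs))
    return (
        "Answer the question using the document below. After the answer, list the "
        "paragraph numbers that support it, e.g. 'Evidence: [2], [5]'. "
        "If the document does not contain the answer, reply 'abstain'.\n\n"
        f"Document:\n{numbered}\n\nQuestion: {question}\nAnswer:"
    )

def parse_citation_response(text: str) -> tuple[str, list[int]]:
    # Split the generated text into the answer and the cited paragraph indices.
    answer, _, evidence_part = text.partition("Evidence:")
    evidence = [int(i) for i in re.findall(r"\[(\d+)\]", evidence_part)]
    return answer.strip(), evidence
```

A retrieval-based variant would instead pre-select a few paragraphs before prompting; the sketch above only covers the single-step citation case.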
Document Structure in Long Document Transformers
Jan Buchmann | Max Eichler | Jan-Micha Bodensohn | Ilia Kuznetsov | Iryna Gurevych
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Long documents often exhibit structure with hierarchically organized elements of different functions, such as section headers and paragraphs. Despite the omnipresence of document structure, its role in natural language processing (NLP) remains opaque. Do long-document Transformer models acquire an internal representation of document structure during pre-training? How can structural information be communicated to a model after pre-training, and how does it influence downstream performance? To answer these questions, we develop a novel suite of probing tasks to assess structure-awareness of long-document Transformers, propose general-purpose structure infusion methods, and evaluate the effects of structure infusion on QASPER and Evidence Inference, two challenging long-document NLP tasks. Results on LED and LongT5 suggest that they acquire implicit understanding of document structure during pre-training, which can be further enhanced by structure infusion, leading to improved end-task performance. To foster research on the role of document structure in NLP modeling, we make our data and code publicly available.
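As a rough illustration of communicating structure to a model after pre-training, one simple option is to mark structural elements with special tokens before flattening the document; the token names and node format below are assumptions for illustration, not the paper's specific infusion methods:

```python
# Hypothetical marker tokens for structural node types.
STRUCT_TOKENS = {"title": "<title>", "section": "<sec>", "paragraph": "<par>"}

def flatten_with_structure(nodes: list[dict]) -> str:
    """Flatten a structured document, keeping node types visible as marker tokens.

    `nodes` is assumed to be a reading-order list like
    [{"type": "section", "text": "Introduction"}, {"type": "paragraph", "text": "..."}].
    """
    parts = []
    for node in nodes:
        marker = STRUCT_TOKENS.get(node["type"], "")
        parts.append(f"{marker} {node['text']}".strip())
    return " ".join(parts)

# The markers would be registered as special tokens so the model can learn
# embeddings for them, e.g. with a Hugging Face tokenizer:
# tokenizer.add_special_tokens(
#     {"additional_special_tokens": list(STRUCT_TOKENS.values())}
# )
```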
2023
CARE: Collaborative AI-Assisted Reading Environment
Dennis Zyska | Nils Dycke | Jan Buchmann | Ilia Kuznetsov | Iryna Gurevych
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Recent years have seen impressive progress in AI-assisted writing, yet the developments in AI-assisted reading are lacking. We propose inline commentary as a natural vehicle for AI-based reading assistance, and present CARE: the first open integrated platform for the study of inline commentary and reading. CARE facilitates data collection for inline commentaries in a commonplace collaborative reading environment, and provides a framework for enhancing reading with NLP-based assistance, such as text classification, generation or question answering. The extensible behavioral logging allows unique insights into the reading and commenting behavior, and flexible configuration makes the platform easy to deploy in new scenarios. To evaluate CARE in action, we apply the platform in a user study dedicated to scholarly peer review. CARE facilitates the data collection and study of inline commentary in NLP, extrinsic evaluation of NLP assistance, and application prototyping. We invite the community to explore and build upon the open source implementation of CARE.
GitHub Repository: https://github.com/UKPLab/CARE
Public Live Demo: https://care.ukp.informatik.tu-darmstadt.de
2022
Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review
Ilia Kuznetsov | Jan Buchmann | Max Eichler | Iryna Gurevych
Computational Linguistics, Volume 48, Issue 4 - December 2022
Peer review is a key component of the publishing process in most fields of science. Increasing submission rates put a strain on reviewing quality and efficiency, motivating the development of applications to support the reviewing and editorial work. While existing NLP studies focus on the analysis of individual texts, editorial assistance often requires modeling interactions between pairs of texts—yet general frameworks and datasets to support this scenario are missing. Relationships between texts are the core object of intertextuality theory—a family of approaches in literary studies not yet operationalized in NLP. Inspired by prior theoretical work, we propose the first intertextual model of text-based collaboration, which encompasses three major phenomena that make up a full iteration of the review–revise–and–resubmit cycle: pragmatic tagging, linking, and long-document version alignment. While peer review is used across the fields of science and publication formats, existing datasets solely focus on conference-style review in computer science. Addressing this, we instantiate our proposed model in the first annotated multidomain corpus of journal-style post-publication open peer review, and provide detailed insights into the practical aspects of intertextual annotation. Our resource is a major step toward multidomain, fine-grained applications of NLP in editorial support for peer review, and our intertextual framework paves the way for general-purpose modeling of text-based collaboration. We make our corpus, detailed annotation guidelines, and accompanying code publicly available.
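To make the three phenomena concrete, a minimal data-record sketch could look as follows; the class and field names are illustrative assumptions, not the corpus schema:

```python
from dataclasses import dataclass

@dataclass
class PragmaticTag:
    review_sentence_id: str
    label: str                     # communicative function of the review sentence

@dataclass
class Link:
    review_sentence_id: str
    paper_span: tuple[int, int]    # character offsets of the referenced manuscript passage

@dataclass
class VersionAlignment:
    old_paragraph_id: str          # paragraph in the original manuscript
    new_paragraph_id: str          # aligned paragraph in the revised manuscript
```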
SemEval 2022 Task 10: Structured Sentiment Analysis
Jeremy Barnes | Laura Oberlaender | Enrica Troiano | Andrey Kutuzov | Jan Buchmann | Rodrigo Agerri | Lilja Øvrelid | Erik Velldal
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
In this paper, we introduce the first SemEval shared task on Structured Sentiment Analysis, for which participants are required to predict all sentiment graphs in a text, where a single sentiment graph is composed of a sentiment holder, target, expression and polarity. This new shared task includes two subtracks (monolingual and cross-lingual) with seven datasets available in five languages, namely Norwegian, Catalan, Basque, Spanish and English. Participants submitted their predictions on a held-out test set and were evaluated on Sentiment Graph F1. Overall, the task received over 200 submissions from 32 participating teams. We present the results of the 15 teams that provided system descriptions and our own expanded analysis of the test predictions.
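As a concrete picture of a single sentiment graph and its evaluation, here is a hedged sketch; the span representation, field names, and the simplified exact-match F1 are assumptions for illustration (the official Sentiment Graph F1 handles span matching more permissively):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SentimentGraph:
    holder: tuple[int, int] | None   # character span of the opinion holder, if any
    target: tuple[int, int] | None   # character span of the opinion target, if any
    expression: tuple[int, int]      # character span of the sentiment expression
    polarity: str                    # e.g. "Positive", "Negative", "Neutral"

def graph_f1(pred: set[SentimentGraph], gold: set[SentimentGraph]) -> float:
    # Simplified exact-match version: a predicted graph counts as a true positive
    # only if all four elements match a gold graph exactly.
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```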