Savvas Chamezopoulos

2024

Article Classification with Graph Neural Networks and Multigraphs
Khang Ly | Yury Kashnitsky | Savvas Chamezopoulos | Valeria Krzhizhanovskaya
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Classifying research output into context-specific label taxonomies is a challenging and relevant downstream task, given the volume of existing and newly published articles. We propose a method to enhance the performance of article classification by enriching simple Graph Neural Network (GNN) pipelines with multigraph representations that simultaneously encode multiple signals of article relatedness (e.g., references, co-authorship, shared publication source, and shared subject headings) as distinct edge types. Fully supervised transductive node classification experiments are conducted on the Open Graph Benchmark OGBN-arXiv dataset and the PubMed diabetes dataset, augmented with additional metadata from Microsoft Academic Graph and PubMed Central, respectively. The results demonstrate that multigraphs consistently improve the performance of a variety of GNN models compared to the default graphs. When deployed with state-of-the-art textual node embedding methods, the transformed multigraphs enable simple, shallow 2-layer GNN pipelines to achieve results on par with more complex architectures.
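To make the idea of "relatedness signals as distinct edge types" concrete, here is a minimal sketch of building such a multigraph from article metadata. It is not the authors' pipeline: the `papers` records and their field names (`refs`, `authors`, `venue`, `headings`) are hypothetical, citation edges are treated as undirected for simplicity (an assumption), and two articles get a typed edge whenever they share at least one author, venue, or subject heading.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical per-article metadata; field names are illustrative assumptions.
papers = {
    0: {"refs": [1], "authors": {"A"}, "venue": "V1", "headings": {"H1"}},
    1: {"refs": [], "authors": {"A", "B"}, "venue": "V2", "headings": {"H1", "H2"}},
    2: {"refs": [0], "authors": {"C"}, "venue": "V1", "headings": {"H2"}},
}


def shared_attribute_edges(papers, key):
    """Connect every pair of papers that share at least one value of `key`."""
    index = defaultdict(set)  # attribute value -> ids of papers having it
    for pid, meta in papers.items():
        values = meta[key]
        if isinstance(values, str):  # single-valued fields such as venue
            values = {values}
        for value in values:
            index[value].add(pid)
    edges = set()
    for members in index.values():
        for u, v in combinations(sorted(members), 2):
            edges.add((u, v))  # undirected edge stored as (smaller, larger)
    return edges


def build_multigraph(papers):
    """Return {edge_type: set of undirected (u, v) pairs with u < v}."""
    # Assumption: citation edges are symmetrized, ignoring direction.
    cites = {
        tuple(sorted((pid, ref)))
        for pid, meta in papers.items()
        for ref in meta["refs"]
    }
    return {
        "cites": cites,
        "co_author": shared_attribute_edges(papers, "authors"),
        "same_venue": shared_attribute_edges(papers, "venue"),
        "shared_heading": shared_attribute_edges(papers, "headings"),
    }


graph = build_multigraph(papers)
for edge_type, edges in sorted(graph.items()):
    print(edge_type, sorted(edges))
# cites [(0, 1), (0, 2)]
# co_author [(0, 1)]
# same_venue [(0, 2)]
# shared_heading [(0, 1), (1, 2)]
```

Keeping one edge set per type, rather than merging everything into a single graph, is what lets a relation-aware GNN layer (for example an R-GCN-style layer) learn separate weights per signal. Note that shared-attribute edges grow quadratically with the number of papers sharing a value, so popular venues or headings may need sampling or capping in practice.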

Overview of the DagPap24 Shared Task on Detecting Automatically Generated Scientific Paper
Savvas Chamezopoulos | Drahomira Herrmannova | Anita De Waard | Domenic Rosati | Yury Kashnitsky
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)

This paper provides an overview of the 2024 ACL Scholarly Document Processing workshop shared task on the detection of automatically generated scientific papers. Our previous task focused on the binary classification of whether scientific passages were machine-generated or not. However, one likely use case for text generation technology in scientific writing is to intersperse human-written text with passages of machine-generated text. We therefore frame the detection problem as a multiclass span classification task: given an excerpt of text, label token spans in the text as human-written or machine-generated. We shared a dataset containing excerpts from human-written papers as well as artificially generated content collected by Elsevier publishing and editorial teams. As a test set, the participants were provided with a corpus of openly accessible human-written as well as generated papers from the same scientific domains. The shared task saw 457 submissions across 28 participating teams and resulted in three published technical reports. We discuss our findings from the shared task in this overview paper.
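The span classification framing can be illustrated by converting between per-token labels and labeled spans. The sketch below is an illustration only, not the shared task's data format or scoring code: the label strings `"human"` and `"machine"`, the whitespace tokens, and the end-exclusive `(start, end, label)` span convention are all assumptions.

```python
from itertools import groupby


def labels_to_spans(labels):
    """Collapse per-token labels into (start, end, label) spans; end is exclusive."""
    spans = []
    pos = 0
    for label, run in groupby(labels):  # groupby yields runs of equal labels
        length = sum(1 for _ in run)
        spans.append((pos, pos + length, label))
        pos += length
    return spans


def spans_to_labels(spans):
    """Expand contiguous, sorted spans starting at token 0 back into per-token labels."""
    labels = []
    for start, end, label in spans:
        labels.extend([label] * (end - start))
    return labels


# Hypothetical tokens and labels for a mixed human/machine excerpt.
tokens = ["We", "study", "multigraphs", "and", "report", "gains"]
labels = ["human", "human", "machine", "machine", "machine", "human"]

print(labels_to_spans(labels))
# [(0, 2, 'human'), (2, 5, 'machine'), (5, 6, 'human')]
for start, end, label in labels_to_spans(labels):
    print(label, " ".join(tokens[start:end]))
# human We study
# machine multigraphs and report
# human gains
```

Working with per-token labels keeps a model's output aligned one-to-one with the input tokens, while the span view is easier for a human to read; the two conversions above are lossless for contiguous spans.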
