Halil Kilicoglu

2025

Towards Knowledge-Guided Biomedical Lay Summarization using Large Language Models
Shufan Ming | Yue Guo | Halil Kilicoglu
Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)

The massive size, continual growth, and technical jargon in biomedical publications make it difficult for laypeople to stay informed about the latest scientific advances, motivating research on lay summarization of biomedical literature. Large language models (LLMs) are increasingly used for this task. Unlike typical automatic summarization, lay summarization requires incorporating background knowledge not found in a paper and explanations of technical jargon. This study explores the use of MeSH terms (Medical Subject Headings), which represent an article’s main topics, to enhance background information generation in biomedical lay summarization. Furthermore, we introduced a multi-turn dialogue approach that more effectively leverages MeSH terms in the instruction-tuning of LLMs to enhance the quality of lay summaries. The best model improved the state-of-the-art on the eLife test set in terms of the ROUGE-1 score by nearly 2%, with competitive scores in other metrics. These results indicate that MeSH terms can guide LLMs to generate more relevant background information for laypeople. Additionally, evaluation on a held-out dataset, one that was not used during model pre-training, shows that this capability generalizes well to unseen data, further demonstrating the effectiveness of our approach.

2024

UIUC_BioNLP at BioLaySumm: An Extract-then-Summarize Approach Augmented with Wikipedia Knowledge for Biomedical Lay Summarization
Zhiwen You | Shruthan Radhakrishna | Shufan Ming | Halil Kilicoglu
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

As the number of scientific publications is growing at a rapid pace, it is difficult for laypeople to keep track of and understand the latest scientific advances, especially in the biomedical domain. While the summarization of scientific publications has been widely studied, research on summarization targeting laypeople has remained scarce. In this study, considering the lengthy input of biomedical articles, we have developed a lay summarization system through an extract-then-summarize framework with large language models (LLMs) to summarize biomedical articles for laypeople. Using a fine-tuned GPT-3.5 model, our approach achieves the highest overall ranking and demonstrates the best relevance performance in the BioLaySumm 2024 shared task.

Multi-label Sequential Sentence Classification via Large Language Model
Mengfei Lan | Lecheng Zheng | Shufan Ming | Halil Kilicoglu
Findings of the Association for Computational Linguistics: EMNLP 2024

Sequential sentence classification (SSC) in scientific publications is crucial for supporting downstream tasks such as fine-grained information retrieval and extractive summarization. However, current SSC methods are constrained by model size, sequence length, and single-label setting. To address these limitations, this paper proposes LLM-SSC, a large language model (LLM)-based framework for both single- and multi-label SSC tasks. Unlike previous approaches that employ small- or medium-sized language models, the proposed framework utilizes LLMs to generate SSC labels through designed prompts, which enhance task understanding by incorporating demonstrations and a query to describe the prediction target. We also present a multi-label contrastive learning loss with auto-weighting scheme, enabling the multi-label classification task. To support our multi-label SSC analysis, we introduce and release a new dataset, biorc800, which mainly contains unstructured abstracts in the biomedical domain with manual annotations. Experiments demonstrate LLM-SSC’s strong performance in SSC under both in-context learning and task-specific tuning settings. We release biorc800 and our code at: https://github.com/ScienceNLP-Lab/LLM-SSC.

2023

Examining the Causal Impact of First Names on Language Models: The Case of Social Commonsense Reasoning
Sullam Jeoung | Jana Diesner | Halil Kilicoglu
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)

As language models continue to be integrated into applications of personal and societal relevance, ensuring these models’ trustworthiness is crucial, particularly with respect to producing consistent outputs regardless of sensitive attributes. Given that first names may serve as proxies for (intersectional) socio-demographic representations, it is imperative to examine the impact of first names on commonsense reasoning capabilities. In this paper, we study whether a model’s reasoning given a specific input differs based on the first names provided. Our underlying assumption is that the reasoning about Alice should not differ from the reasoning about James. We propose and implement a controlled experimental framework to measure the causal effect of first names on commonsense reasoning, enabling us to distinguish between model predictions due to chance and caused by actual factors of interest. Our results indicate that the frequency of first names has a direct effect on model prediction, with less frequent names yielding divergent predictions compared to more frequent names. To gain insights into the internal mechanisms of models that are contributing to these behaviors, we also conduct an in-depth explainable analysis. Overall, our findings suggest that to ensure model robustness, it is essential to augment datasets with more diverse first names during the configuration stage.

2021

UIUC_BioNLP at SemEval-2021 Task 11: A Cascade of Neural Models for Structuring Scholarly NLP Contributions
Haoyang Liu | M. Janina Sarol | Halil Kilicoglu
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

We propose a cascade of neural models that performs sentence classification, phrase recognition, and triple extraction to automatically structure the scholarly contributions of NLP publications. To identify the most important contribution sentences in a paper, we used a BERT-based classifier with positional features (Subtask 1). A BERT-CRF model was used to recognize and characterize relevant phrases in contribution sentences (Subtask 2). We categorized the triples into several types based on whether and how their elements were expressed in text, and addressed each type using separate BERT-based classifiers as well as rules (Subtask 3). Our system was officially ranked second in Phase 1 evaluation and first in both parts of Phase 2 evaluation. After fixing a submission error in Phase 1, our approach yields the best results overall. In this paper, in addition to a system description, we also provide further analysis of our results, highlighting its strengths and limitations. We make our code publicly available at https://github.com/Liu-Hy/nlp-contrib-graph.

2017

TextFlow: A Text Similarity Measure based on Continuous Sequences
Yassine Mrabet | Halil Kilicoglu | Dina Demner-Fushman
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Text similarity measures are used in multiple tasks such as plagiarism detection, information ranking and recognition of paraphrases and textual entailment. While recent advances in deep learning highlighted the relevance of sequential models in natural language generation, existing similarity measures do not fully exploit the sequential nature of language. Examples of such similarity measures include n-gram and skip-gram overlap, which rely on distinct slices of the input texts. In this paper we present a novel text similarity measure inspired by a common representation in DNA sequence alignment algorithms. The new measure, called TextFlow, represents input text pairs as continuous curves and uses both the actual position of the words and sequence matching to compute the similarity value. Our experiments on 8 different datasets show very encouraging results in paraphrase detection, textual entailment recognition and ranking relevance.

2016

We describe a corpus of consumer health questions annotated with named entities. The corpus consists of 1548 de-identified questions about diseases and drugs, written in English. We defined 15 broad categories of biomedical named entities for annotation. A pilot annotation phase in which a small portion of the corpus was double-annotated by four annotators was followed by a main phase in which double annotation was carried out by six annotators, and a reconciliation phase in which all annotations were reconciled by an expert. We conducted the annotation in two modes, manual and assisted, to assess the effect of automatic pre-annotation and calculated inter-annotator agreement. We obtained moderate inter-annotator agreement; assisted annotation yielded slightly better agreement and fewer missed annotations than manual annotation. Due to the complex nature of biomedical entities, we paid particular attention to nested entities, for which we obtained slightly lower inter-annotator agreement, confirming that annotating nested entities is somewhat more challenging. To our knowledge, the corpus is the first of its kind for consumer health text and is publicly available.

Inferring Implicit Causal Relationships in Biomedical Literature
Halil Kilicoglu
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

Aligning Texts and Knowledge Bases with Semantic Sentence Simplification
Yassine Mrabet | Pavlos Vougiouklis | Halil Kilicoglu | Claire Gardent | Dina Demner-Fushman | Jonathon Hare | Elena Simperl
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016)

2015

A Compositional Interpretation of Biomedical Event Factuality
Halil Kilicoglu | Graciela Rosemblat | Michael Cairelli | Thomas Rindflesch
Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015)

2014

Annotating Question Decomposition on Complex Medical Questions
Kirk Roberts | Kate Masterton | Marcelo Fiszman | Halil Kilicoglu | Dina Demner-Fushman
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a method for annotating question decomposition on complex medical questions. The annotations cover multiple syntactic ways that questions can be decomposed, including separating independent clauses as well as recognizing coordinations and exemplifications. We annotate a corpus of 1,467 multi-sentence consumer health questions about genetic and rare diseases. Furthermore, we label two additional medical-specific annotations: (1) background sentences are annotated with a number of medical categories such as symptoms, treatments, and family history, and (2) the central focus of the complex question (a disease) is marked. We present simple baseline results for automatic classification of these annotations, demonstrating the challenging but important nature of this task.

Decomposing Consumer Health Questions
Kirk Roberts | Halil Kilicoglu | Marcelo Fiszman | Dina Demner-Fushman
Proceedings of BioNLP 2014

Coreference Resolution for Structured Drug Product Labels
Halil Kilicoglu | Dina Demner-Fushman
Proceedings of BioNLP 2014