Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities

Peter Jansen, Bhavana Dalvi Mishra, Harsh Trivedi, Bodhisattwa Prasad Majumder, Tom Hope, Tushar Khot, Doug Downey, Eric Horvitz (Editors)

Anthology ID:: 2025.aisd-main
Month:: May
Year:: 2025
Address:: Albuquerque, New Mexico, USA
Venues:: AISD | WS
Events:: Workshop on AI and Scientific Discovery (2025) | Nations of the Americas Chapter of the Association for Computational Linguistics (2025) | Other Workshops and Events (2025)
SIG:
Publisher:: Association for Computational Linguistics
URL:: https://aclanthology.org/2025.aisd-main/
DOI:: 10.18653/v1/2025.aisd-main
ISBN:: 979-8-89176-224-4
Bib Export formats:: BibTeX MODS XML EndNote
PDF:: https://aclanthology.org/2025.aisd-main.pdf

PDF (full) BibTeX Search

pdf bib

pdf bib abs

Variable Extraction for Model Recovery in Scientific Literature
Chunwei Liu | Enrique Noriega-Atala | Adarsh Pyarelal | Clayton T Morrison | Mike Cafarella

Due to the increasing productivity in the scientific community, it is difficult to keep up with the literature without the assistance of AI methods. This paper evaluates various methods for extracting mathematical model variables from epidemiological studies, such as ‘infection rate (𝛼),” ‘recovery rate (𝛾),” and ‘mortality rate (𝜇).” Variable extraction appears to be a basic task, but plays a pivotal role in recovering models from scientific literature. Once extracted, we can use these variables for automatic mathematical modeling, simulation, and replication of published results. We also introduce a benchmark dataset comprising manually-annotated variable descriptions and variable values extracted from scientific papers. Our analysis shows that LLM-based solutions perform the best. Despite the incremental benefits of combining rule-based extraction outputs with LLMs, the leap in performance attributed to the transfer-learning and instruction-tuning capabilities of LLMs themselves is far more significant. This investigation demonstrates the potential of LLMs to enhance automatic comprehension of scientific artifacts and for automatic model recovery and simulation.

pdf bib abs

How Well Do Large Language Models Extract Keywords? A Systematic Evaluation on Scientific Corpora
Nacef Ben Mansour | Hamed Rahimi | Motasem Alrahabi

Automatic keyword extraction from scientific articles is pivotal for organizing scholarly archives, powering semantic search engines, and mapping interdisciplinary research trends. However, existing methods—including statistical and graph-based approaches—struggle to handle domain-specific challenges such as technical terminology, cross-disciplinary ambiguity, and dynamic scientific jargon. This paper presents an empirical comparison of traditional keyword extraction methods (e.g. TextRank and YAKE) with approaches based on Large Language Model. We introduce a novel evaluation framework that combines fuzzy semantic matching based on Levenshtein Distance with exact-match metrics (F1, precision, recall) to address inconsistencies in keyword normalization across scientific corpora. Through an extensive ablation study across nine different LLMs, we analyze their performance and associated costs. Our findings reveal that LLM-based methods consistently achieve superior precision and relevance compared to traditional approaches. This performance advantage suggests significant potential for improving scientific search systems and information retrieval in academic contexts.

pdf bib abs

A Human-LLM Note-Taking System with Case-Based Reasoning as Framework for Scientific Discovery
Douglas B Craig

Scientific discovery is an iterative process that requires transparent reasoning, empirical validation, and structured problem-solving. This work presents a novel human-in-the-loop AI system that leverages case-based reasoning to facilitate structured scientific inquiry. The system is designed to be note-centric, using the Obsidian note-taking application as the primary interface where all components, including user inputs, system cases, and tool specifications, are represented as plain-text notes. This approach ensures that every step of the research process is visible, editable, and revisable by both the user and the AI. The system dynamically retrieves relevant cases from past experience, refines hypotheses, and structures research workflows in a transparent and iterative manner. The methodology is demonstrated through a case study investigating the role of TLR4 in sepsis, illustrating how the system supports problem framing, literature review, hypothesis formulation, and empirical validation. The results highlight the potential of AI-assisted scientific workflows to enhance research efficiency while preserving human oversight and interpretability.

pdf bib abs

We present components of an AI-assisted academic writing system including citation recommendation and introduction writing. The system recommends citations by considering the user’s current document context to provide relevant suggestions. It generates introductions in a structured fashion, situating the contributions of the research relative to prior work. We demonstrate the effectiveness of the components through quantitative evaluations. Finally, the paper presents qualitative research exploring how researchers incorporate citations into their writing workflows. Our findings indicate that there is demand for precise AI-assisted writing systems and simple, effective methods for meeting those needs.

pdf bib abs

Evaluating and Enhancing Large Language Models for Novelty Assessment in Scholarly Publications
Ethan Lin | Zhiyuan Peng | Yi Fang

Recent studies have evaluated creativity, where novelty is an important aspect, of large language models (LLMs) primarily from a semantic perspective, using benchmarks from cognitive science. However, assessing the novelty in scholarly publications, a critical facet of evaluating LLMs as scientific discovery assistants, remains underexplored, despite its potential to accelerate research cycles and prioritize high-impact contributions in scientific workflows. We introduce SchNovel, a benchmark to evaluate LLMs’ ability to assess novelty in scholarly papers, a task central to streamlining discovery pipeline. SchNovel consists of 15000 pairs of papers across six fields sampled from the arXiv dataset with publication dates spanning 2 to 10 years apart. In each pair, the more recently published paper is assumed to be more novel. Additionally, we propose RAG-Novelty, a retrieval-augmented method that mirrors human peer review by grounding novelty assessment in retrieved context. Extensive experiments provide insights into the capabilities of different LLMs to assess novelty and demonstrate that RAG-Novelty outperforms recent baseline models highlight LLMs’ promise as tools for automating novelty detection in scientific workflows.

pdf bib abs

Large Language Models (LLMs) are increasinglybeing leveraged for generating andtranslating scientific computer codes by bothdomain-experts and non-domain experts. Fortranhas served as one of the go to programminglanguages in legacy high-performance computing(HPC) for scientific discoveries. Despitegrowing adoption, LLM-based code translationof legacy code-bases has not been thoroughlyassessed or quantified for its usability.Here, we studied the applicability of LLMbasedtranslation of Fortran to C++ as a step towardsbuilding an agentic-workflow using openweightLLMs on two different computationalplatforms. We statistically quantified the compilationaccuracy of the translated C++ codes,measured the similarity of the LLM translatedcode to the human translated C++ code, andstatistically quantified the output similarity ofthe Fortran to C++ translation.

pdf bib abs

FlavorDiffusion: Modeling Food-Chemical Interactions with Diffusion
Junpyo Seo | Dongwan Kim | Jaewook Jeong | Inkyu Park | Junho Min

The study of food pairing has evolved beyond subjective expertise with the advent of machine learning. This paper presents FlavorDiffusion, a novel framework leveraging diffusion models to predict food-chemical interactions and ingredient pairings without relying on chromatography. By integrating graph-based embeddings, diffusion processes, and chemical property encoding, FlavorDiffusion addresses data imbalances and enhances clustering quality. Using a heterogeneous graph derived from datasets like Recipe1M and FlavorDB, our model demonstrates superior performance in reconstructing ingredient-ingredient relationships. The addition of a Chemical Structure Prediction (CSP) layer further refines the embedding space, achieving state-of-the-art NMI scores and enabling meaningful discovery of novel ingredient combinations. The proposed framework represents a significant step forward in computational gastronomy, offering scalable, interpretable, and chemically informed solutions for food science.