Yoonjoo Lee


2024

ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models
Benjamin Newman | Yoonjoo Lee | Aakanksha Naik | Pao Siangliulue | Raymond Fok | Juho Kim | Daniel S Weld | Joseph Chee Chang | Kyle Lo
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

When conducting literature reviews, scientists often create literature review tables: tables whose rows are publications and whose columns constitute a schema, a set of aspects used to compare and contrast the papers. Can we automatically generate these tables using language models (LMs)? In this work, we introduce a framework that leverages LMs to perform this task by decomposing it into separate schema and value generation steps. To enable experimentation, we address two main challenges. First, we overcome a lack of high-quality datasets to benchmark table generation by curating and releasing arxivDIGESTables, a new dataset of 2,228 literature review tables extracted from ArXiv papers that synthesize a total of 7,542 research papers. Second, to support scalable evaluation of model generations against human-authored reference tables, we develop DecontextEval, an automatic evaluation method that aligns elements of tables with the same underlying aspects despite differing surface forms. Given these tools, we evaluate LMs’ abilities to reconstruct reference tables, finding that this task benefits from additional context to ground the generation (e.g., table captions, in-text references). Finally, through a human evaluation study, we find that even when LMs fail to fully reconstruct a reference table, their generated novel aspects can still be useful.

Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)
Tirthankar Ghosal | Amanpreet Singh | Anita De Waard | Philipp Mayr | Aakanksha Naik | Orion Weller | Yoonjoo Lee | Shannon Shen | Yanxia Qin
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)

Overview of the Fourth Workshop on Scholarly Document Processing
Tirthankar Ghosal | Amanpreet Singh | Anita De Waard | Philipp Mayr | Aakanksha Naik | Orion Weller | Yoonjoo Lee | Zejiang Shen | Yanxia Qin
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)

The workshop on Scholarly Document Processing (SDP) started in 2020 to accelerate research, inform policy, and educate the public on natural language processing for scientific text. The fourth iteration of the workshop, SDP24, was held at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL24) as a hybrid event. The SDP workshop saw a great increase in interest, with 57 submissions, of which 28 were accepted. The program consisted of a research track, four invited talks, and two shared tasks: 1) DAGPap24: Detecting automatically generated scientific papers and 2) Context24: Multimodal Evidence and Grounding Context Identification for Scientific Claims. The program was geared towards NLP, information extraction, information retrieval, and data mining for scholarly documents, with an emphasis on identifying and providing solutions to open challenges.

2022

Interactive Children’s Story Rewriting Through Parent-Children Interaction
Yoonjoo Lee | Tae Soo Kim | Minsuk Chang | Juho Kim
Proceedings of the First Workshop on Intelligent and Interactive Writing Assistants (In2Writing 2022)

Storytelling in early childhood provides significant benefits in language and literacy development, relationship building, and entertainment. To maximize these benefits, it is important to empower children with more agency. Interactive story rewriting through parent-children interaction can boost children’s agency and help build the relationship between parent and child as they collaboratively create changes to an original story. However, for children with limited proficiency in reading and writing, parents must carry out multiple tasks to guide the rewriting process, which can incur a high cognitive load. In this work, we introduce an interface design that aims to support children and parents in rewriting stories together with the help of AI techniques. We describe three design goals determined by a review of prior literature on interactive storytelling and existing educational activities. We also propose a preliminary prompt-based pipeline that uses GPT-3 to realize the design goals and enable the interface.