Nianlong Gu


pdf bib
SciLit: A Platform for Joint Scientific Literature Discovery, Summarization and Citation Generation
Nianlong Gu | Richard H.R. Hahnloser
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Scientific writing involves retrieving, summarizing, and citing relevant papers, which can be time-consuming processes. Although in many workflows these processes are serially linked, there are opportunities for natural language processing (NLP) to provide end-to-end assistive tools. We propose SciLit, a pipeline that automatically recommends relevant papers, extracts highlights, and suggests a reference sentence as a citation of a paper, taking into consideration the user-provided context and keywords. SciLit efficiently recommends papers from large databases of hundreds of millions of papers using a two-stage pre-fetching and re-ranking literature search system that flexibly deals with addition and removal of a paper database. We provide a convenient user interface that displays the recommended papers as extractive summaries and that offers abstractively-generated citing sentences which are aligned with the provided context and which mention the chosen keyword(s). Our assistive tool for literature discovery and scientific writing is available at

pdf bib
GreedyCAS: Unsupervised Scientific Abstract Segmentation with Normalized Mutual Information
Yingqiang Gao | Jessica Lam | Nianlong Gu | Richard Hahnloser
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

The abstracts of scientific papers typically contain both premises (e.g., background and observations) and conclusions. Although conclusion sentences are highlighted in structured abstracts, in non-structured abstracts the concluding information is not explicitly marked, which makes the automatic segmentation of conclusions from scientific abstracts a challenging task. In this work, we explore Normalized Mutual Information (NMI) as a means for abstract segmentation. We consider each abstract as a recurrent cycle of sentences and place two segmentation boundaries by greedily optimizing the NMI score between the two segments, assuming that conclusions are strongly semantically linked with preceding premises. On non-structured abstracts, our proposed unsupervised approach GreedyCAS achieves the best performance across all evaluation metrics; on structured abstracts, GreedyCAS outperforms all baseline methods measured by Pk. The strong correlation of NMI to our evaluation metrics reveals the effectiveness of NMI for abstract segmentation.


pdf bib
Do Discourse Indicators Reflect the Main Arguments in Scientific Papers?
Yingqiang Gao | Nianlong Gu | Jessica Lam | Richard H.R. Hahnloser
Proceedings of the 9th Workshop on Argument Mining

In scientific papers, arguments are essential for explaining authors’ findings. As substrates of the reasoning process, arguments are often decorated with discourse indicators such as “which shows that” or “suggesting that”. However, it remains understudied whether discourse indicators by themselves can be used as an effective marker of the local argument components (LACs) in the body text that support the main claim in the abstract, i.e., the global argument. In this work, we investigate whether discourse indicators reflect the global premise and conclusion. We construct a set of regular expressions for over 100 word- and phrase-level discourse indicators and measure the alignment of LACs extracted by discourse indicators with the global arguments. We find a positive correlation between the alignment of local premises and local conclusions. However, compared to a simple textual intersection baseline, discourse indicators achieve lower ROUGE recall and have limited capability of extracting LACs relevant to the global argument; thus their role in scientific reasoning is less salient as expected.

pdf bib
MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes
Nianlong Gu | Elliott Ash | Richard Hahnloser
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce MemSum (Multi-step Episodic Markov decision process extractive SUMmarizer), a reinforcement-learning-based extractive summarizer enriched at each step with information on the current extraction history. When MemSum iteratively selects sentences into the summary, it considers a broad information set that would intuitively also be used by humans in this task: 1) the text content of the sentence, 2) the global text context of the rest of the document, and 3) the extraction history consisting of the set of sentences that have already been extracted. With a lightweight architecture, MemSum obtains state-of-the-art test-set performance (ROUGE) in summarizing long documents taken from PubMed, arXiv, and GovReport. Ablation studies demonstrate the importance of local, global, and history information. A human evaluation confirms the high quality and low redundancy of the generated summaries, stemming from MemSum’s awareness of extraction history.


pdf bib
Embedding-based Scientific Literature Discovery in a Text Editor Application
Onur Gökçe | Jonathan Prada | Nikola I. Nikolov | Nianlong Gu | Richard H.R. Hahnloser
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Each claim in a research paper requires all relevant prior knowledge to be discovered, assimilated, and appropriately cited. However, despite the availability of powerful search engines and sophisticated text editing software, discovering relevant papers and integrating the knowledge into a manuscript remain complex tasks associated with high cognitive load. To define comprehensive search queries requires strong motivation from authors, irrespective of their familiarity with the research field. Moreover, switching between independent applications for literature discovery, bibliography management, reading papers, and writing text burdens authors further and interrupts their creative process. Here, we present a web application that combines text editing and literature discovery in an interactive user interface. The application is equipped with a search engine that couples Boolean keyword filtering with nearest neighbor search over text embeddings, providing a discovery experience tuned to an author’s manuscript and his interests. Our application aims to take a step towards more enjoyable and effortless academic writing. The demo of the application ( and a short video tutorial ( are available online.