Jingxuan Tu


2023

pdf bib
The Coreference under Transformation Labeling Dataset: Entity Tracking in Procedural Texts Using Event Models
Kyeongmin Rim | Jingxuan Tu | Bingyang Ye | Marc Verhagen | Eben Holderness | James Pustejovsky
Findings of the Association for Computational Linguistics: ACL 2023

We demonstrate that coreference resolution in procedural texts is significantly improved when performing transformation-based entity linking prior to coreference relation identification. When events in the text introduce changes to the state of participating entities, it is often impossible to accurately link entities in anaphoric and coreference relations without an understanding of the transformations those entities undergo. We show how adding event semantics helps to better model entity coreference. We argue that all transformation predicates, not just creation verbs, introduce a new entity into the discourse, as a kind of generalized Result Role, which is typically not textually mentioned. This allows us to model procedural texts as process graphs and to compute the coreference type for any two entities in the recipe. We present our annotation methodology and the corpus generated as well as describe experiments on coreference resolution of entity mentions under a process-oriented model of events.

pdf bib
Scalar Anaphora: Annotating Degrees of Coreference in Text
Bingyang Ye | Jingxuan Tu | James Pustejovsky
Proceedings of The Sixth Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC 2023)

pdf bib
Dense Paraphrasing for Textual Enrichment
Jingxuan Tu | Kyeongmin Rim | Eben Holderness | Bingyang Ye | James Pustejovsky
Proceedings of the 15th International Conference on Computational Semantics

Understanding inferences from text requires more than merely recovering surface arguments, adjuncts, or strings associated with the query terms. As humans, we interpret sentences as contextualized components of a narrative or discourse, by both filling in missing information, and reasoning about event consequences. In this paper, we define the process of rewriting a textual expression (lexeme or phrase) such that it reduces ambiguity while also making explicit the underlying semantics that is not (necessarily) expressed in the economy of sentence structure as Dense Paraphrasing (DP). We apply the DP techniques on the English procedural texts from the cooking recipe domain, and provide the scope and design of the application that involves creating a graph representation of events and generating hidden arguments through paraphrasing. We provide insights on how this DP process can enrich a source text by showing that the dense-paraphrased event graph is a good resource to large LLMs such as GPT-3 to generate reliable paraphrases; and by experimenting baselines for automaticDP generation. Finally, we demonstrate the utility of the dataset and event graph structure by providing a case study on the out-of-domain modeling and different DP prompts and GPT models for paraphrasing.

2022

pdf bib
Evaluating Retrieval for Multi-domain Scientific Publications
Nancy Ide | Keith Suderman | Jingxuan Tu | Marc Verhagen | Shanan Peters | Ian Ross | John Lawson | Andrew Borg | James Pustejovsky
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper provides an overview of the xDD/LAPPS Grid framework and provides results of evaluating the AskMe retrievalengine using the BEIR benchmark datasets. Our primary goal is to determine a solid baseline of performance to guide furtherdevelopment of our retrieval capabilities. Beyond this, we aim to dig deeper to determine when and why certain approachesperform well (or badly) on both in-domain and out-of-domain data, an issue that has to date received relatively little attention.

pdf bib
Competence-based Question Generation
Jingxuan Tu | Kyeongmin Rim | James Pustejovsky
Proceedings of the 29th International Conference on Computational Linguistics

Models of natural language understanding often rely on question answering and logical inference benchmark challenges to evaluate the performance of a system. While informative, such task-oriented evaluations do not assess the broader semantic abilities that humans have as part of their linguistic competence when speaking and interpreting language. We define competence-based (CB) question generation, and focus on queries over lexical semantic knowledge involving implicit argument and subevent structure of verbs. We present a method to generate such questions and a dataset of English cooking recipes we use for implementing the generation method. Our primary experiment shows that even large pretrained language models perform poorly on CB questions until they are provided with additional contextualized semantic information. The data and the source code is available at: https: //github.com/brandeis-llc/CompQG.

pdf bib
SemEval-2022 Task 9: R2VQ – Competence-based Multimodal Question Answering
Jingxuan Tu | Eben Holderness | Marco Maru | Simone Conia | Kyeongmin Rim | Kelley Lynch | Richard Brutti | Roberto Navigli | James Pustejovsky
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this task, we identify a challenge that is reflective of linguistic and cognitive competencies that humans have when speaking and reasoning. Particularly, given the intuition that textual and visual information mutually inform each other for semantic reasoning, we formulate a Competence-based Question Answering challenge, designed to involve rich semantic annotation and aligned text-video objects. The task is to answer questions from a collection of cooking recipes and videos, where each question belongs to a “question family” reflecting a specific reasoning competence. The data and task result is publicly available.

2021

pdf bib
COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation
Qingyun Wang | Manling Li | Xuan Wang | Nikolaus Parulian | Guangxing Han | Jiawei Ma | Jingxuan Tu | Ying Lin | Ranran Haoran Zhang | Weili Liu | Aabhas Chauhan | Yingjun Guan | Bangzheng Li | Ruisong Li | Xiangchen Song | Yi Fung | Heng Ji | Jiawei Han | Shih-Fu Chang | James Pustejovsky | Jasmine Rah | David Liem | Ahmed ELsayed | Martha Palmer | Clare Voss | Cynthia Schneider | Boyan Onyshkevych
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

To combat COVID-19, both clinicians and scientists need to digest the vast amount of relevant biomedical knowledge in literature to understand the disease mechanism and the related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG to extract fine-grained multimedia knowledge elements (entities, relations and events) from scientific literature. We then exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation, using drug repurposing as a case study. Our framework also provides detailed contextual sentences, subfigures, and knowledge subgraphs as evidence. All of the data, KGs, reports.

pdf bib
Exploration and Discovery of the COVID-19 Literature through Semantic Visualization
Jingxuan Tu | Marc Verhagen | Brent Cochran | James Pustejovsky
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

We propose semantic visualization as a linguistic visual analytic method. It can enable exploration and discovery over large datasets of complex networks by exploiting the semantics of the relations in them. This involves extracting information, applying parameter reduction operations, building hierarchical data representation and designing visualization. We also present the accompanying COVID-SemViz a searchable and interactive visualization system for knowledge exploration of COVID-19 data to demonstrate the application of our proposed method. In the user studies, users found that semantic visualization-powered COVID-SemViz is helpful in terms of finding relevant information and discovering unknown associations.

pdf bib
TMR: Evaluating NER Recall on Tough Mentions
Jingxuan Tu | Constantine Lignos
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

We propose the Tough Mentions Recall (TMR) metrics to supplement traditional named entity recognition (NER) evaluation by examining recall on specific subsets of ”tough” mentions: unseen mentions, those whose tokens or token/type combination were not observed in training, and type-confusable mentions, token sequences with multiple entity types in the test data. We demonstrate the usefulness of these metrics by evaluating corpora of English, Spanish, and Dutch using five recent neural architectures. We identify subtle differences between the performance of BERT and Flair on two English NER corpora and identify a weak spot in the performance of current models in Spanish. We conclude that the TMR metrics enable differentiation between otherwise similar-scoring systems and identification of patterns in performance that would go unnoticed from overall precision, recall, and F1.

2020

pdf bib
Reproducing Neural Ensemble Classifier for Semantic Relation Extraction inScientific Papers
Kyeongmin Rim | Jingxuan Tu | Kelley Lynch | James Pustejovsky
Proceedings of the Twelfth Language Resources and Evaluation Conference

Within the natural language processing (NLP) community, shared tasks play an important role. They define a common goal and allowthe the comparison of different methods on the same data. SemEval-2018 Task 7 involves the identification and classification of relationsin abstracts from computational linguistics (CL) publications. In this paper we describe an attempt to reproduce the methods and resultsfrom the top performing system at for SemEval-2018 Task 7. We describe challenges we encountered in the process, report on the resultsof our system, and discuss the ways that our attempt at reproduction can inform best practices.