Haoling Qiu

2025

BBN-U.Oregon’s ALERT system at GenAI Content Detection Task 3: Robust Authorship Style Representations for Cross-Domain Machine-Generated Text Detection
Hemanth Kandula | Chak Fai Li | Haoling Qiu | Damianos Karakos | Hieu Man | Thien Huu Nguyen | Brian Ulicny
Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect)

This paper presents BBN-U.Oregon’s system, ALERT, submitted to the Shared Task 3: Cross-Domain Machine-Generated Text Detection. Our approach uses robust authorship-style representations to distinguish between human-authored and machine-generated text (MGT) across various domains. We employ an ensemble-based authorship attribution (AA) system that integrates stylistic embeddings from two complementary subsystems: one that focuses on cross-genre robustness with hard positive and negative mining strategies and another that captures nuanced semantic-lexical-authorship contrasts. This combination enhances cross-domain generalization, even under domain shifts and adversarial attacks. Evaluated on the RAID benchmark, our system demonstrates strong performance across genres and decoding strategies, with resilience against adversarial manipulation, achieving 91.8% TPR at FPR=5% on standard test sets and 82.6% on adversarial sets.

2024

pdf bib abs

Improving Authorship Privacy: Adaptive Obfuscation with the Dynamic Selection of Techniques
Hemanth Kandula | Damianos Karakos | Haoling Qiu | Brian Ulicny
Proceedings of the Fifth Workshop on Privacy in Natural Language Processing

Authorship obfuscation, the task of rewriting text to protect the original author’s identity, is becoming increasingly important due to the rise of advanced NLP tools for authorship attribution techniques. Traditional methods for authorship obfuscation face significant challenges in balancing content preservation, fluency, and style concealment. This paper introduces a novel approach, the Obfuscation Strategy Optimizer (OSO), which dynamically selects the optimal obfuscation technique based on a combination of metrics including embedding distance, meaning similarity, and fluency. By leveraging an ensemble of language models OSO achieves superior performance in preserving the original content’s meaning and grammatical fluency while effectively concealing the author’s unique writing style. Experimental results demonstrate that the OSO outperforms existing methods and approaches the performance of larger language models. Our evaluation framework incorporates adversarial testing against state-of-the-art attribution systems to validate the robustness of the obfuscation techniques. We release our code publicly at https://github.com/BBN-E/ObfuscationStrategyOptimizer

2022

pdf bib abs

ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations
Oscar Sainz | Haoling Qiu | Oier Lopez de Lacalle | Eneko Agirre | Bonan Min
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations

The current workflow for Information Extraction (IE) analysts involves the definition of the entities/relations of interest and a training corpus with annotated examples. In this demonstration we introduce a new workflow where the analyst directly verbalizes the entities/relations, which are then used by a Textual Entailment model to perform zero-shot IE. We present the design and implementation of a toolkit with a user interface, as well as experiments on four IE tasks that show that the system achieves very good performance at zero-shot learning using only 5–15 minutes per type of a user’s effort. Our demonstration system is open-sourced at https://github.com/BBN-E/ZS4IE. A demonstration video is available at https://vimeo.com/676138340.

An existing domain taxonomy for normalizing content is often assumed when discussing approaches to information extraction, yet often in real-world scenarios there is none. When one does exist, as the information needs shift, it must be continually extended. This is a slow and tedious task, and one which does not scale well. Here we propose an interactive tool that allows a taxonomy to be built or extended rapidly and with a human in the loop to control precision. We apply insights from text summarization and information extraction to reduce the search space dramatically, then leverage modern pretrained language models to perform contextualized clustering of the remaining concepts to yield candidate nodes for the user to review. We show this allows a user to consider as many as 200 taxonomy concept candidates an hour, to quickly build or extend a taxonomy to better fit information needs.

2021

pdf bib abs

ExcavatorCovid: Extracting Events and Relations from Text Corpora for Temporal and Causal Analysis for COVID-19
Bonan Min | Benjamin Rozonoyer | Haoling Qiu | Alexander Zamanian | Nianwen Xue | Jessica MacBride
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Timely responses from policy makers to mitigate the impact of the COVID-19 pandemic rely on a comprehensive grasp of events, their causes, and their impacts. These events are reported at such a speed and scale as to be overwhelming. In this paper, we present ExcavatorCovid, a machine reading system that ingests open-source text documents (e.g., news and scientific publications), extracts COVID-19 related events and relations between them, and builds a Temporal and Causal Analysis Graph (TCAG). Excavator will help government agencies alleviate the information overload, understand likely downstream effects of political and economic decisions and events related to the pandemic, and respond in a timely manner to mitigate the impact of COVID-19. We expect the utility of Excavator to outlive the COVID-19 pandemic: analysts and decision makers will be empowered by Excavator to better understand and solve complex problems in the future. A demonstration video is available at https://vimeo.com/528619007.

pdf bib abs

Factuality Assessment as Modal Dependency Parsing
Jiarui Yao | Haoling Qiu | Jin Zhao | Bonan Min | Nianwen Xue
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

As the sources of information that we consume everyday rapidly diversify, it is becoming increasingly important to develop NLP tools that help to evaluate the credibility of the information we receive. A critical step towards this goal is to determine the factuality of events in text. In this paper, we frame factuality assessment as a modal dependency parsing task that identifies the events and their sources, formally known as conceivers, and then determine the level of certainty that the sources are asserting with respect to the events. We crowdsource the first large-scale data set annotated with modal dependency structures that consists of 353 Covid-19 related news articles, 24,016 events, and 2,938 conceivers. We also develop the first modal dependency parser that jointly extracts events, conceivers and constructs the modal dependency structure of a text. We evaluate the joint model against a pipeline model and demonstrate the advantage of the joint model in conceiver extraction and modal dependency structure construction when events and conceivers are automatically extracted. We believe the dataset and the models will be a valuable resource for a whole host of NLP applications such as fact checking and rumor detection.

2020

pdf bib abs

Annotating Temporal Dependency Graphs via Crowdsourcing
Jiarui Yao | Haoling Qiu | Bonan Min | Nianwen Xue
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present the construction of a corpus of 500 Wikinews articles annotated with temporal dependency graphs (TDGs) that can be used to train systems to understand temporal relations in text. We argue that temporal dependency graphs, built on previous research on narrative times and temporal anaphora, provide a representation scheme that achieves a good trade-off between completeness and practicality in temporal annotation. We also provide a crowdsourcing strategy to annotate TDGs, and demonstrate the feasibility of this approach with an evaluation of the quality of the annotation, and the utility of the resulting data set by training a machine learning model on this data set. The data set is publicly available.

pdf bib abs

Concept Wikification for COVID-19
Panagiotis Lymperopoulos | Haoling Qiu | Bonan Min
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

Understanding scientific articles related to COVID-19 requires broad knowledge about concepts such as symptoms, diseases and medicine. Given the very large and ever-growing scientific articles related to COVID-19, it is a daunting task even for experts to recognize the large set of concepts mentioned in these articles. In this paper, we address the problem of concept wikification for COVID-19, which is to automatically recognize mentions of concepts related to COVID-19 in text and resolve them into Wikipedia titles. We develop an approach to curate a COVID-19 concept wikification dataset by mining Wikipedia text and the associated intra-Wikipedia links. We also develop an end-to-end system for concept wikification for COVID-19. Preliminary experiments show very encouraging results. Our dataset, code and pre-trained model are available at github.com/panlybero/Covid19_wikification.

2019

pdf bib abs

Towards Machine Reading for Interventions from Humanitarian-Assistance Program Literature
Bonan Min | Yee Seng Chan | Haoling Qiu | Joshua Fasching
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Solving long-lasting problems such as food insecurity requires a comprehensive understanding of interventions applied by governments and international humanitarian assistance organizations, and their results and consequences. Towards achieving this grand goal, a crucial first step is to extract past interventions and when and where they have been applied, from hundreds of thousands of reports automatically. In this paper, we developed a corpus annotated with interventions to foster research, and developed an information extraction system for extracting interventions and their location and time from text. We demonstrate early, very encouraging results on extracting interventions.

pdf bib abs

Rapid Customization for Event Extraction
Yee Seng Chan | Joshua Fasching | Haoling Qiu | Bonan Min
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Extracting events in the form of who is involved in what at when and where from text, is one of the core information extraction tasks that has many applications such as web search and question answering. We present a system for rapidly customizing event extraction capability to find new event types (what happened) and their arguments (who, when, and where). To enable extracting events of new types, we develop a novel approach to allow a user to find, expand and filter event triggers by exploring an unannotated development corpus. The system will then generate mention level event annotation automatically and train a neural network model for finding the corresponding events. To enable extracting arguments for new event types, the system makes novel use of the ACE annotation dataset to train a generic argument attachment model for extracting Actor, Place, and Time. We demonstrate that with less than 10 minutes of human effort per event type, the system achieves good performance for 67 novel event types. Experiments also show that the generic argument attachment model performs well on the novel event types. Our system (code, UI, documentation, demonstration video) is released as open source.

Venues

Haoling Qiu

2025

2024

2022

2021

2020

2019

Co-authors

Venues