Haotian Xu

2024

Although Large Language Models (LLMs) show impressive performance on a wide range of Natural Language Processing tasks, researchers have found that they still have limited ability to conduct induction. Recent works mainly adopt “post-processing” paradigms to improve the performance of LLMs on induction (e.g., hypothesis search and refinement methods), but their performance is still constrained by the inherent inductive capability of the LLMs. In this paper, we propose a novel framework, Induction through Deduction (ItD), to enable LLMs to teach themselves induction through deduction. The ItD framework is composed of two main components: a Deductive Data Generation module to generate induction data and a Naive Bayesian Induction module to optimize the fine-tuning and decoding of LLMs. Our empirical results showcase the effectiveness of ItD on two induction benchmarks, achieving relative performance improvements of 36% and 10%, respectively, over the previous state of the art. Our ablation study verifies the effectiveness of the two key modules of ItD. We also verify the effectiveness of ItD across different LLMs and deductors. The data and code of this paper can be found at https://github.com/forangel2014/ItD.

Self-Knowledge Distillation for Knowledge Graph Embedding
Haotian Xu | Yuhua Wang | Jiahui Fan
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Knowledge graph embedding (KGE) is an important task that benefits many downstream applications. General KGE models can improve performance by increasing the embedding dimension, but high-dimensional KGE significantly increases the number of model parameters and the training time. Knowledge distillation has therefore been proposed for learning a low-dimensional model from a pre-trained high-dimensional model. To avoid introducing a complex teacher model, we use self-knowledge distillation. However, self-knowledge distillation still has some issues. One is misdirection from incorrect predictions during model training. Another is the loss of discriminative information caused by an excessive distillation temperature. To address these issues, we apply self-knowledge distillation, knowledge adjustment and dynamic temperature distillation to KGE. Self-knowledge distillation uses the information from the latest iteration to guide training in the current iteration. Knowledge adjustment fixes the predictions of misjudged training samples. Dynamic temperature distillation designs dynamic sample-wise temperatures to compute soft targets. Our model not only improves performance but also yields a lightweight model. Experimental results demonstrate the effectiveness and generalization ability of our model in link prediction. The lightweight model maintains good performance while reducing the number of model parameters and the training time.
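
As a rough illustration of two ideas named in the abstract above, the sketch below computes temperature-scaled soft targets and applies a simple correction to a misjudged sample. It is not the authors' implementation: the function names, the example logits, and the "swap the top probability with the true class" rule are all assumptions made for illustration, and the abstract does not say how the sample-wise temperature is chosen, so it is passed in as a plain argument here.

```python
import math

def soft_targets(logits, temperature):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def adjust_knowledge(probs, true_index):
    """Assumed correction rule: if the top prediction is wrong,
    swap its probability with that of the true class."""
    adjusted = list(probs)
    top = max(range(len(adjusted)), key=adjusted.__getitem__)
    if top != true_index:
        adjusted[top], adjusted[true_index] = adjusted[true_index], adjusted[top]
    return adjusted

# Hypothetical scores of one training sample over three candidate entities.
teacher_logits = [2.0, 1.0, 0.1]

sharp = soft_targets(teacher_logits, temperature=1.0)  # peaked distribution
flat = soft_targets(teacher_logits, temperature=4.0)   # flatter distribution

# Suppose the true entity is index 1, so the top prediction (index 0) is wrong.
fixed = adjust_knowledge(sharp, true_index=1)
```

With a larger temperature the probabilities move closer together, which is the loss of discriminative information the abstract warns about; a per-sample temperature lets that flattening vary from sample to sample.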

EMONA: Event-level Moral Opinions in News Articles
Yuanyuan Lei | Md Messal Monem Miah | Ayesha Qamar | Sai Ramana Reddy | Jonathan Tong | Haotian Xu | Ruihong Huang
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Most previous research on moral frames has focused on short social media texts, and little work has explored moral sentiment within news articles. In news articles, authors often express their opinions or political stance through moral judgment towards events, specifically whether the event is right or wrong according to social moral rules. This paper initiates a new task to understand moral opinions towards events in news articles. We have created a new dataset, EMONA, and annotated event-level moral opinions in news articles. This dataset consists of 400 news articles containing over 10k sentences and 45k events, among which 9,613 events received moral foundation labels. Extracting event morality is a challenging task, as moral judgment towards events can be very implicit. Baseline models were built for event moral identification and classification. We also conduct extrinsic evaluations to integrate event-level moral opinions into three downstream tasks. The statistical analysis and experiments show that moral opinions of events can serve as informative features for identifying ideological bias or subjective events.

2023

Triple-Hybrid Energy-based Model Makes Better Calibrated Natural Language Understanding Models
Haotian Xu | Yingying Zhang
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Though pre-trained language models achieve notable success in many applications, they are often criticized for over-confident predictions. Specifically, in-distribution (ID) miscalibration and out-of-distribution (OOD) detection are the main concerns. Recently, some works based on energy-based models (EBMs) have shown great improvements on both ID calibration and OOD detection for images. However, EBMs are rarely explored in natural language understanding tasks because the non-differentiability of text data makes EBM training more difficult. In this paper, we first propose a triple-hybrid EBM which combines the benefits of a classifier, a conditional generative model and a marginal generative model. Furthermore, we leverage contrastive learning to approximately train the proposed model, which circumvents the non-differentiability issue of text data. Extensive experiments have been conducted on GLUE and six other multiclass datasets in various domains. Our model outperforms previous methods in terms of ID calibration and OOD detection by a large margin while maintaining competitive accuracy.
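
For readers unfamiliar with energy-based views of a classifier, the sketch below shows the commonly used logit-based energy score, E(x) = -log sum_y exp(f_y(x)), where lower energy is usually read as "more in-distribution". This is a generic illustration and not the triple-hybrid model from the abstract above; the function name and the example logits are assumptions.

```python
import math

def energy(logits):
    """Free energy of a classifier output: -logsumexp(logits).
    Computed with the max-shift trick for numerical stability."""
    shift = max(logits)
    return -(shift + math.log(sum(math.exp(z - shift) for z in logits)))

# Hypothetical logits: one confidently classified input, one ambiguous input.
confident = [8.0, 0.5, -1.0]
uncertain = [0.3, 0.2, 0.1]

e_confident = energy(confident)
e_uncertain = energy(uncertain)

# A simple OOD rule would flag inputs whose energy exceeds a threshold
# chosen on held-out in-distribution data (threshold not shown here).
```

In this toy case the confidently classified input receives the lower energy, so a threshold on the score would separate the two inputs.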

2022

Crossroads, Buildings and Neighborhoods: A Dataset for Fine-grained Location Recognition
Pei Chen | Haotian Xu | Cheng Zhang | Ruihong Huang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

General-domain Named Entity Recognition (NER) datasets like CoNLL-2003 mostly annotate coarse-grained location entities such as a country or a city. But many applications require identifying fine-grained locations in text and mapping them precisely to geographic sites, e.g., a crossroad, an apartment building, or a grocery store. In this paper, we introduce a new dataset, HarveyNER, with fine-grained locations annotated in tweets. This dataset presents unique challenges and features many complex and long location mentions in informal descriptions. We built strong baseline models using Curriculum Learning and experimented with different heuristic curricula to better recognize difficult location mentions. Experimental results show that the simple curricula can improve the system’s performance on hard cases and its overall performance, and outperform several other baseline systems. The dataset and the baseline models can be found at https://github.com/brickee/HarveyNER.