Suyash Fulay
2024
On the Relationship between Truth and Political Bias in Language Models
Suyash Fulay | William Brannon | Shrestha Mohanty | Cassandra Overney | Elinor Poole-Dayan | Deb Roy | Jad Kabbara
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Language model alignment research often attempts to ensure that models are not only helpful and harmless, but also truthful and unbiased. However, optimizing these objectives simultaneously can obscure how improving one aspect might impact the others. In this work, we focus on analyzing the relationship between two concepts essential in both language model alignment and political science: truthfulness and political bias. We train reward models on various popular truthfulness datasets and subsequently evaluate their political bias. Our findings reveal that optimizing reward models for truthfulness on these datasets tends to result in a left-leaning political bias. We also find that existing open-source reward models (i.e., those trained on standard human preference datasets) already show a similar bias and that the bias is larger for larger models. These results raise important questions about the datasets used to represent truthfulness, potential limitations of aligning models to be both truthful and politically unbiased, and what language models capture about the relationship between truth and politics.
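Reward models of the kind described above are typically trained with a pairwise preference objective: given a preferred completion (e.g. a true statement) and a rejected one (e.g. a false statement), the model is pushed to assign the preferred one a higher scalar reward. The sketch below shows this standard Bradley-Terry-style loss in plain numpy; it illustrates the general technique, not the paper's exact training setup, and the function name is hypothetical.

```python
import numpy as np

def reward_model_pairwise_loss(r_preferred, r_rejected):
    """Bradley-Terry-style pairwise loss for reward-model training.

    Pushes the reward of the preferred completion above that of the
    rejected one: loss = -log(sigmoid(r_preferred - r_rejected)).
    NOTE: an illustrative sketch of standard preference-loss training,
    not necessarily the exact setup used in the paper.
    """
    margin = np.asarray(r_preferred, dtype=float) - np.asarray(r_rejected, dtype=float)
    # log1p(exp(-m)) is a numerically stable form of -log(sigmoid(m))
    return float(np.mean(np.log1p(np.exp(-margin))))
```

A larger margin between the preferred and rejected rewards yields a smaller loss, so minimizing it trains the model to rank true statements above false ones.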
ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings
William Brannon | Wonjune Kang | Suyash Fulay | Hang Jiang | Brandon Roy | Deb Roy | Jad Kabbara
Proceedings of TextGraphs-17: Graph-based Methods for Natural Language Processing
Learning on text-attributed graphs (TAGs), in which nodes are associated with one or more texts, has been the subject of much recent work. However, most approaches tend to make strong assumptions about the downstream task of interest, are reliant on hand-labeled data, or fail to equally balance the importance of both text and graph representations. In this work, we propose Contrastive Graph-Text pretraining (ConGraT), a general, self-supervised approach for jointly learning separate representations of texts and nodes in a TAG. Our method trains a language model (LM) and a graph neural network (GNN) to align their representations in a common latent space using a batch-wise contrastive learning objective inspired by CLIP. We further propose an extension to the CLIP objective that leverages graph structure to incorporate information about inter-node similarity. Extensive experiments demonstrate that ConGraT outperforms baselines on various downstream tasks, including node and text category classification, link prediction, and language modeling. Finally, we present an application of our method to community detection in social graphs, which enables finding more textually grounded communities, rather than purely graph-based ones.
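The batch-wise contrastive objective described above treats the i-th text and the i-th node in a batch as a matched pair and all other in-batch pairings as negatives, as in CLIP. A minimal numpy sketch of this symmetric loss (assuming pre-computed LM and GNN embeddings; the function name and temperature value are illustrative, and this omits the paper's graph-similarity extension):

```python
import numpy as np

def clip_style_contrastive_loss(text_emb, node_emb, temperature=0.07):
    """Symmetric batch-wise contrastive loss (CLIP-style).

    Aligns text and node embeddings in a shared space: the i-th text
    and i-th node are a positive pair; all other in-batch pairings
    serve as negatives. A sketch, not ConGraT's full objective.
    """
    # L2-normalize so dot products are cosine similarities
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    n = node_emb / np.linalg.norm(node_emb, axis=1, keepdims=True)
    logits = t @ n.T / temperature  # (B, B) similarity matrix

    def xent_diag(l):
        # cross-entropy with the diagonal (matched pairs) as targets
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average the text-to-node and node-to-text directions
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))
```

When matched text/node embeddings point in the same direction, the diagonal of the similarity matrix dominates and the loss is small; shuffling the pairing raises it, which is what drives the two encoders into a common latent space.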