James Foulds
2025
GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models
Tao Zhang | Ziqian Zeng | Yuxiang Xiao | Huiping Zhuang | Cen Chen | James Foulds | Shimei Pan
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) are prone to generating content that exhibits gender biases, raising significant ethical concerns. Alignment, the process of fine-tuning LLMs to better align with desired behaviors, is recognized as an effective approach to mitigate gender biases. Although proprietary LLMs have made significant strides in mitigating gender bias, their alignment datasets are not publicly available. The commonly used and publicly available alignment dataset, HH-RLHF, still exhibits gender bias to some extent. There is a lack of publicly available alignment datasets specifically designed to address gender bias. Hence, we developed a new dataset named GenderAlign, aiming at mitigating a comprehensive set of gender biases in LLMs. This dataset comprises 8k single-turn dialogues, each paired with a “chosen” and a “rejected” response. Compared to the “rejected” responses, the “chosen” responses demonstrate lower levels of gender bias and higher quality. Furthermore, we categorized the gender biases in the “rejected” responses of GenderAlign into 4 principal categories. The experimental results show the effectiveness of GenderAlign in reducing gender bias in LLMs.
2019
Scalable Collapsed Inference for High-Dimensional Topic Models
Rashidul Islam | James Foulds
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
The bigger the corpus, the more topics it can potentially support. To truly make full use of massive text corpora, a topic model inference algorithm must therefore scale efficiently in 1) documents and 2) topics, while 3) achieving accurate inference. Previous methods have achieved two out of three of these criteria simultaneously, but never all three at once. In this paper, we develop an online inference algorithm for topic models which leverages stochasticity to scale well in the number of documents, sparsity to scale well in the number of topics, and which operates in the collapsed representation of the topic model for improved accuracy and run-time performance. We use a Monte Carlo inner loop in the online setting to approximate the collapsed variational Bayes updates in a sparse and efficient way, which we accomplish via the Metropolis-Hastings-Walker method. We showcase our algorithm on LDA and the recently proposed mixed membership skip-gram topic model. To converge to a high-quality solution, our method requires only amortized O(k_d) computation per word token instead of O(K) operations, where k_d, the number of topics occurring in a particular document, is much smaller than K, the total number of topics in the corpus (k_d ≪ K).
2015
RELLY: Inferring Hypernym Relationships Between Relational Phrases
Adam Grycner | Gerhard Weikum | Jay Pujara | James Foulds | Lise Getoor
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
Joint Models of Disagreement and Stance in Online Debate
Dhanya Sridhar | James Foulds | Bert Huang | Lise Getoor | Marilyn Walker
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Weakly Supervised Models of Aspect-Sentiment for Online Course Discussion Forums
Arti Ramesh | Shachi H. Kumar | James Foulds | Lise Getoor
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)