Carey Priebe


2024

pdf bib
Tracking the perspectives of interacting language models
Hayden Helm | Brandon Duderstadt | Youngser Park | Carey Priebe
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) are capable of producing high quality information at unprecedented rates. As these models continue to entrench themselves in society, the content they produce will become increasingly pervasive in databases that are, in turn, incorporated into the pre-training data, fine-tuning data, retrieval data, etc. of other language models. In this paper we formalize the idea of a communication network of LLMs and introduce a method for representing the perspective of individual models within a collection of LLMs. Given these tools we systematically study information diffusion in the communication network of LLMs in various simulated settings.

2022

pdf bib
Bilingual Lexicon Induction for Low-Resource Languages using Graph Matching via Optimal Transport
Kelly Marchisio | Ali Saad-Eldin | Kevin Duh | Carey Priebe | Philipp Koehn
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Bilingual lexicons form a critical component of various natural language processing applications, including unsupervised and semisupervised machine translation and crosslingual information retrieval. In this work, we improve bilingual lexicon induction performance across 40 language pairs with a graph-matching method based on optimal transport. The method is especially strong with low amounts of supervision.

2021

pdf bib
An Analysis of Euclidean vs. Graph-Based Framing for Bilingual Lexicon Induction from Word Embedding Spaces
Kelly Marchisio | Youngser Park | Ali Saad-Eldin | Anton Alyakin | Kevin Duh | Carey Priebe | Philipp Koehn
Findings of the Association for Computational Linguistics: EMNLP 2021

Much recent work in bilingual lexicon induction (BLI) views word embeddings as vectors in Euclidean space. As such, BLI is typically solved by finding a linear transformation that maps embeddings to a common space. Alternatively, word embeddings may be understood as nodes in a weighted graph. This framing allows us to examine a node’s graph neighborhood without assuming a linear transform, and exploits new techniques from the graph matching optimization literature. These contrasting approaches have not been compared in BLI so far. In this work, we study the behavior of Euclidean versus graph-based approaches to BLI under differing data conditions and show that they complement each other when combined. We release our code at https://github.com/kellymarchisio/euc-v-graph-bli.

2007

pdf bib
Cross-Instance Tuning of Unsupervised Document Clustering Algorithms
Damianos Karakos | Jason Eisner | Sanjeev Khudanpur | Carey Priebe
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference