Jianglin Lu

2025

Foundation models learn highly transferable representations through large-scale pretraining on diverse data. An increasing body of research indicates that these representations exhibit a remarkable degree of similarity across architectures and modalities. In this survey, we investigate the representation potentials of foundation models, defined as the latent capacity of their learned representations to capture task-specific information within a single modality while also providing a transferable basis for alignment and unification across modalities. We begin by reviewing representative foundation models and the key metrics that make alignment measurable. We then synthesize empirical evidence of representation potentials from studies in vision, language, speech, multimodality, and neuroscience. The evidence suggests that foundation models often exhibit structural regularities and semantic consistencies in their representation spaces, positioning them as strong candidates for cross-modal transfer and alignment. We further analyze the key factors that foster representation potentials, discuss open questions, and highlight potential challenges.


Unequal Scientific Recognition in the Age of LLMs
Yixuan Liu | Abel Elekes | Jianglin Lu | Rodrigo Dorantes-Gilardi | Albert-Laszlo Barabasi
Findings of the Association for Computational Linguistics: EMNLP 2025

Large language models (LLMs) are reshaping how scientific knowledge is accessed and represented. This study evaluates the extent to which popular and frontier LLMs, including GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, recognize scientists, benchmarking their outputs against OpenAlex and Wikipedia. Using a dataset focusing on 100,000 physicists from OpenAlex to evaluate LLM recognition, we uncover substantial disparities: LLMs exhibit selective and inconsistent recognition patterns. Recognition correlates strongly with scholarly impact, such as citations, and remains uneven across gender and geography. Women researchers and researchers from Africa, Asia, and Latin America are significantly underrecognized. We further examine the role of training data provenance, identifying Wikipedia as a potential source that contributes to recognition gaps. Our findings highlight how LLMs can reflect, and potentially amplify, existing disparities in science, underscoring the need for more transparent and inclusive knowledge systems.

Co-authors

Hailing Wang 1

Yizhou Wang 1

Yi Xu 1

Kuo Yang 1

Venues

EMNLP 1
Findings 1
