Monotonic Representation of Numeric Attributes in Language Models

Benjamin Heinzerling, Kentaro Inui


Abstract
Language models (LMs) can express factual knowledge involving numeric properties, such as "Karl Popper was born in 1902." However, how this information is encoded in the model's internal representations is not well understood. Here, we introduce a method for finding and editing representations of numeric properties such as an entity's birth year. We find directions that encode numeric properties monotonically, in an interpretable fashion. When editing representations along these directions, LM output changes accordingly. For example, by patching activations along a "birthyear" direction we can make the LM express an increasingly late birth year. Property-encoding directions exist across several numeric properties in all models under consideration, suggesting the possibility that monotonic representation of numeric properties consistently emerges during LM pretraining.
Code: https://github.com/bheinzerling/numeric-property-repr
A long version of this short paper is available at: https://arxiv.org/abs/2403.10381
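The general recipe the abstract describes can be illustrated with a small, self-contained sketch: fit a linear probe that maps hidden states to a numeric attribute, treat the probe's weight vector as the candidate property direction, and then edit a representation by moving it along that direction. This is not the authors' implementation (see the linked repository for that); the hidden states below are synthetic stand-ins for LM activations, and all names and constants are illustrative assumptions.

```python
# Hedged sketch of the idea: linear probing for a numeric-attribute direction,
# then editing a representation by moving it along that direction.
# Synthetic data stands in for real LM hidden states.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_entities = 64, 500

# Assumption: each entity's hidden state encodes its birth year along one
# (unknown) direction, plus noise.
true_direction = rng.normal(size=d_model)
true_direction /= np.linalg.norm(true_direction)
birth_years = rng.integers(1700, 2000, size=n_entities).astype(float)
hidden_states = (
    np.outer((birth_years - birth_years.mean()) / 100.0, true_direction)
    + rng.normal(scale=0.1, size=(n_entities, d_model))
)

# Step 1: fit a linear probe (least squares) from hidden states to the
# numeric attribute; its weight vector is the candidate "birthyear" direction.
X = np.hstack([hidden_states, np.ones((n_entities, 1))])  # add bias column
w, *_ = np.linalg.lstsq(X, birth_years, rcond=None)
direction = w[:-1] / np.linalg.norm(w[:-1])
print("cosine(probe direction, true direction):",
      float(direction @ true_direction))

# Step 2: edit a representation by moving it along the probe direction.
# In the paper's setting this corresponds to patching an activation so the
# LM expresses an increasingly late (or early) birth year.
probe = lambda v: float(np.hstack([v, 1.0]) @ w)  # probe's predicted year
h = hidden_states[0]
for alpha in (0.0, 0.5, 1.0, 2.0):
    edited = h + alpha * direction
    print(f"alpha={alpha:.1f}  predicted birth year ~ {probe(edited):.0f}")
```

With real models, the same two steps would be applied to activations collected from entity mentions, with the edit implemented as an activation patch at a chosen layer; the sketch only demonstrates the monotonic relationship between movement along the direction and the decoded attribute value.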
Anthology ID:
2024.acl-short.18
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
175–195
URL:
https://aclanthology.org/2024.acl-short.18
Cite (ACL):
Benjamin Heinzerling and Kentaro Inui. 2024. Monotonic Representation of Numeric Attributes in Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 175–195, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Monotonic Representation of Numeric Attributes in Language Models (Heinzerling & Inui, ACL 2024)
PDF:
https://aclanthology.org/2024.acl-short.18.pdf