
[Post image: NTL demo]

In our upcoming #ICML2025 paper, we introduce the #NumberTokenLoss (NTL) to address this; see the demo above! NTL is a regression-style loss computed at the token level, with no extra regression head needed. We propose adding NTL on top of the cross-entropy (CE) loss during LLM pretraining. Our experiments show: (see ⬇️)
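To make "regression-style loss at the token level" concrete, here is a minimal sketch of one plausible variant: take the model's probabilities over the digit tokens '0'–'9', form the expected numeric value at each position, and penalize its squared error against the true digit. The function name, the restriction to single digits, and the softmax over only the digit logits are simplifying assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def number_token_loss(logits, targets, digit_token_ids):
    """Sketch of a token-level regression loss (assumed NTL-MSE-style variant).

    logits:  (batch, seq, vocab) raw model outputs
    targets: (batch, seq) ground-truth token ids
    digit_token_ids: (10,) tensor of token ids for '0'..'9', in order
    """
    digit_values = torch.arange(10, dtype=torch.float32)  # numeric value of each digit token

    # Probability mass over the digit tokens (simplification: renormalized
    # via softmax over digit logits only), then the expected numeric value.
    probs = F.softmax(logits[..., digit_token_ids], dim=-1)        # (B, S, 10)
    expected = (probs * digit_values).sum(-1)                      # (B, S)

    # Only positions whose target token is a digit contribute to the loss.
    is_digit = torch.isin(targets, digit_token_ids)                # (B, S) bool

    # Map each target token id to its numeric value (non-digits get -1, masked out).
    id_to_val = torch.full((logits.size(-1),), -1.0)
    id_to_val[digit_token_ids] = digit_values
    true_vals = id_to_val[targets]                                 # (B, S)

    sq_err = ((expected - true_vals) ** 2)[is_digit]
    return sq_err.mean() if sq_err.numel() > 0 else torch.zeros(())
```

In training, this scalar would simply be added (possibly with a weighting factor) to the usual cross-entropy loss, which is what makes it head-free: it reuses the existing token logits rather than a separate regression output.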
