Tokenizers govern the allocation of computation. It's a waste to spend a whole token of compute predicting the "way" in "By the way". SuperBPE redirects that compute to predict more difficult tokens, leading to wins on downstream tasks!
1 year ago
4
1
0
0