New Study Shows AdamW Converges at O(√d / K¹⁄⁴) in L1 Norm
A new study proves that the AdamW optimizer converges at O(sqrt(d)/K^{1/4}) in the L1 norm: the average L1 norm of the gradient over K iterations is bounded by sqrt(d)·C/K^{1/4}, where d is the parameter dimension and C is a constant. The same rate holds for NAdamW. Read more: getnews.me/new-study-shows-adamw-co... #adamw #nadamw #optimization
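For readers who want the bound written out, here is a minimal sketch of the stated result in the usual averaged-iterate form; the notation x_k for the iterates, f for the objective, and the expectation over stochastic gradients are assumptions filled in from the standard non-convex setting, not spelled out in the post:

```latex
% Stated rate, written as the usual averaged-iterate bound (assumed form):
% d = parameter dimension, K = number of iterations, C = a constant.
\[
  \frac{1}{K} \sum_{k=1}^{K} \mathbb{E}\,\bigl\| \nabla f(x_k) \bigr\|_1
  \;\le\; \frac{\sqrt{d}\, C}{K^{1/4}}
  \;=\; O\!\left(\frac{\sqrt{d}}{K^{1/4}}\right)
\]
```

In words, the time-averaged L1 gradient norm shrinks as K grows, at a rate that scales with sqrt(d) in the problem dimension.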