Discussion about this post

Uday Singh Saini

See https://arxiv.org/abs/2507.07101, a recent work showing that SGD isn't so bad.

Matthew

If ADAM is a bad idea, does that mean Newton's method is a bad idea, too? Assuming I understand correctly, ADAM is the first-order shadow of Newton's method arising from a diagonal approximation to the Hessian. I've always viewed Newton's method as the gold standard and SGD as the necessary compromise because Hessians are too big for models with many, many parameters.
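To make the comparison in this comment concrete, here is a minimal sketch, not taken from the post, that runs one Newton step and one Adam step on a toy quadratic whose Hessian is exactly diagonal. The curvatures `h`, the starting point `x0`, and the function names are all hypothetical choices for illustration. The Adam hyperparameters are the commonly used defaults, apart from the step size `lr=0.1`.

```python
import math

# Hypothetical toy problem: f(x) = 0.5 * (h[0]*x[0]**2 + h[1]*x[1]**2).
# Its Hessian is exactly diag(h), so a diagonal approximation loses nothing here.
h = [100.0, 1.0]

def grad(x):
    # Gradient of the quadratic above.
    return [h[i] * x[i] for i in range(2)]

def newton_step(x):
    # Newton's method with a diagonal Hessian: divide each gradient entry by h[i].
    g = grad(x)
    return [x[i] - g[i] / h[i] for i in range(2)]

def adam_step(x, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update with bias correction; t is the 1-based step count.
    g = grad(x)
    m = [b1 * m[i] + (1 - b1) * g[i] for i in range(2)]
    v = [b2 * v[i] + (1 - b2) * g[i] ** 2 for i in range(2)]
    m_hat = [m[i] / (1 - b1 ** t) for i in range(2)]
    v_hat = [v[i] / (1 - b2 ** t) for i in range(2)]
    # Adam divides by the square root of the squared-gradient average,
    # not by the Hessian diagonal.
    x = [x[i] - lr * m_hat[i] / (math.sqrt(v_hat[i]) + eps) for i in range(2)]
    return x, m, v

x0 = [1.0, 1.0]
x_newton = newton_step(x0)
x_adam, m1, v1 = adam_step(x0, [0.0, 0.0], [0.0, 0.0], t=1)
print("Newton:", x_newton)
print("Adam:  ", x_adam)
```

On this toy problem the Newton step lands on the minimizer at the origin in a single update, while Adam's first step moves each coordinate by roughly `lr`, whatever the curvature in that direction. That suggests the two diagonal scalings are related in spirit but are not the same quantity.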
