@uuujingfeng: GD with LARGE stepsize induces an oscillatory loss that may sound scary, but the oscillation eventually accelerates optimization, provably. Core proof in <= 5 pages, which made me very proud :) New paper w/ Peter Bartlett, Matus Telgarsky, Bin Yu: arxiv.org/abs/2402.15926

Jingfeng Wu

@uuujingfeng

Bsky: bsky.app/profile/uuujf.…

Postdoc @SimonsInstitute @UCBerkeley; alumnus of @JohnsHopkins @PKU1898; DL theory, opt, and stat learning.

ID: 1933510801

Link: https://uuujf.github.io

Joined: 04-10-2013 07:50:15

98 Tweets

1.1K Followers

1.1K Following

a year ago

GD with LARGE stepsize induces an oscillatory loss that may sound scary, but the oscillation eventually accelerates optimization, provably. Core proof in <= 5 pages, which made me very proud :) New paper w/ Peter Bartlett, Matus Telgarsky, Bin Yu: arxiv.org/abs/2402.15926
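
To see the kind of behaviour the post describes, here is a minimal sketch, not the paper's experiment. It assumes logistic regression on a hypothetical two-point separable dataset (`Z`), a hypothetical starting point `w0`, and two arbitrarily chosen stepsizes (0.1 and 4.0). It runs plain gradient descent with each stepsize and records the loss at every step.

```python
import math

# Hypothetical toy data (an assumption, not taken from the paper): two
# separable points, already multiplied by their labels, so a correct
# classifier w has z . w > 0 for both points.
Z = [(1.0, 4.0), (1.0, -4.0)]


def logistic(m):
    """Numerically stable log(1 + exp(-m))."""
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))


def neg_slope(m):
    """Numerically stable 1 / (1 + exp(m)), i.e. minus d/dm logistic(m)."""
    if m >= 0:
        e = math.exp(-m)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(m))


def loss(w):
    """Average logistic loss over the toy dataset."""
    return sum(logistic(z[0] * w[0] + z[1] * w[1]) for z in Z) / len(Z)


def grad(w):
    """Gradient of the average logistic loss with respect to w."""
    g = [0.0, 0.0]
    for z in Z:
        s = neg_slope(z[0] * w[0] + z[1] * w[1])
        g[0] -= s * z[0] / len(Z)
        g[1] -= s * z[1] / len(Z)
    return g


def run_gd(stepsize, steps=200, w0=(0.0, 1.0)):
    """Run constant-stepsize GD and return the loss at every iterate."""
    w = list(w0)
    losses = [loss(w)]
    for _ in range(steps):
        g = grad(w)
        w = [w[0] - stepsize * g[0], w[1] - stepsize * g[1]]
        losses.append(loss(w))
    return losses


small = run_gd(0.1)  # small stepsize: loss decreases at every step
large = run_gd(4.0)  # large stepsize: loss jumps up early on

increases = sum(1 for a, b in zip(large, large[1:]) if b > a)
print("loss increases with large stepsize:", increases)
print("final loss, small stepsize:", small[-1])
print("final loss, large stepsize:", large[-1])
```

On this toy problem the large-stepsize run starts by increasing the loss, yet it finishes with a lower loss than the small-stepsize run after the same number of steps. That is only an illustration on made-up data; the general statement and its conditions are in the linked paper.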

117 likes

4 replies

11 reposts
