Jingfeng Wu (@uuujingfeng) 's Twitter Profile
Jingfeng Wu

@uuujingfeng

Bsky: bsky.app/profile/uuujf.…

Postdoc @SimonsInstitute @UCBerkeley; alumnus of @JohnsHopkins @PKU1898; DL theory, opt, and stat learning.

ID: 1933510801

Link: https://uuujf.github.io · Joined: 04-10-2013 07:50:15

98 Tweets

1.1K Followers

1.1K Following

Yen-Huan Li (@yenhuan_li) 's Twitter Profile Photo

==== My recommendations today ==== Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency arxiv.org/abs/2402.15926 (1/2)

Ruiqi Zhang (@ruiqizhang0614) 's Twitter Profile Photo

What’s the role of the MLP layer in a transformer block? It’s intuitive to think that the MLP component helps reduce the approximation error, and our new paper confirms this theoretically! arxiv.org/abs/2402.14951 Joint work with Jingfeng Wu and Peter L. Bartlett

Bin Yu (@bbiinnyyuu) 's Twitter Profile Photo

My co-author Rebecca Barter and I are thrilled to announce the online release of our MIT Press book "Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making" (vdsbook.com), an essential source for producing trustworthy data-driven results.

Weijie Su (@weijie444) 's Twitter Profile Photo

📢 #ICML2024 authors! Help improve ML peer review! 🔬📝 Check your inbox for an email titled "[ICML 2024] Author Survey" and rank your submissions. 🏆📈 Your confidential input is crucial, and won't affect decisions. 🔒✅ Survey link in email or "Author Tasks" on OpenReview.

Song Mei (@song__mei) 's Twitter Profile Photo

My group at Berkeley Stats and EECS has a postdoc opening in the theoretical (e.g., scaling laws, watermarking) and empirical (e.g., efficiency, safety, alignment) aspects of LLMs or diffusion models. Send me an email with your CV if interested!

Gabriel Peyré (@gabrielpeyre) 's Twitter Profile Photo

Oldies but goldies: H. Robbins, S. Monro, A Stochastic Approximation Method, 1951. Early appearance of the stochastic gradient method, which is the workhorse of many large-scale ML methods. en.wikipedia.org/wiki/Stochasti…
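As a rough illustration, here is a minimal stochastic-gradient sketch in the Robbins-Monro spirit; the toy quadratic objective, the 1/(t+1) step size, and all names below are illustrative assumptions, not details from the 1951 paper.

import numpy as np

def sgd(grad_sample, x0, n_steps=2000, lr0=1.0):
    # grad_sample(x) returns an unbiased stochastic estimate of the gradient at x.
    # The step size eta_t = lr0 / (t + 1) satisfies the classical Robbins-Monro
    # conditions: sum_t eta_t diverges while sum_t eta_t^2 is finite.
    x = float(x0)
    for t in range(n_steps):
        eta = lr0 / (t + 1)
        x = x - eta * grad_sample(x)
    return x

# Toy usage: minimize E[(x - z)^2] / 2 with z ~ N(1, 1); the minimizer is x = 1.
rng = np.random.default_rng(0)
print(sgd(lambda x: x - rng.normal(loc=1.0), x0=0.0))  # prints a value near 1.0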

Woodson Lab (@woodson_lab) 's Twitter Profile Photo

Congratulations to Dr. Yuan Lou for a fantastic thesis defense! 🎉🎉 She is an exceptional scientist and colleague, and we will miss her dearly. Best of luck at Genentech!

Yisong Yue (@yisongyue) 's Twitter Profile Photo

Just updated my Tips for CS Faculty Applications. Best of luck to everyone applying! yisongyue.medium.com/checklist-of-t…

Jingfeng Wu (@uuujingfeng) 's Twitter Profile Photo

Out of the 6 NeurIPS submissions I reviewed this year, 3 were withdrawn and 3 were rejected. Honestly, my experience as a reviewer has worsened more than my experience as an author.

Gabriel Peyré (@gabrielpeyre) 's Twitter Profile Photo

The perspective transform turns a 1D convex function into a 2D positively homogeneous convex function. Fundamental in convex analysis. At the heart of Csiszár divergences. math.univ-toulouse.fr/Archive-MIP/pu…
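Concretely, as a quick sketch of the standard definition (the identity with Csiszár divergences below is the usual textbook one, stated here as an illustration rather than taken from the tweet): for a convex $f:\mathbb{R}\to\mathbb{R}$, its perspective is
\[
  g(x, t) = t\, f\!\Big(\frac{x}{t}\Big), \qquad t > 0,
\]
which is jointly convex in $(x, t)$ and positively homogeneous of degree one, i.e. $g(\lambda x, \lambda t) = \lambda\, g(x, t)$ for all $\lambda > 0$. Csiszár ($f$-)divergences are sums of perspectives:
\[
  D_f(p \,\|\, q) = \sum_i q_i\, f\!\Big(\frac{p_i}{q_i}\Big) = \sum_i g(p_i, q_i).
\]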

Lechao Xiao (@locchiu) 's Twitter Profile Photo

1/5. Excited to share a spicy paper, "Rethinking conventional wisdom in machine learning: from generalization to scaling", arxiv.org/pdf/2409.15156. You might love it or dislike it! NotebookLM: notebooklm.google.com/notebook/43f11… While double-descent (generalization-centric,

Simons Institute for the Theory of Computing (@simonsinstitute) 's Twitter Profile Photo

Deep learning practitioners have focused their attention on an optimization regime that's "unstable and convergent"--something that's not suggested by theory when using gradient methods, says Peter Bartlett, during his Richard M. Karp Distinguished Lecture at the Simons Institute

Sham Kakade (@shamkakade6) 's Twitter Profile Photo

(1/n) 💡How can we speed up the serial runtime of long pre-training runs? Enter Critical Batch Size (CBS): the tipping point where the gains of data parallelism balance with diminishing efficiency. Doubling batch size halves the optimization steps—until we hit CBS, beyond which
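A rough way to see the tipping point, as a sketch: under the empirical large-batch model of McCandlish et al. (2018), the steps needed behave roughly like steps(B) ≈ S_min · (1 + B_crit / B). The functional form and the numbers below are illustrative assumptions about that model, not results from this thread.

# Illustrative sketch of the critical-batch-size trade-off, assuming the
# empirical model steps(B) = S_min * (1 + B_crit / B) (McCandlish et al., 2018).
# Well below B_crit, doubling the batch size roughly halves the number of steps;
# well above it, extra data parallelism buys almost nothing.

def steps_needed(batch_size: int, s_min: float = 10_000, b_crit: float = 1_024) -> float:
    return s_min * (1 + b_crit / batch_size)

for b in [64, 128, 256, 512, 1024, 2048, 4096]:
    print(f"B={b:5d}  steps ~ {steps_needed(b):,.0f}")
# B=64 -> ~170,000; B=128 -> ~90,000 (roughly half); B=2048 -> 15,000; B=4096 -> 12,500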

Francesco Orabona (@bremen79) 's Twitter Profile Photo

Jingfeng Wu I don't have deep enough knowledge of the history of ML to answer this definitively, so I'll just give you my very personal point of view. I entered the ML field at the peak of the SVM era. At that time, people tended to use theory as a way to design algorithms.

Yuhang Cai (@yuhangwillcai) 's Twitter Profile Photo

We show the implicit bias of GD for generic non-homogeneous deep nets (previous results of this kind were limited to homogeneous ones). In particular, our results cover nets with residual connections and non-homogeneous activation functions. It's joint work with Kangjie Zhou,

Association for Computing Machinery (@theofficialacm) 's Twitter Profile Photo

Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! bit.ly/4hpdsbD