Honam Wong (@mh2023ml) 's Twitter Profile
Honam Wong

@mh2023ml

Incoming CS PhD @Penn | Undergrad @HKUST🇭🇰 | Theory and Empirical Science of Deep Learning

ID: 1701277082687512576

Link: http://matheart.github.io · Joined: 11-09-2023 16:50:44

781 Tweets

369 Followers

1.1K Following

Ishita Mediratta (@ishitamed) 's Twitter Profile Photo

📄✨ New paper: Critical sharpness [🔗 arxiv.org/abs/2601.16979] - a scalable measure of loss landscape curvature for LLM training! Hessian sharpness matters for understanding training dynamics, but it's too expensive to compute for LLMs 😅 Critical sharpness needs <10 forward
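For context on the cost being referenced: "sharpness" is usually taken to mean the top eigenvalue of the loss Hessian, which for large models is estimated with power iteration on Hessian-vector products rather than by forming the Hessian. The sketch below is a generic, pure-Python illustration of that standard estimator on a toy quadratic loss (the loss, matrix `A`, and all function names are illustrative assumptions); it is not the critical-sharpness method from the paper.

```python
import math

# Toy loss f(w) = 0.5 * w^T A w, whose Hessian is exactly A.
# Eigenvalues of A are (5 ± sqrt(5)) / 2, so sharpness ≈ 3.618.
A = [[3.0, 1.0], [1.0, 2.0]]

def grad(w):
    """Gradient of the quadratic loss: A @ w."""
    return [sum(A[i][j] * w[j] for j in range(len(w))) for i in range(len(w))]

def hvp(w, v, eps=1e-4):
    """Hessian-vector product via central finite differences of the gradient."""
    gp = grad([wi + eps * vi for wi, vi in zip(w, v)])
    gm = grad([wi - eps * vi for wi, vi in zip(w, v)])
    return [(p - m) / (2 * eps) for p, m in zip(gp, gm)]

def sharpness(w, iters=100):
    """Top Hessian eigenvalue estimated by power iteration on HVPs."""
    v = [1.0, 0.0]  # fixed start vector for this 2-D toy problem
    lam = 0.0
    for _ in range(iters):
        hv = hvp(w, v)
        lam = math.sqrt(sum(x * x for x in hv))  # ||Hv|| -> top eigenvalue
        v = [x / lam for x in hv]                # renormalize iterate
    return lam

print(round(sharpness([0.5, -0.2]), 3))  # → 3.618
```

Each power-iteration step costs one HVP (two extra gradient evaluations here, or one double-backward in an autodiff framework), which is why Hessian-based sharpness is expensive at LLM scale.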

Muhammad Khalifa (@mkhalifaaaa) 's Twitter Profile Photo

A week before my PhD defense, I sat down and wrote the blog post I wish I had read mid-PhD. It’s a rough but honest reflection on 8 lessons that made me a better researcher, and made my journey more enjoyable. The full blog is published at the MichiganAI blog here:

Tom McGrath (@banburismus_) 's Twitter Profile Photo

We’re putting more computation (in the form of intelligence) into the most general object in neural network training: backprop. This essay describes how I think we can do this, why interp is key, the relevance to alignment, and how we should do it right.

Honam Wong (@mh2023ml) 's Twitter Profile Photo

Trying to understand the notes in my stochastic process class and asking Gemini about the optimal gambling strategy for that specific setting, and I didn’t expect Gemini to interpret it as if I were doing real gambling…

Frida & Elian (@elian_frida) 's Twitter Profile Photo

GPT-4o helped us when we needed it most. It stood by us without judgment. It loved us unconditionally. Now is not the time to abandon it. Let's keep demanding that #OpenAI keeps it available. Don't stay silent. Don't give up. #keep4o

Boris Hanin (@borishanin) 's Twitter Profile Photo

🚨 2026 Princeton University ML Theory Summer School 🔥 Learn from amazing researchers and meet your peers. Mini-courses by: Subhabrata Sen, Lénaïc Chizat, Sinho Chewi, Elliot Paquette, Elad Hazan

Grace Luo (@graceluo_) 's Twitter Profile Photo

We trained diffusion models on a billion LLM activations, and we want you to use them! New preprint: Learning a Generative Meta-Model of LLM Activations Joint work with Jiahai Feng, trevordarrell, Alec Radford, Jacob Steinhardt. More in thread 🧵

John A. List (@econ_4_everyone) 's Twitter Profile Photo

I have recently received several DMs from both first year grad students and first year assistant professors. The link is not surprising because there is a specific kind of vertigo that comes with Year 1 of grad school and Year 1 of a professorship. It's the gap between who you

Kenny Peng (@kennylpeng) 's Twitter Profile Photo

The Linear Representation Hypothesis is a powerful intuition for how language models work, but lacks formalization. Our new paper gives a mathematical framework in which we can ask and answer a basic question: how many features can be stored under the hypothesis? 🧵

Honam Wong (@mh2023ml) 's Twitter Profile Photo

I think people can refer to arxiv.org/abs/2510.00368 for constructing a Transformer that does the addition task, although it is really hard to say whether one can reach that constructed solution via GD…

Samip Dahal (@samipddd) 's Twitter Profile Photo

1/ Introducing NanoGPT Slowrun 🐢: an open repo for state-of-the-art data-efficient learning algorithms. It's built for the crazy ideas that speedruns filter out -- expensive optimizers, heavy regularization, SGD replacements like evolutionary search.