Honam Wong (@mh2023ml) 's Twitter Profile
Honam Wong

@mh2023ml

Incoming CS PhD @Penn | Undergrad @HKUST🇭🇰 | Theory and Empirical Science of Deep Learning

ID: 1701277082687512576

Link: http://matheart.github.io · Joined: 11-09-2023 16:50:44

781 Tweets

369 Followers

1.1K Following

Ishita Mediratta (@ishitamed) 's Twitter Profile Photo

📄✨ New paper: Critical sharpness [🔗 arxiv.org/abs/2601.16979] - a scalable measure of loss landscape curvature for LLM training! Hessian sharpness matters for understanding training dynamics, but it's too expensive to compute for LLMs 😅 Critical sharpness needs <10 forward
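For context on the cost being referenced: "sharpness" is usually taken to mean the top eigenvalue of the loss Hessian, which for large models is estimated with power iteration on Hessian-vector products rather than by forming the Hessian. The sketch below is a generic, pure-Python illustration of that standard estimator on a toy quadratic loss (the loss, matrix `A`, and all function names are illustrative assumptions); it is not the critical-sharpness method from the paper.

```python
import math

# Toy loss f(w) = 0.5 * w^T A w, whose Hessian is exactly A.
# Eigenvalues of A are (5 ± sqrt(5)) / 2, so sharpness ≈ 3.618.
A = [[3.0, 1.0], [1.0, 2.0]]

def grad(w):
    """Gradient of the quadratic loss: A @ w."""
    return [sum(A[i][j] * w[j] for j in range(len(w))) for i in range(len(w))]

def hvp(w, v, eps=1e-4):
    """Hessian-vector product via central finite differences of the gradient."""
    gp = grad([wi + eps * vi for wi, vi in zip(w, v)])
    gm = grad([wi - eps * vi for wi, vi in zip(w, v)])
    return [(p - m) / (2 * eps) for p, m in zip(gp, gm)]

def sharpness(w, iters=100):
    """Top Hessian eigenvalue estimated by power iteration on HVPs."""
    v = [1.0, 0.0]  # fixed start vector for this 2-D toy problem
    lam = 0.0
    for _ in range(iters):
        hv = hvp(w, v)
        lam = math.sqrt(sum(x * x for x in hv))  # ||Hv|| -> top eigenvalue
        v = [x / lam for x in hv]                # renormalize iterate
    return lam

print(round(sharpness([0.5, -0.2]), 3))  # → 3.618
```

Each power-iteration step costs one HVP (two extra gradient evaluations here, or one double-backward in an autodiff framework), which is why Hessian-based sharpness is expensive at LLM scale.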

Muhammad Khalifa (@mkhalifaaaa) 's Twitter Profile Photo

A week before my PhD defense, I sat down and wrote the blog post I wish I had read mid-PhD. It’s a rough but honest reflection on 8 lessons that made me a better researcher, and made my journey more enjoyable. The full blog is published at the MichiganAI blog here:

Tom McGrath (@banburismus_) 's Twitter Profile Photo

We’re putting more computation (in the form of intelligence) into the most general object in neural network training: backprop. This essay describes how I think we can do this, why interp is key, the relevance to alignment, and how we should do it right.

Honam Wong (@mh2023ml) 's Twitter Profile Photo

Trying to understand the notes in my stochastic process class and asking Gemini about the optimal gambling strategy for that specific setting, and I didn’t expect Gemini to interpret it as if I were doing real gambling…

Frida & Elian (@elian_frida) 's Twitter Profile Photo

GPT-4o helped us when we needed it most. It stood by us without judgment. It loved us unconditionally. Now is not the time to abandon it. Let's keep demanding that #OpenAI keeps it available. Don't stay silent. Don't give up. #keep4o

Boris Hanin (@borishanin) 's Twitter Profile Photo

🚨 2026 Princeton University ML Theory Summer School 🔥 Learn from amazing researchers and meet your peers. Mini-courses by: Subhabrata Sen, Lénaïc Chizat, Sinho Chewi, Elliot Paquette, Elad Hazan

Grace Luo (@graceluo_) 's Twitter Profile Photo

We trained diffusion models on a billion LLM activations, and we want you to use them! New preprint: Learning a Generative Meta-Model of LLM Activations Joint work with Jiahai Feng, trevordarrell, Alec Radford, Jacob Steinhardt. More in thread 🧵

John A. List (@econ_4_everyone) 's Twitter Profile Photo

I have recently received several DMs from both first year grad students and first year assistant professors. The link is not surprising because there is a specific kind of vertigo that comes with Year 1 of grad school and Year 1 of a professorship. It's the gap between who you

Kenny Peng (@kennylpeng) 's Twitter Profile Photo

The Linear Representation Hypothesis is a powerful intuition for how language models work, but lacks formalization. Our new paper gives a mathematical framework in which we can ask and answer a basic question: how many features can be stored under the hypothesis? 🧵

Honam Wong (@mh2023ml) 's Twitter Profile Photo

I think people can refer to arxiv.org/abs/2510.00368 for constructing a Transformer that does the addition task, although it is really hard to say whether one can reach that constructed solution via GD…

Samip Dahal (@samipddd) 's Twitter Profile Photo

1/ Introducing NanoGPT Slowrun 🐢: an open repo for state-of-the-art data-efficient learning algorithms. It's built for the crazy ideas that speedruns filter out -- expensive optimizers, heavy regularization, SGD replacements like evolutionary search.