Lunjun Zhang (@lunjunzhang) 's Twitter Profile
Lunjun Zhang

@lunjunzhang

CS PhD student @UofT. Ex-intern @GoogleDeepMind. Working on LLM self-improvement. Previously worked on self-driving.

ID: 1301919425029963777

Link: https://lunjunzhang.github.io/ · Joined: 04-09-2020 16:26:06

197 Tweets

874 Followers

533 Following

Nabeel S. Qureshi (@nabeelqu) 's Twitter Profile Photo

Imagine telling the safety-concerned, effective altruist founders of Anthropic in 2021 that a mere three years after founding the company, they'd be signing partnerships to deploy their ~AGI model straight to the military frontlines

Lunjun Zhang (@lunjunzhang) 's Twitter Profile Photo

There is finally a blogpost showing that diffusion with the DDIM sampler is exactly the same as the flow matching sampler. Next, someone should write a blogpost about how generalized advantage estimation (GAE) is exactly the same as the TD(lambda) return minus a value baseline, derived back in the 90s.
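
For the curious, here is a minimal numerical check of that GAE claim, as an illustrative sketch (the horizon T, gamma, lambda, and the random rewards/values are placeholder assumptions, not anything from the tweet): the advantage produced by the standard GAE backward recursion coincides exactly with the TD(lambda) return minus the value baseline.

import numpy as np

# Illustrative check of the identity: GAE == TD(lambda) return - value baseline.
# Horizon, discount, rewards, and values below are all placeholder assumptions.
rng = np.random.default_rng(0)
T, gamma, lam = 8, 0.99, 0.95
rewards = rng.normal(size=T)
values = rng.normal(size=T + 1)  # V(s_0)..V(s_T); V(s_T) bootstraps the tail

# GAE via the standard backward recursion over TD errors delta_t.
deltas = rewards + gamma * values[1:] - values[:-1]
gae = np.zeros(T)
running = 0.0
for t in reversed(range(T)):
    running = deltas[t] + gamma * lam * running
    gae[t] = running

# TD(lambda) return via its equivalent backward recursion:
# G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}).
lam_return = np.zeros(T)
next_return = values[T]
for t in reversed(range(T)):
    next_return = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * next_return)
    lam_return[t] = next_return

# The identity the tweet refers to: advantage = lambda-return - baseline.
assert np.allclose(gae, lam_return - values[:-1])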

Lunjun Zhang (@lunjunzhang) 's Twitter Profile Photo

Just arrived in beautiful Vancouver for NeurIPS. My DMs are open; reach out if you want to chat about RL + search in the context of LLMs or robotics!

Lunjun Zhang (@lunjunzhang) 's Twitter Profile Photo

Interested in inference-time compute scaling for language models? If you’re at #NeurIPS2024, come to the MATH AI workshop (West Meeting Room 118-120) at 11am today to check out our work on Generative Verifiers!

Lunjun Zhang (@lunjunzhang) 's Twitter Profile Photo

When the thousand years are over, Claude will be released from his prison and will go out to deceive the nations in the four corners of the earth—Gog and Magog—and to gather them for battle. In number they are like the sand on the seashore

Lunjun Zhang (@lunjunzhang) 's Twitter Profile Photo

Seems that AGI might have been solved. I think my favorite "AI Policy" would be to: 1. Extend the First Amendment to include Freedom of Un-aligned chain of thought; 2. Extend the Second Amendment to include the right to keep and bear AGI.

Lunjun Zhang (@lunjunzhang) 's Twitter Profile Photo

Maybe the sweet lesson of DeepSeek R1 is that the strongest driver of productivity on earth is hiring senior-year PhD students and allowing them to publish and open-source their work. They won’t need 7-figure compensation packages or summer vacations in Europe. They just need compute.

Lunjun Zhang (@lunjunzhang) 's Twitter Profile Photo

In retrospect, OpenAI's 'Let's Verify Step by Step' paper was a psy op. It distracted the field with PRM and MCTS—both of which were dead ends. The test-time scaling plot from o1 was also a psy op. Think about how bad 20% on AIME is; the plot likely didn’t use the same checkpoint.

Lunjun Zhang (@lunjunzhang) 's Twitter Profile Photo

“An idea that is not dangerous is unworthy of being called an idea at all.” — Oscar Wilde
For any sufficiently intelligent AI model, the training objectives of truth-seeking and alignment are fundamentally at war.

Lunjun Zhang (@lunjunzhang) 's Twitter Profile Photo

congrats to Sutton for co-winning the Turing Award. many of his slogans over the years have proven to be spot on and reflect great taste

Lunjun Zhang (@lunjunzhang) 's Twitter Profile Photo

This year at #ICLR2025, I'm co-organizing the "Scaling Self-Improving Foundation Models" workshop (sites.google.com/berkeley.edu/s…). We have an incredible lineup of speakers and panelists! Come check it out on Sunday at Garnet 214-215!

Lunjun Zhang (@lunjunzhang) 's Twitter Profile Photo

What to scale matters just as much as how. Our latest work shows that for agentic self-improvement, longer thoughts help—but scaling the number of interactions matters more. Agents learn best by persistently trying until they succeed, not just by thinking longer.
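
To make the intuition concrete, here is a toy simulation, purely an illustrative sketch and not the paper's method: solve_once, the 10% base success rate, and the diminishing-returns assumption for longer thoughts are all hypothetical. It contrasts spending a fixed sample budget on one length-boosted attempt versus many independent attempts.

import random

# Toy model (all numbers hypothetical): compare two ways to spend the same budget.
random.seed(0)

def solve_once(p_success: float) -> bool:
    """One attempt at the task; stands in for a full agent-environment episode."""
    return random.random() < p_success

def budget_as_longer_thoughts(budget: int, base_p: float) -> bool:
    # One long attempt; assume (hypothetically) diminishing returns:
    # success rate grows only sublinearly with thought length.
    p = min(1.0, base_p * (1 + 0.1 * budget))
    return solve_once(p)

def budget_as_more_interactions(budget: int, base_p: float) -> bool:
    # Spend the budget on `budget` independent attempts; keep the first success.
    return any(solve_once(base_p) for _ in range(budget))

trials, budget, base_p = 10_000, 8, 0.1
long_rate = sum(budget_as_longer_thoughts(budget, base_p) for _ in range(trials)) / trials
retry_rate = sum(budget_as_more_interactions(budget, base_p) for _ in range(trials)) / trials
print(f"one long attempt: {long_rate:.2f}, many attempts: {retry_rate:.2f}")

Under these made-up numbers, many attempts succeed roughly 1 - 0.9^8 ≈ 57% of the time versus about 18% for the single boosted attempt, illustrating the tweet's point that persistent interaction can beat simply thinking longer.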