Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile
Zhaocheng Zhu

@zhu_zhaocheng

Research Scientist @nvidia. PhD @Mila_Quebec. BSc @PKU1898. Reasoning, LLMs, ML systems. Photographer. Opinions are my own.

ID: 1425493205106057221

Website: https://kiddozhu.github.io/ · Joined: 11-08-2021 16:24:13

415 Tweets

2.2K Followers

351 Following

Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

I second this. If you've heard of the mythical man-month, you know one exceptional full stack developer is more productive than an entire team with the same combined skill set.

Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

A visualization of my homepage visits over the years. It gives you a glimpse of how AI research is distributed worldwide 🌎 To avoid personal bias, I've excluded visits from the city I lived in each year.

US and China have always been the dominant players in the game, with China
Shengyang Sun (@ssydasheng)'s Twitter Profile Photo

I am proud to share our latest paper: Reward-aware Preference Optimization (RPO): A Unified Mathematical Framework for Model Alignment! 🚀📄 [Link: arxiv.org/pdf/2502.00203] The rapid evolution of alignment algorithms—each with different objectives, training setups, and response

Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

Life update: I will join NVIDIA as a Research Scientist. I am really grateful for the wonderful years I spent at Mila. Now, it's time to take my skills to industry and make more impact.

Goodbye, Montreal. Hello, Santa Clara. Will miss the dreamy snowy nights ❄️
Xinyu Yuan @ NeurIPS (@xinyuyuan402)'s Twitter Profile Photo

🚀 Introducing StructTokenBench, the first comprehensive benchmark for protein structure tokenization (PST), and our new method, AminoAseed, which outperforms ESM3's PST across all benchmarking perspectives. 📄 Paper: arxiv.org/pdf/2503.00089 🔹 Open-source release in one month! Stay tuned!

Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

We are excited to release new Llama-Nemotron models. These models allow you to set reasoning ON/OFF during runtime. We also release all the post-training data under CC-BY-4!
Try it now on build.nvidia.com/nvidia/llama-3…
HF collection: huggingface.co/collections/nv…
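
A minimal sketch of what the runtime toggle could look like with the Hugging Face transformers chat pipeline; the checkpoint name and the "detailed thinking on/off" system prompt are assumptions based on the model cards and should be verified there before use:

```python
# Sketch only: toggling Llama-Nemotron reasoning at runtime via the system prompt.
# Assumptions (check the HF model card): the checkpoint name below exists, and the
# reasoning switch is the "detailed thinking on" / "detailed thinking off" string.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="nvidia/Llama-3.1-Nemotron-Nano-8B-v1",  # assumed checkpoint name
    device_map="auto",
)

messages = [
    {"role": "system", "content": "detailed thinking on"},  # flip to "detailed thinking off"
    {"role": "user", "content": "How many primes are below 100?"},
]
out = generator(messages, max_new_tokens=512)
print(out[0]["generated_text"][-1]["content"])  # assistant reply, with or without reasoning
```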
Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

Some basic economics and math to explain why peer review is broken: You review 5 papers, 4 hours each. At ~$20/hr (typical US PhD student rate), that is $400 of unpaid labor. For industry folks, it would be $1K+. How can we expect $1K worth of high-quality reviews for free?
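
For concreteness, the arithmetic behind those figures; the industry rate below is a placeholder of my own, since the tweet only says "$1K+":

```python
# Back-of-the-envelope review cost, following the tweet's figures.
papers = 5
hours_per_paper = 4
phd_rate = 20        # USD/hr, the tweet's typical US PhD student rate
industry_rate = 60   # USD/hr, assumed placeholder consistent with the "$1K+" claim

phd_cost = papers * hours_per_paper * phd_rate            # 5 * 4 * 20 = 400
industry_cost = papers * hours_per_paper * industry_rate  # 5 * 4 * 60 = 1200
print(f"PhD reviewer:      ${phd_cost} of unpaid labor")
print(f"Industry reviewer: ${industry_cost} of unpaid labor")
```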

Somshubra Majumdar (@haseox94)'s Twitter Profile Photo

Open Code Reasoning is our latest dataset for training SOTA code reasoning capabilities in all model sizes! With it, even 7B Qwen can reach 51% on LiveCodeBench, and 32B hits 61% with just SFT alone! Model release soon; paper and dataset are out!

Michael Galkin (@michael_galkin)'s Twitter Profile Photo

📣 Our spicy ICML 2025 position paper: “Graph Learning Will Lose Relevance Due To Poor Benchmarks”.
Graph learning is less trendy in the ML world than it was in 2020-2022. We believe the problem is in poor benchmarks that hold the field back - and suggest ways to fix it!
🧵1/10
Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

Some hindsight from a geoguesser, though it might be very different from how a non-reasoning model makes predictions.
0. Assuming the photo is in the US given your chat history.
1. You can tell it's mid-day from the shadow. Based on the direction of the sun, it's the east coast.

Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

We just released a new post-training toolkit — flexible for development and scalable to clusters with thousands of GPUs. Give it a try and let us know what you think. More features are on the way!

Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

Tips for writing better LLM papers (and increasing your acceptance odds 📈):
1. Report the token cost of your method.
2. State the model size and capabilities needed to reproduce results.
3. Show how your method scales with data, compute, and model size.
4. Don't train on test
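
As a rough illustration of tip 1, a hedged sketch of how one might tally and report token cost; tiktoken and the cl100k_base encoding are stand-ins for whatever tokenizer matches the model actually evaluated, and the prompt/completion lists are placeholders:

```python
# Sketch: estimating the token cost of a method over an evaluation set.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # stand-in; use your model's own tokenizer

prompts = ["<prompt for example 1>", "<prompt for example 2>"]  # placeholder inputs
completions = ["<model output 1>", "<model output 2>"]          # placeholder outputs

input_tokens = sum(len(enc.encode(p)) for p in prompts)
output_tokens = sum(len(enc.encode(c)) for c in completions)
per_example = (input_tokens + output_tokens) / len(prompts)

print(f"input tokens:  {input_tokens}")
print(f"output tokens: {output_tokens}")
print(f"tokens per evaluated example: {per_example:.1f}")
```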