Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile
Zhaocheng Zhu

@zhu_zhaocheng

Research Scientist @nvidia. PhD @Mila_Quebec. BSc @PKU1898. Reasoning, LLMs, ML systems. Photographer. Opinions are my own.

ID: 1425493205106057221

Website: https://kiddozhu.github.io/ · Joined: 11-08-2021 16:24:13

415 Tweets

2.2K Followers

351 Following

Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

I second this. If you've heard of the mythical man-month, you know one exceptional full stack developer is more productive than an entire team with the same combined skill set.

Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

A visualization of my homepage visits over the years. It gives you a glimpse of how AI research is distributed worldwide 🌎 To avoid personal bias, I've excluded visits from the city I lived in each year.

US and China have always been the dominant players in the game, with China
Shengyang Sun (@ssydasheng)'s Twitter Profile Photo

I am proud to share our latest paper: Reward-aware Preference Optimization (RPO): A Unified Mathematical Framework for Model Alignment! 🚀📄 [Link: arxiv.org/pdf/2502.00203] The rapid evolution of alignment algorithms—each with different objectives, training setups, and response

Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

Life update: I will join NVIDIA as a Research Scientist. I am really grateful for the wonderful years I spent at Mila. Now, it's time to take my skills to industry and make more impact.

Goodbye, Montreal. Hello, Santa Clara. Will miss the dreamy snowy nights ❄️
Xinyu Yuan @ NeurIPS (@xinyuyuan402)'s Twitter Profile Photo

🚀 Introducing StructTokenBench, the first comprehensive benchmark for protein structure tokenization (PST), and our new method, AminoAseed, which outperforms ESM3's PST across all benchmarking perspectives. 📄 Paper: arxiv.org/pdf/2503.00089 🔹 Open-source release in one month! Stay tuned!

Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

We are excited to release new Llama-Nemotron models. These models allow you to set reasoning ON/OFF during runtime. We also release all the post-training data under CC-BY-4!
Try it now on build.nvidia.com/nvidia/llama-3…
HF collection: huggingface.co/collections/nv…
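
A minimal sketch of what the runtime toggle could look like with the Hugging Face transformers chat pipeline; the checkpoint name and the "detailed thinking on/off" system prompt are assumptions based on the model cards and should be verified there before use:

```python
# Sketch only: toggling Llama-Nemotron reasoning at runtime via the system prompt.
# Assumptions (check the HF model card): the checkpoint name below exists, and the
# reasoning switch is the "detailed thinking on" / "detailed thinking off" string.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="nvidia/Llama-3.1-Nemotron-Nano-8B-v1",  # assumed checkpoint name
    device_map="auto",
)

messages = [
    {"role": "system", "content": "detailed thinking on"},  # flip to "detailed thinking off"
    {"role": "user", "content": "How many primes are below 100?"},
]
out = generator(messages, max_new_tokens=512)
print(out[0]["generated_text"][-1]["content"])  # assistant reply, with or without reasoning
```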
Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

Some basic economics and math to explain why peer review is broken: You review 5 papers, 4 hours each. At ~$20/hr (typical US PhD student rate), that is $400 of unpaid labor. For industry folks, it would be $1K+. How can we expect $1K worth of high-quality reviews for free?
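
For concreteness, the arithmetic behind those figures; the industry rate below is a placeholder of my own, since the tweet only says "$1K+":

```python
# Back-of-the-envelope review cost, following the tweet's figures.
papers = 5
hours_per_paper = 4
phd_rate = 20        # USD/hr, the tweet's typical US PhD student rate
industry_rate = 60   # USD/hr, assumed placeholder consistent with the "$1K+" claim

phd_cost = papers * hours_per_paper * phd_rate            # 5 * 4 * 20 = 400
industry_cost = papers * hours_per_paper * industry_rate  # 5 * 4 * 60 = 1200
print(f"PhD reviewer:      ${phd_cost} of unpaid labor")
print(f"Industry reviewer: ${industry_cost} of unpaid labor")
```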

Somshubra Majumdar (@haseox94)'s Twitter Profile Photo

Open Code Reasoning is our latest dataset for training SOTA code reasoning capabilities in all model sizes! With it, even 7B Qwen can reach 51% on LiveCodeBench, and 32B hits 61% with just SFT alone! Model release soon; paper and dataset are out!

Michael Galkin (@michael_galkin)'s Twitter Profile Photo

📣 Our spicy ICML 2025 position paper: “Graph Learning Will Lose Relevance Due To Poor Benchmarks”.
Graph learning is less trendy in the ML world than it was in 2020-2022. We believe the problem is in poor benchmarks that hold the field back - and suggest ways to fix it!
🧵1/10
Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

Some hindsight from a geoguesser, though it might be very different from how a non-reasoning model makes predictions.
0. Assuming the photo is in the US given your chat history.
1. You can tell it's mid-day from the shadow. Based on the direction of the sun, it's the east coast.

Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

We just released a new post-training toolkit — flexible for development and scalable to clusters with thousands of GPUs. Give it a try and let us know what you think. More features are on the way!

Zhaocheng Zhu (@zhu_zhaocheng)'s Twitter Profile Photo

Tips for writing better LLM papers (and increasing your acceptance odds 📈):
1. Report the token cost of your method.
2. State the model size and capabilities needed to reproduce results.
3. Show how your method scales with data, compute, and model size.
4. Don't train on test
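
As a rough illustration of tip 1, a hedged sketch of how one might tally and report token cost; tiktoken and the cl100k_base encoding are stand-ins for whatever tokenizer matches the model actually evaluated, and the prompt/completion lists are placeholders:

```python
# Sketch: estimating the token cost of a method over an evaluation set.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # stand-in; use your model's own tokenizer

prompts = ["<prompt for example 1>", "<prompt for example 2>"]  # placeholder inputs
completions = ["<model output 1>", "<model output 2>"]          # placeholder outputs

input_tokens = sum(len(enc.encode(p)) for p in prompts)
output_tokens = sum(len(enc.encode(c)) for c in completions)
per_example = (input_tokens + output_tokens) / len(prompts)

print(f"input tokens:  {input_tokens}")
print(f"output tokens: {output_tokens}")
print(f"tokens per evaluated example: {per_example:.1f}")
```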