Jiawei Zhao (@jiawzhao) 's Twitter Profile
Jiawei Zhao

@jiawzhao

Research Scientist at @AIatMeta (FAIR), PhD @Caltech, Fmr Research Intern @nvidia

ID: 1174164548

http://jiaweizhao.com · Joined 13-02-2013 06:56:06

75 Tweets

788 Followers

220 Following

Beidi Chen (@beidichen) 's Twitter Profile Photo

🐷 MagicPig was developed during our efforts to create challenging reasoning tasks that showcase the true potential of long-context models—tasks that cannot be solved through simple retrieval. In addition to tackling long-context closed/open LLMs (🔥 more on this coming soon), we

Peter Tong (@tongpetersb) 's Twitter Profile Photo

This project really changed how I think about multimodal models and LLMs. I used to believe that multimodal (visual) prediction required significant changes to the model and heavy pretraining, like Chameleon. But surprisingly, the opposite is true! In large autoregressive models,

Kaiyu Yang (@kaiyuyang4) 's Twitter Profile Photo

🚀 Excited to share our position paper: "Formal Mathematical Reasoning: A New Frontier in AI"! 🔗 arxiv.org/abs/2412.16075 LLMs like o1 & o3 have tackled hard math problems by scaling test-time compute. What's next for AI4Math? We advocate for formal mathematical reasoning,
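
A minimal illustration of what "formal" means here, independent of the paper: a statement whose proof a kernel like Lean's checks mechanically, rather than free-form natural-language reasoning. The lemma below is a standard library fact, chosen only for brevity.

```lean
-- A minimal example of formal mathematical reasoning: the Lean 4 kernel
-- checks this proof mechanically, unlike an informal natural-language argument.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```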

Yuandong Tian (@tydsh) 's Twitter Profile Photo

Our Coconut work (learning continuous latent CoT) is now open-sourced. Feel free to play with it: github.com/facebookresear…
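
For intuition, a toy sketch of the continuous-latent-CoT idea (my illustration, not the released code): during latent "thought" steps the model's last hidden state is fed back as the next input embedding instead of being decoded into a discrete token. `TinyLM` and its GRU backbone are stand-ins for the actual transformer LM.

```python
import torch
import torch.nn as nn

# Toy sketch of continuous latent CoT: feed the last hidden state back as the
# next input embedding during "thinking", decode only the final answer.
class TinyLM(nn.Module):
    def __init__(self, vocab: int = 100, d: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.backbone = nn.GRU(d, d, batch_first=True)   # stand-in for a transformer
        self.head = nn.Linear(d, vocab)

    def step(self, inp_emb, state=None):
        out, state = self.backbone(inp_emb, state)
        return out[:, -1], state                          # last hidden state + recurrent state

model = TinyLM()
prompt = torch.randint(0, 100, (1, 5))
hidden, state = model.step(model.embed(prompt))           # read the prompt

for _ in range(4):                                        # continuous latent "thought" steps
    hidden, state = model.step(hidden.unsqueeze(1), state)

print(model.head(hidden).shape)                           # decode only the final answer logits
```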

Yuandong Tian (@tydsh) 's Twitter Profile Photo

We introduce ParetoQ, a series of pre-trained models that achieve SoTA in ternary (1.58-bit) and 2/3/4-bit quantization for SLMs (up to 3B parameters), using initial full pre-training + QAT later. In addition, we also discover that the representation changes substantially after low-bit
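
For context, a hedged sketch of what ternary (1.58-bit) weight quantization with a straight-through estimator looks like during QAT; the per-tensor absmean scale is a common illustrative choice, not necessarily ParetoQ's exact scheme.

```python
import torch

# Hedged sketch of ternary (1.58-bit) weight quantization with a
# straight-through estimator, as used in quantization-aware training (QAT).
def ternary_quantize(w: torch.Tensor) -> torch.Tensor:
    scale = w.abs().mean().clamp(min=1e-8)      # per-tensor scale (illustrative assumption)
    q = (w / scale).round().clamp(-1, 1)        # codes in {-1, 0, +1}
    w_q = q * scale
    # Straight-through estimator: forward pass sees w_q, gradients flow to w.
    return w + (w_q - w).detach()

w = torch.randn(4, 4, requires_grad=True)
out = ternary_quantize(w)
out.sum().backward()                            # full-precision weights receive gradients
print(out)
```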

Infini-AI-Lab (@infiniailab) 's Twitter Profile Photo

🥳 Happy to share our new work – Kinetics: Rethinking Test-Time Scaling Laws 🤔 How to effectively build a powerful reasoning agent? Existing compute-optimal scaling laws suggest 64K thinking tokens + 1.7B model > 32B model. But that only shows half of the picture! 🚨 The O(N²)
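
A rough back-of-envelope sketch (my constants, not the paper's cost model) of why that quadratic term matters: each decoded token re-reads all the weights (linear in N overall) plus the whole KV cache accumulated so far (quadratic in N), so at 64K thinking tokens the KV traffic can rival the weight traffic even for a 1.7B model.

```python
# Rough decode-time memory-traffic model. Layer count, width, full-width KV
# heads, and fp16 byte width are illustrative assumptions.
def decode_memory_traffic(params, n_layers, d_model, n_tokens, bytes_per_elem=2):
    weight_bytes = params * bytes_per_elem * n_tokens                  # weights re-read each step: O(N)
    kv_bytes_per_token = 2 * n_layers * d_model * bytes_per_elem       # K and V for one position
    kv_bytes = kv_bytes_per_token * n_tokens * (n_tokens - 1) // 2     # cache re-read each step: O(N^2)
    return weight_bytes, kv_bytes

w_traffic, kv_traffic = decode_memory_traffic(
    params=1.7e9, n_layers=28, d_model=2048, n_tokens=64_000)
print(f"weight reads ≈ {w_traffic:.2e} B, KV-cache reads ≈ {kv_traffic:.2e} B")
```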

Jiawei Zhao (@jiawzhao) 's Twitter Profile Photo

You can skip prompts that aren’t useful for the current policy during training! 🔍 Efficient prompt selection is key to scaling RL training for LLM reasoning. We are actively building algos for efficient and scalable RL training systems. Stay tuned!
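
One simple skip criterion in this spirit (a sketch, not necessarily the algorithm being built): under group-based RL such as GRPO, a prompt whose sampled rollouts all receive the same reward has zero advantage and contributes no policy gradient, so it can be skipped for the current policy.

```python
import numpy as np

# Keep only prompts whose rollout rewards actually vary under the current
# policy; uniform-reward prompts yield zero advantage and no gradient signal.
def useful_prompts(prompt_rewards: dict[str, list[float]], eps: float = 1e-6) -> list[str]:
    return [p for p, rewards in prompt_rewards.items() if np.std(rewards) > eps]

rollout_rewards = {
    "too_easy":  [1.0, 1.0, 1.0, 1.0],   # always solved  -> no learning signal
    "too_hard":  [0.0, 0.0, 0.0, 0.0],   # never solved   -> no learning signal
    "learnable": [1.0, 0.0, 1.0, 0.0],   # mixed outcomes -> useful gradient
}
print(useful_prompts(rollout_rewards))   # ['learnable']
```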

Jiawei Zhao (@jiawzhao) 's Twitter Profile Photo

Excited to see logarithmic formats (LNS, UE8M0 FP8) used in production by DeepSeek! LNS enables efficient multiplication (just addition between exponents) plus great dynamic range. Our LNS-Madam optimizer, built for LNS, was proposed years ago, before the LLM era - hope it shines again!
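
A minimal sketch of the LNS idea (illustrative encoding only, not UE8M0 FP8 or the LNS-Madam update itself): store a sign plus the log2 of the magnitude, and multiplication reduces to adding exponents.

```python
import math

# Logarithmic number system (LNS) toy encoding: (sign, log2|x|).
def to_lns(x: float) -> tuple[int, float]:
    """Encode a nonzero float as sign plus log2 magnitude."""
    return (1 if x >= 0 else -1, math.log2(abs(x)))

def lns_mul(a: tuple[int, float], b: tuple[int, float]) -> tuple[int, float]:
    """Multiply in LNS: multiply signs, add log-magnitudes (no multiplier needed)."""
    return (a[0] * b[0], a[1] + b[1])

def from_lns(v: tuple[int, float]) -> float:
    return v[0] * 2.0 ** v[1]

x, y = 3.5, -0.25
print(from_lns(lns_mul(to_lns(x), to_lns(y))))   # -0.875, i.e. x * y
```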

Yuandong Tian (@tydsh) 's Twitter Profile Photo

We released DeepConf, which achieves 99.9% on AIME'25 with open-source models using only 15% of the compute compared to majority voting@512. The secret? Simple: just prune the rollouts if they show a consecutive stream of low confidence 😀. Can be applied to any model
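
A hedged sketch of that pruning rule (the paper's exact confidence metric, window size, and threshold may differ): track a per-token confidence score and cut a rollout off once a window of recent tokens stays low-confidence, saving the rest of its decode compute.

```python
# "Confidence" here is the probability the model assigned to each sampled token.
def should_prune(token_probs: list[float], window: int = 8, threshold: float = 0.3) -> bool:
    if len(token_probs) < window:
        return False
    recent = token_probs[-window:]
    return sum(recent) / window < threshold      # consecutive low-confidence stretch

trace = [0.9, 0.8, 0.85] + [0.1] * 10            # a rollout drifting into low confidence
for t in range(1, len(trace) + 1):
    if should_prune(trace[:t]):
        print(f"prune rollout after token {t}")  # skip generating the remaining tokens
        break
```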

Jiawei Zhao (@jiawzhao) 's Twitter Profile Photo

⏰ Submission deadline coming up fast! (Sep 1) Working on efficient reasoning? Don’t miss the chance to share it at NeurIPS 2025!