Jiawei Zhao (@jiawzhao) 's Twitter Profile
Jiawei Zhao

@jiawzhao

Research Scientist at @AIatMeta (FAIR), PhD @Caltech, Fmr Research Intern @nvidia

ID: 1174164548

http://jiaweizhao.com · Joined 13-02-2013 06:56:06

75 Tweets

788 Followers

220 Following

Beidi Chen (@beidichen) 's Twitter Profile Photo

🐷 MagicPig was developed during our efforts to create challenging reasoning tasks that showcase the true potential of long-context models—tasks that cannot be solved through simple retrieval. In addition to tackling long-context closed/open LLMs (🔥 more on this coming soon), we

Peter Tong (@tongpetersb) 's Twitter Profile Photo

This project really changed how I think about multimodal models and LLMs. I used to believe that multimodal (visual) prediction required significant changes to the model and heavy pretraining, like Chameleon. But surprisingly, the opposite is true! In large autoregressive models,

Kaiyu Yang (@kaiyuyang4) 's Twitter Profile Photo

🚀 Excited to share our position paper: "Formal Mathematical Reasoning: A New Frontier in AI"! 🔗 arxiv.org/abs/2412.16075 LLMs like o1 & o3 have tackled hard math problems by scaling test-time compute. What's next for AI4Math? We advocate for formal mathematical reasoning,
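
A minimal illustration of what "formal" means here, independent of the paper: a statement whose proof a kernel like Lean's checks mechanically, rather than free-form natural-language reasoning. The lemma below is a standard library fact, chosen only for brevity.

```lean
-- A minimal example of formal mathematical reasoning: the Lean 4 kernel
-- checks this proof mechanically, unlike an informal natural-language argument.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```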

Yuandong Tian (@tydsh) 's Twitter Profile Photo

Our Coconut work (learning continuous latent CoT) is now open-sourced. Feel free to play with it: github.com/facebookresear…
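
For intuition, a toy sketch of the continuous-latent-CoT idea (my illustration, not the released code): during latent "thought" steps the model's last hidden state is fed back as the next input embedding instead of being decoded into a discrete token. `TinyLM` and its GRU backbone are stand-ins for the actual transformer LM.

```python
import torch
import torch.nn as nn

# Toy sketch of continuous latent CoT: feed the last hidden state back as the
# next input embedding during "thinking", decode only the final answer.
class TinyLM(nn.Module):
    def __init__(self, vocab: int = 100, d: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.backbone = nn.GRU(d, d, batch_first=True)   # stand-in for a transformer
        self.head = nn.Linear(d, vocab)

    def step(self, inp_emb, state=None):
        out, state = self.backbone(inp_emb, state)
        return out[:, -1], state                          # last hidden state + recurrent state

model = TinyLM()
prompt = torch.randint(0, 100, (1, 5))
hidden, state = model.step(model.embed(prompt))           # read the prompt

for _ in range(4):                                        # continuous latent "thought" steps
    hidden, state = model.step(hidden.unsqueeze(1), state)

print(model.head(hidden).shape)                           # decode only the final answer logits
```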

Yuandong Tian (@tydsh) 's Twitter Profile Photo

We introduce ParetoQ, a series of pre-trained models that achieve SoTA in ternary (1.58-bit) and 2/3/4-bit quantization for SLMs (up to 3B parameters), using initial full pre-training + QAT later. In addition, we also discover that the representation changes substantially after low-bit
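
For context, a hedged sketch of what ternary (1.58-bit) weight quantization with a straight-through estimator looks like during QAT; the per-tensor absmean scale is a common illustrative choice, not necessarily ParetoQ's exact scheme.

```python
import torch

# Hedged sketch of ternary (1.58-bit) weight quantization with a
# straight-through estimator, as used in quantization-aware training (QAT).
def ternary_quantize(w: torch.Tensor) -> torch.Tensor:
    scale = w.abs().mean().clamp(min=1e-8)      # per-tensor scale (illustrative assumption)
    q = (w / scale).round().clamp(-1, 1)        # codes in {-1, 0, +1}
    w_q = q * scale
    # Straight-through estimator: forward pass sees w_q, gradients flow to w.
    return w + (w_q - w).detach()

w = torch.randn(4, 4, requires_grad=True)
out = ternary_quantize(w)
out.sum().backward()                            # full-precision weights receive gradients
print(out)
```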

Infini-AI-Lab (@infiniailab) 's Twitter Profile Photo

🥳 Happy to share our new work – Kinetics: Rethinking Test-Time Scaling Laws 🤔 How to effectively build a powerful reasoning agent? Existing compute-optimal scaling laws suggest 64K thinking tokens + 1.7B model > 32B model. But that only shows half of the picture! 🚨 The O(N²)
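
A rough back-of-envelope sketch (my constants, not the paper's cost model) of why that quadratic term matters: each decoded token re-reads all the weights (linear in N overall) plus the whole KV cache accumulated so far (quadratic in N), so at 64K thinking tokens the KV traffic can rival the weight traffic even for a 1.7B model.

```python
# Rough decode-time memory-traffic model. Layer count, width, full-width KV
# heads, and fp16 byte width are illustrative assumptions.
def decode_memory_traffic(params, n_layers, d_model, n_tokens, bytes_per_elem=2):
    weight_bytes = params * bytes_per_elem * n_tokens                  # weights re-read each step: O(N)
    kv_bytes_per_token = 2 * n_layers * d_model * bytes_per_elem       # K and V for one position
    kv_bytes = kv_bytes_per_token * n_tokens * (n_tokens - 1) // 2     # cache re-read each step: O(N^2)
    return weight_bytes, kv_bytes

w_traffic, kv_traffic = decode_memory_traffic(
    params=1.7e9, n_layers=28, d_model=2048, n_tokens=64_000)
print(f"weight reads ≈ {w_traffic:.2e} B, KV-cache reads ≈ {kv_traffic:.2e} B")
```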

Jiawei Zhao (@jiawzhao) 's Twitter Profile Photo

You can skip prompts that aren’t useful for the current policy during training! 🔍 Efficient prompt selection is key to scaling RL training for LLM reasoning. We are actively building algos for efficient and scalable RL training systems. Stay tuned!
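
One simple skip criterion in this spirit (a sketch, not necessarily the algorithm being built): under group-based RL such as GRPO, a prompt whose sampled rollouts all receive the same reward has zero advantage and contributes no policy gradient, so it can be skipped for the current policy.

```python
import numpy as np

# Keep only prompts whose rollout rewards actually vary under the current
# policy; uniform-reward prompts yield zero advantage and no gradient signal.
def useful_prompts(prompt_rewards: dict[str, list[float]], eps: float = 1e-6) -> list[str]:
    return [p for p, rewards in prompt_rewards.items() if np.std(rewards) > eps]

rollout_rewards = {
    "too_easy":  [1.0, 1.0, 1.0, 1.0],   # always solved  -> no learning signal
    "too_hard":  [0.0, 0.0, 0.0, 0.0],   # never solved   -> no learning signal
    "learnable": [1.0, 0.0, 1.0, 0.0],   # mixed outcomes -> useful gradient
}
print(useful_prompts(rollout_rewards))   # ['learnable']
```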

Jiawei Zhao (@jiawzhao) 's Twitter Profile Photo

Excited to see logarithmic formats (LNS, UE8M0 FP8) used in production by DeepSeek! LNS enables efficient multiplication (just addition between exponents) plus great dynamic range. Our LNS-Madam optimizer, built for LNS, was proposed years ago, before the LLM era - hope it shines again!
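
A minimal sketch of the LNS idea (illustrative encoding only, not UE8M0 FP8 or the LNS-Madam update itself): store a sign plus the log2 of the magnitude, and multiplication reduces to adding exponents.

```python
import math

# Logarithmic number system (LNS) toy encoding: (sign, log2|x|).
def to_lns(x: float) -> tuple[int, float]:
    """Encode a nonzero float as sign plus log2 magnitude."""
    return (1 if x >= 0 else -1, math.log2(abs(x)))

def lns_mul(a: tuple[int, float], b: tuple[int, float]) -> tuple[int, float]:
    """Multiply in LNS: multiply signs, add log-magnitudes (no multiplier needed)."""
    return (a[0] * b[0], a[1] + b[1])

def from_lns(v: tuple[int, float]) -> float:
    return v[0] * 2.0 ** v[1]

x, y = 3.5, -0.25
print(from_lns(lns_mul(to_lns(x), to_lns(y))))   # -0.875, i.e. x * y
```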

Yuandong Tian (@tydsh) 's Twitter Profile Photo

We released DeepConf, which achieves 99.9% on AIME'25 with open-source models using only 15% of the compute compared to majority voting@512. The secret? Simple: just prune the rollouts if they show a consecutive stream of low confidence 😀. Can be applied to any model
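
A hedged sketch of that pruning rule (the paper's exact confidence metric, window size, and threshold may differ): track a per-token confidence score and cut a rollout off once a window of recent tokens stays low-confidence, saving the rest of its decode compute.

```python
# "Confidence" here is the probability the model assigned to each sampled token.
def should_prune(token_probs: list[float], window: int = 8, threshold: float = 0.3) -> bool:
    if len(token_probs) < window:
        return False
    recent = token_probs[-window:]
    return sum(recent) / window < threshold      # consecutive low-confidence stretch

trace = [0.9, 0.8, 0.85] + [0.1] * 10            # a rollout drifting into low confidence
for t in range(1, len(trace) + 1):
    if should_prune(trace[:t]):
        print(f"prune rollout after token {t}")  # skip generating the remaining tokens
        break
```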

Jiawei Zhao (@jiawzhao) 's Twitter Profile Photo

⏰ Submission deadline coming up fast! (Sep 1) Working on efficient reasoning? Don’t miss the chance to share it at NeurIPS 2025!