Beidi Chen (@beidichen)'s Twitter Profile
Beidi Chen

@beidichen

Asst. Prof @CarnegieMellon, Visiting Researcher @Meta, Postdoc @Stanford, Ph.D. @RiceUniversity, Large-Scale ML, a fan of Dota2.

ID: 424387623

Link: https://www.andrew.cmu.edu/user/beidic/ · Joined: 29-11-2011 18:22:36

461 Tweets

14.14K Followers

375 Following

Jordan Juravsky (@jordanjuravsky)'s Twitter Profile Photo

Happy Throughput Thursday! We’re excited to release Tokasaurus: an LLM inference engine designed from the ground up for high-throughput workloads with large and small models.

(Joint work with Ayush Chakravarthy, Ryan Ehrlich, Sabri Eyuboglu, Bradley Brown, Joseph Shetaye,
Beidi Chen (@beidichen)'s Twitter Profile Photo

📢 Can't be more excited about this scaling law study. It reveals two important points: (1) the current test-time strategies are not scalable (bottlenecked by O(N^2) memory access) w.r.t. the nature of hardware (FLOPS grows much faster than memory bandwidth); (2) while…
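
The "memory grows faster than compute" point is easy to sanity-check with rough numbers. Below is a back-of-the-envelope sketch in Python; every figure in it (layer count, KV heads, head dim, H100-class FLOP/s and HBM bandwidth) is an illustrative assumption, not a number from the study:

```python
# Back-of-the-envelope: per decoded token, attention re-reads the whole KV
# cache, so KV bytes over an N-token generation grow as O(N^2), while
# per-token FLOPs stay roughly constant (O(N) total).
# All model/hardware numbers below are assumptions for illustration.

def decode_costs(n_tokens, n_layers=32, n_kv_heads=8, head_dim=128,
                 bytes_per_elem=2, params=8e9):
    # Bytes to read K and V for one cached position across all layers.
    kv_bytes_per_pos = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    total_kv_bytes = kv_bytes_per_pos * n_tokens * (n_tokens + 1) // 2  # O(N^2)
    total_flops = 2 * params * n_tokens                                 # O(N)
    return total_kv_bytes, total_flops

for n in (1_000, 10_000, 100_000):
    kv, fl = decode_costs(n)
    # Assumed H100-class GPU: ~1e15 FLOP/s dense BF16, ~3.35e12 B/s HBM.
    print(f"N={n:>7}: KV-read time {kv / 3.35e12:7.1f}s  vs  FLOP time {fl / 1e15:5.1f}s")
```

Under these assumed numbers, KV-read time overtakes FLOP time well before 10K generated tokens, which is exactly the memory-bandwidth wall the tweet is pointing at.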

Tim Dettmers (@tim_dettmers)'s Twitter Profile Photo

You gotta love those scaling laws! Very insightful work. Also one of the few applications of sparsity that seems to work really well. Would love to see more work like this!

gm8xx8 (@gm8xx8)'s Twitter Profile Photo

Kinetics: Rethinking Test-Time Scaling Laws

Dense strategies like BoN and Long-CoT hit an O(N²) KV bottleneck; TTS isn't FLOP-bound, and never really was.

- block top-k sparse attention cuts per-token cost (see the sketch after this list)
- enables longer generations and more parallel trials on…
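
As a rough illustration of the block top-k idea: score KV blocks cheaply, keep only the best few, and attend inside those. This is a minimal PyTorch sketch for a single decode step; the shapes, the block-mean scoring heuristic, and all names are assumptions, not the paper's kernel:

```python
import torch
import torch.nn.functional as F

def block_topk_attention(q, k, v, block_size=64, topk=4):
    # q: (heads, dim); k, v: (heads, seq, dim). Assumes seq divides block_size.
    h, s, d = k.shape
    n_blocks = s // block_size
    kb = k[:, : n_blocks * block_size].view(h, n_blocks, block_size, d)
    # Score each block by the query's dot product with the block's mean key.
    block_scores = torch.einsum("hd,hnd->hn", q, kb.mean(dim=2))
    sel = block_scores.topk(min(topk, n_blocks), dim=-1).indices   # (heads, topk)
    # Expand block ids to position ids and gather only those keys/values:
    # per-token reads scale with topk * block_size, not with seq.
    idx = (sel.unsqueeze(-1) * block_size + torch.arange(block_size)).view(h, -1)
    k_sel = torch.gather(k, 1, idx.unsqueeze(-1).expand(-1, -1, d))
    v_sel = torch.gather(v, 1, idx.unsqueeze(-1).expand(-1, -1, d))
    attn = F.softmax(torch.einsum("hd,hsd->hs", q, k_sel) / d ** 0.5, dim=-1)
    return torch.einsum("hs,hsd->hd", attn, v_sel)

q, k, v = torch.randn(8, 128), torch.randn(8, 4096, 128), torch.randn(8, 4096, 128)
out = block_topk_attention(q, k, v)  # attends to 4*64 of 4096 positions per head
```

A production kernel would fuse the scoring and gathering and cache block summaries across steps; the sketch only shows why per-token memory traffic stops growing with sequence length.
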
Xun Huang (@xunhuang1995)'s Twitter Profile Photo

Real-time video generation is finally real — without sacrificing quality. Introducing Self-Forcing, a new paradigm for training autoregressive diffusion models. The key to high quality? Simulate the inference process during training by unrolling transformers with KV caching.
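
The "unroll the model during training, with a KV cache, exactly as at inference" recipe can be shown on a toy model. Everything below (the one-layer cached decoder, the placeholder loss) is a stand-in to make the training loop concrete; the actual work applies this to autoregressive video diffusion:

```python
import torch
import torch.nn as nn

class TinyCachedDecoder(nn.Module):
    """Toy stand-in: one attention layer decoded step by step with a KV cache."""
    def __init__(self, dim=32):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def step(self, x, cache):
        # x: (batch, dim) for the current step; cache accumulates past K, V.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        cache["k"].append(k); cache["v"].append(v)
        K, V = torch.stack(cache["k"], 1), torch.stack(cache["v"], 1)  # (batch, t, dim)
        w = torch.softmax((q.unsqueeze(1) * K).sum(-1) / K.shape[-1] ** 0.5, dim=1)
        return self.out((w.unsqueeze(-1) * V).sum(1))

model = TinyCachedDecoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training-time rollout: feed the model its own outputs, as at inference.
x, cache, outs = torch.randn(4, 32), {"k": [], "v": []}, []
for t in range(8):
    x = model.step(x, cache)   # next input is the model's own previous output
    outs.append(x)

opt.zero_grad()
loss = torch.stack(outs).pow(2).mean()  # placeholder loss for the sketch
loss.backward()                         # gradients flow through the whole rollout
opt.step()
```

The point of the pattern is that the training-time computation graph is the same unrolled, KV-cached loop the model will run at inference, so there is no train/test mismatch.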

Aviral Kumar (@aviral_kumar2)'s Twitter Profile Photo

Our view on test-time scaling has been to train models to discover algos that enable them to solve harder problems.

<a href="/setlur_amrith/">Amrith Setlur</a> &amp; <a href="/matthewyryang/">Matthew Yang</a>'s new work e3 shows how RL done with this view produces best &lt;2B LLM on math that extrapolates beyond training budget. 🧵⬇️
Beidi Chen (@beidichen)'s Twitter Profile Photo

Say hello to Multiverse — the Everything Everywhere All At Once of generative modeling. 💥 Lossless, adaptive, and gloriously parallel 🌀 Now open-sourced: multiverse4fm.github.io

I was amazed how easily we could extract the intrinsic parallelism of even SOTA autoregressive…

Xinyu Yang (@xinyu2ml)'s Twitter Profile Photo

🚀 Super excited to share Multiverse! 🏃 It's been a long journey exploring the space between model design and hardware efficiency. What excites me most is realizing that, beyond optimizing existing models, we can discover better model architectures by embracing system-level…

chen zhuoming (@chenzhuoming911)'s Twitter Profile Photo

The calculation of the scaling is, unfortunately, wrong. As we discussed in a recent paper, Kinetics (arxiv.org/abs/2506.05333), the bottleneck of inference-time scaling is KV memory access, rather than FLOPs! Unless your target scenario is Ollama for a single user. (That's…
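
To see why the accounting flips conclusions, compare one token budget spent two ways. This is a hedged illustration using the same O(L²) KV-read approximation as above, not the paper's exact cost model:

```python
# A budget of B tokens costs the same FLOPs (~2 * params * B) either way,
# but KV reads differ: a chain of length L reads ~L^2/2 cached positions.
B = 32_000
one_long_chain    = B ** 2 / 2                 # one chain, L = B
four_short_chains = 4 * (B // 4) ** 2 / 2      # four parallel chains, L = B/4
print(one_long_chain / four_short_chains)      # -> 4.0: long CoT reads 4x more KV
```

So a FLOP-based cost model rates a long chain and parallel short chains as equally expensive, while a memory-based one does not.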

Beidi Chen (@beidichen)'s Twitter Profile Photo

Hello MiniMax (official), exciting model but a questionable claim about its better reasoning scaling than DeepSeek and Qwen. Nice try on reasoning longer to be SOTA, but using FLOPs to quantify the cost of test-time scaling doesn't work for hybrid models 🫣 chen zhuoming has…

Haizhong (@haizhong_zheng)'s Twitter Profile Photo

Rollouts are a major bottleneck in RL training for LLMs. Our newly proposed RL training method, GRESO, lets RL focus on high-value prompts, greatly reducing rollout time and accelerating training. 🚀
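
One way to make "focus on high-value prompts" concrete: skip prompts whose recent rollout groups all earned the same reward, since a zero-variance group gives approximately zero advantage and thus no GRPO-style gradient. The sketch below is a generic, hypothetical illustration of that selective-rollout idea; the function names and filtering rule are not GRESO's actual algorithm:

```python
import random
from collections import defaultdict

reward_history = defaultdict(list)  # prompt_id -> list of past rollout-group rewards

def worth_rolling_out(prompt_id, window=2, explore_prob=0.1):
    recent = reward_history[prompt_id][-window:]
    if len(recent) < window:
        return True  # too little evidence: always roll out new prompts
    # Mixed rewards within a group => nonzero advantages => useful gradient.
    informative = any(len(set(group)) > 1 for group in recent)
    # Occasionally re-check skipped prompts in case the policy has improved.
    return informative or random.random() < explore_prob

def training_step(prompts, rollout_fn):
    selected = [p for p in prompts if worth_rolling_out(p)]
    for p in selected:
        rewards = rollout_fn(p)            # e.g. a GRPO group of sampled answers
        reward_history[p].append(rewards)
    return selected  # only these prompts consumed rollout compute
```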