Sukjun (June) Hwang (@sukjun_hwang)'s Twitter Profile
Sukjun (June) Hwang

@sukjun_hwang

ML PhD student @mldcmu advised by @_albertgu

Website: http://sukjunhwang.github.io · Joined: 04-04-2023 05:38:46

47 Tweets

236 Followers

241 Following

Albert Gu (@_albertgu)

I converted one of my favorite talks I've given over the past year into a blog post.

"On the Tradeoffs of SSMs and Transformers"
(or: tokens are bullshit)

In a few days, we'll release what I believe is the next major advance for architectures.
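
The tradeoff in the post's title fits in a few lines. The sketch below is illustrative, not code from the blog post: it contrasts how an attention layer's key/value cache grows with sequence length while a state-space layer folds history into a fixed-size state. All shapes and names are made up for the example.

```python
import numpy as np

# Sketch (not from the blog post): per-step memory of attention vs. an SSM.
d_model, d_state, T = 64, 16, 1000
rng = np.random.default_rng(0)
A = rng.normal(scale=0.01, size=(d_state, d_state))  # state transition
B = rng.normal(size=(d_state, d_model))              # input -> state
C = rng.normal(size=(d_model, d_state))              # state -> output

kv_cache = []              # Transformer: keeps every past token, O(T)
h = np.zeros(d_state)      # SSM: fixed-size compressed history, O(1) in T

for t in range(T):
    x = rng.normal(size=d_model)
    kv_cache.append(x)     # attention re-reads all of this each step
    h = A @ h + B @ x      # recurrence folds x into a constant-size state
    y = C @ h

print(f"cache entries: {len(kv_cache)}, SSM state floats: {h.size}")
```
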
Wentao Guo (@wentaoguo7)

🦆🚀QuACK🦆🚀: a new SOL (speed-of-light) memory-bound kernel library without a single line of CUDA C++, written entirely in Python thanks to CuTe-DSL. On an H100 with 3 TB/s of memory bandwidth, it runs 33%-50% faster than highly optimized libraries like PyTorch's torch.compile and Liger. 🤯 

With Ted Zadouri and Tri Dao
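
For context, "speed of light" for a memory-bound kernel means runtime is pinned to bytes moved divided by peak DRAM bandwidth; being 33%-50% faster than a baseline means sitting closer to that bound. A back-of-the-envelope estimate using the 3 TB/s figure from the tweet (the function and numbers below are illustrative, not QuACK's API):

```python
# Back-of-the-envelope SOL check for a memory-bound kernel (illustrative).
PEAK_BW = 3.0e12  # bytes/s, the H100 figure quoted in the tweet

def sol_time_us(n_elements: int, bytes_per_elem: int = 2, passes: int = 2) -> float:
    """Best-case runtime in microseconds: read once + write once by default."""
    return n_elements * bytes_per_elem * passes / PEAK_BW * 1e6

# e.g. an elementwise op over an 8192 x 8192 bf16 tensor:
print(f"SOL ~ {sol_time_us(8192 * 8192):.1f} us")  # ~89.5 us
```
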
Gaurav Ghosal (@gaurav_ghosal)

1/ So much of privacy research is designing post-hoc methods to make models memorization-free.
It's time we turn that around with architectural changes. Excited to add Memorization Sinks to the transformer architecture at #ICML2025 to isolate memorization during LLM training 🧵
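
The tweet doesn't spell out the mechanism; the sketch below is one way to read "isolate memorization," offered as an assumption rather than the paper's implementation: partition each MLP's hidden units into shared neurons plus a pool of sink neurons, give every training document a deterministic, hash-keyed sink slice, and drop the sink pool at deployment. All sizes and names are made up.

```python
import hashlib

import torch

D_MODEL, N_SHARED, N_SINK_POOL, SINKS_PER_DOC = 256, 768, 256, 32

class MLPWithMemSink(torch.nn.Module):
    """Toy MLP whose hidden units = shared neurons + a pool of sink neurons."""

    def __init__(self):
        super().__init__()
        hidden = N_SHARED + N_SINK_POOL
        self.up = torch.nn.Linear(D_MODEL, hidden)
        self.down = torch.nn.Linear(hidden, D_MODEL)

    @staticmethod
    def doc_mask(doc_id: str) -> torch.Tensor:
        # Shared neurons always on; add a deterministic, doc-keyed sink slice
        # so the same document always trains the same sink neurons.
        seed = int.from_bytes(hashlib.sha256(doc_id.encode()).digest()[:8], "big") % 2**63
        g = torch.Generator().manual_seed(seed)
        mask = torch.zeros(N_SHARED + N_SINK_POOL)
        mask[:N_SHARED] = 1.0
        sinks = N_SHARED + torch.randperm(N_SINK_POOL, generator=g)[:SINKS_PER_DOC]
        mask[sinks] = 1.0
        return mask

    def forward(self, x: torch.Tensor, doc_id: str | None = None) -> torch.Tensor:
        if doc_id is None:  # deployment: the entire sink pool is dropped
            mask = torch.cat([torch.ones(N_SHARED), torch.zeros(N_SINK_POOL)])
        else:               # training: shared neurons + this doc's sinks
            mask = self.doc_mask(doc_id)
        return self.down(torch.relu(self.up(x)) * mask)

mlp = MLPWithMemSink()
y_train = mlp(torch.randn(4, D_MODEL), doc_id="doc-42")  # doc-specific sinks active
y_deploy = mlp(torch.randn(4, D_MODEL))                  # sink-free forward pass
```
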
Albert Gu (@_albertgu)

I'll be giving the first H-Net talk this afternoon at 4:30-5 PT at the ES-FoMo workshop! Come support the fight against Big Token 🙏

Mihir Prabhudesai (@mihirp98)

🚨 The era of infinite internet data is ending, so we ask:

👉 What’s the right generative modelling objective when data—not compute—is the bottleneck?

TL;DR:

▶️Compute-constrained? Train Autoregressive models

▶️Data-constrained? Train Diffusion models

Get ready for 🤿  1/n
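
The contrast between the two objectives can be written down directly. A minimal sketch (illustrative, not the paper's code; `model` is an assumed callable returning per-position logits over the vocabulary): autoregressive training repeats the exact same next-token targets every epoch, while masked-diffusion training draws a fresh corruption of the same data each time, one intuition for why it can win when data, not compute, is the bottleneck.

```python
import torch
import torch.nn.functional as F

def autoregressive_loss(model, tokens):
    # Next-token prediction: the supervision pattern is fixed, so repeated
    # epochs over the same data present the exact same targets again.
    logits = model(tokens[:, :-1])                      # (B, T-1, vocab)
    return F.cross_entropy(logits.transpose(1, 2), tokens[:, 1:])

def masked_diffusion_loss(model, tokens, mask_id: int):
    # Absorbing-state discrete diffusion: mask a random subset at a random
    # rate and reconstruct it, so every epoch sees a fresh corruption of
    # the same underlying data.
    rate = torch.rand(tokens.shape[0], 1)               # per-sequence mask rate
    mask = torch.rand_like(tokens, dtype=torch.float) < rate
    corrupted = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted)                           # (B, T, vocab)
    return F.cross_entropy(logits[mask], tokens[mask])
```
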
Lili (@lchen915)

Self-Questioning Language Models: LLMs that learn to generate their own questions and answers via asymmetric self-play RL.

There is no external training data – the only input is a single prompt specifying the topic.
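
A minimal sketch of such an asymmetric self-play step (the names, the `llm.generate` interface, and the reward shaping are assumptions for illustration, not the paper's recipe): with no external data, the solver's reward can come from self-consistency, e.g. agreement with a majority vote over sampled answers, while the proposer is rewarded for questions of intermediate difficulty.

```python
from collections import Counter

TOPIC_PROMPT = "Pose a challenging algebra problem."  # the single input

def self_play_step(llm, n_samples: int = 8):
    # Proposer turn: generate a question on the topic.
    question = llm.generate(f"{TOPIC_PROMPT}\nQuestion:")
    # Solver turn: sample several answers to the proposer's question.
    answers = [llm.generate(f"{question}\nAnswer:") for _ in range(n_samples)]
    majority, count = Counter(answers).most_common(1)[0]
    # Solver reward: agreement with the majority-vote answer (self-consistency).
    solver_rewards = [1.0 if a == majority else 0.0 for a in answers]
    # Proposer reward (assumed shaping): favor questions that are neither
    # trivial (everyone agrees) nor hopeless (nobody agrees).
    proposer_reward = 1.0 - abs(2.0 * count / n_samples - 1.0)
    return question, answers, solver_rewards, proposer_reward
```
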
Pratyush Maini (@pratyushmaini)

1/ Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today DatologyAI shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens 🧑🏼‍🍳
- 3B LLMs beat 8B models🚀
- Pareto frontier for performance
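
The tweet doesn't disclose the BeyondWeb recipe, so as generic context only: a common pattern for synthetic pretraining data is to restyle existing web documents with a teacher model. The sketch below illustrates that general idea; `teacher.generate` and everything else here are hypothetical, not DatologyAI's pipeline.

```python
# Generic illustration of synthetic pretraining data via rephrasing -- NOT
# DatologyAI's actual BeyondWeb pipeline, whose recipe isn't in the tweet.
# `teacher.generate` is an assumed LLM interface.

STYLES = ["a concise textbook passage", "a Q&A pair", "a step-by-step explanation"]

def synthesize(teacher, web_docs, per_doc: int = 3):
    """Turn raw web documents into multiple restyled training documents."""
    for doc in web_docs:
        for style in STYLES[:per_doc]:
            prompt = (f"Rewrite the following text as {style}, preserving all "
                      f"facts and adding nothing new:\n\n{doc}")
            yield teacher.generate(prompt)
```
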