Aviv Bick (@avivbick) 's Twitter Profile
Aviv Bick

@avivbick

CS PhD student at Carnegie Mellon

ID: 1745858679654809602

Link: https://avivbick.github.io · Joined: 12-01-2024 17:22:21

44 Tweets

194 Followers

15 Following

Asher Trockman (@ashertrockman) 's Twitter Profile Photo

Are you a frontier lab investing untold sums in training? Are you trying to stay competitive? Are you finding that your competitors' models are ... thinking a bit too much like yours?

Then antidistillation.com might be for you! Sam Altman Elon Musk
Kevin Li (@kevinyli_) 's Twitter Profile Photo

At #ICLR2025 to present two recent works on reasoning distillation and efficient VLM inference with my wonderful collaborators! Excited to discuss efficient deep learning🚀, methods and architectures, and reasoning for LLMs🧠; DMs open! 👇Summary of the two works below! 1/3

Yutong (Kelly) He (@electronickale) 's Twitter Profile Photo

✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵

Assaf Ben Kish (@abk_tau) 's Twitter Profile Photo

New work! 🚨 Recurrent LLMs like Mamba and RWKV can efficiently process millions of tokens, yet still underperform on real-world long-context tasks. What's holding them back? 🤔 And how can a lightweight fix boost their performance by 35% on LongBench? 👇🏼🧵 Github:

Antonio Orvieto (@orvieto_antonio) 's Twitter Profile Photo

We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. 

It's surprising how much one can delve into, and how beautiful it can become.

With (and only thanks to) the amazing Alexandre and Francis Bach

arxiv.org/pdf/2502.09287
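
For context, the "linear RNNs" referred to here are typically the simple state-space recurrence below (the paper's exact parameterization may differ). Associative recall then asks whether the fixed-size state $h_t$ can store key-value pairs seen earlier in the sequence so that a later query token reads back the matching value.

$$h_t = A\,h_{t-1} + B\,x_t, \qquad y_t = C\,h_t$$
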
Han Guo (@hanguo97) 's Twitter Profile Photo

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
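
As a minimal sketch of the two endpoints named above (not the log-linear method itself), assuming toy (T, d)-shaped tensors, illustrative function names, and an unnormalized, mask-free linear-attention recurrence:

```python
import torch

def softmax_attention(q, k, v):
    # Full attention: scores over all pairs of positions, quadratic in T.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v):
    # Linear-time endpoint: the whole history is folded into one d x d_v state,
    # updated as S_t = S_{t-1} + k_t v_t^T and read out with the current query.
    T, d_v = v.shape
    S = q.new_zeros(q.shape[-1], d_v)
    out = []
    for t in range(T):
        S = S + torch.outer(k[t], v[t])
        out.append(q[t] @ S)
    return torch.stack(out)
```

Full attention recomputes over every past position at each step, while the recurrent endpoint compresses everything into a fixed-size state; the announced method sits between these cost profiles, with log-linear training and log-time decoding.
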
Ricardo Buitrago (@rbuit_) 's Twitter Profile Photo

Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past the training length. We show a simple and general fix which enables length generalization in up to 256k sequences, with no need to change the architectures!

Albert Gu (@_albertgu) 's Twitter Profile Photo

I converted one of my favorite talks I've given over the past year into a blog post.

"On the Tradeoffs of SSMs and Transformers"
(or: tokens are bullshit)

In a few days, we'll release what I believe is the next major advance for architectures.
Sukjun (June) Hwang (@sukjun_hwang) 's Twitter Profile Photo

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
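
The paper spells out the actual chunking mechanism; purely as a hypothetical illustration of what "dynamic chunking inside the model" can look like, a module might score a boundary probability at every byte and pool each resulting segment for the coarser stage above it (the class name, linear scorer, and mean-pooling below are assumptions for the sketch, not the authors' design):

```python
import torch
import torch.nn as nn

class DynamicChunker(nn.Module):
    """Hypothetical sketch: predict per-position boundary probabilities and
    mean-pool each resulting segment into one vector for the next stage."""

    def __init__(self, dim: int, threshold: float = 0.5):
        super().__init__()
        self.boundary_scorer = nn.Linear(dim, 1)  # assumed per-position boundary logit
        self.threshold = threshold

    def forward(self, x: torch.Tensor):
        # x: (seq_len, dim) byte-level hidden states
        p = torch.sigmoid(self.boundary_scorer(x)).squeeze(-1)
        boundaries = (p > self.threshold).nonzero().flatten().tolist()
        chunks, start = [], 0
        for b in boundaries + [x.shape[0]]:
            if b > start:
                chunks.append(x[start:b].mean(dim=0))  # pool one discovered chunk
                start = b
        return torch.stack(chunks), p  # coarse sequence + boundary probabilities
```

In the announced model these decisions are learned end to end with the rest of the network, which is what lets it discover its own units of data rather than relying on a fixed external tokenizer.
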

Albert Gu (@_albertgu) 's Twitter Profile Photo

I'll be giving the first H-Net talk this afternoon at 4:30-5 PT at the ES-FoMo workshop! come support the fight against Big Token 🙏

Cartesia (@cartesia_ai) 's Twitter Profile Photo

Introducing Line by Cartesia: the modern voice agent development platform. Line was built to be code-first, because best-in-class products are built in code. ▶️ Watch us build an advanced voice agent with background reasoning in just minutes.