Aviv Bick (@avivbick) 's Twitter Profile
Aviv Bick

@avivbick

CS PhD student at Carnegie Mellon

ID: 1745858679654809602

Link: https://avivbick.github.io · Joined: 12-01-2024 17:22:21

44 Tweets

194 Followers

15 Following

Asher Trockman (@ashertrockman) 's Twitter Profile Photo

Are you a frontier lab investing untold sums in training? Are you trying to stay competitive? Are you finding that your competitors' models are ... thinking a bit too much like yours?

Then antidistillation.com might be for you! Sam Altman Elon Musk
Kevin Li (@kevinyli_) 's Twitter Profile Photo

At #ICLR2025 to present two recent works on reasoning distillation and efficient VLM inference with my wonderful collaborators! Excited to discuss efficient deep learning🚀, methods and architectures, and reasoning for LLMs🧠; DMs open! 👇Summary of the two works below! 1/3

Yutong (Kelly) He (@electronickale) 's Twitter Profile Photo

✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵

Assaf Ben Kish (@abk_tau) 's Twitter Profile Photo

New work! 🚨 Recurrent LLMs like Mamba and RWKV can efficiently process millions of tokens, yet still underperform on real-world long-context tasks. What's holding them back? 🤔 And how can a lightweight fix boost their performance by 35% on LongBench? 👇🏼🧵 Github:

Antonio Orvieto (@orvieto_antonio) 's Twitter Profile Photo

We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. 

It's surprising how much one can delve into, and how beautiful it can become.

With (and only thanks to) the amazing Alexandre and Francis Bach

arxiv.org/pdf/2502.09287
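
For context, the "linear RNNs" referred to here are typically the simple state-space recurrence below (the paper's exact parameterization may differ). Associative recall then asks whether the fixed-size state $h_t$ can store key-value pairs seen earlier in the sequence so that a later query token reads back the matching value.

$$h_t = A\,h_{t-1} + B\,x_t, \qquad y_t = C\,h_t$$
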
Han Guo (@hanguo97) 's Twitter Profile Photo

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
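
As a minimal sketch of the two endpoints named above (not the log-linear method itself), assuming toy (T, d)-shaped tensors, illustrative function names, and an unnormalized, mask-free linear-attention recurrence:

```python
import torch

def softmax_attention(q, k, v):
    # Full attention: scores over all pairs of positions, quadratic in T.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v):
    # Linear-time endpoint: the whole history is folded into one d x d_v state,
    # updated as S_t = S_{t-1} + k_t v_t^T and read out with the current query.
    T, d_v = v.shape
    S = q.new_zeros(q.shape[-1], d_v)
    out = []
    for t in range(T):
        S = S + torch.outer(k[t], v[t])
        out.append(q[t] @ S)
    return torch.stack(out)
```

Full attention recomputes over every past position at each step, while the recurrent endpoint compresses everything into a fixed-size state; the announced method sits between these cost profiles, with log-linear training and log-time decoding.
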
Ricardo Buitrago (@rbuit_) 's Twitter Profile Photo

Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past the training length. We show a simple and general fix which enables length generalization in up to 256k sequences, with no need to change the architectures!

Albert Gu (@_albertgu) 's Twitter Profile Photo

I converted one of my favorite talks I've given over the past year into a blog post.

"On the Tradeoffs of SSMs and Transformers"
(or: tokens are bullshit)

In a few days, we'll release what I believe is the next major advance for architectures.
Sukjun (June) Hwang (@sukjun_hwang) 's Twitter Profile Photo

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
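
The paper spells out the actual chunking mechanism; purely as a hypothetical illustration of what "dynamic chunking inside the model" can look like, a module might score a boundary probability at every byte and pool each resulting segment for the coarser stage above it (the class name, linear scorer, and mean-pooling below are assumptions for the sketch, not the authors' design):

```python
import torch
import torch.nn as nn

class DynamicChunker(nn.Module):
    """Hypothetical sketch: predict per-position boundary probabilities and
    mean-pool each resulting segment into one vector for the next stage."""

    def __init__(self, dim: int, threshold: float = 0.5):
        super().__init__()
        self.boundary_scorer = nn.Linear(dim, 1)  # assumed per-position boundary logit
        self.threshold = threshold

    def forward(self, x: torch.Tensor):
        # x: (seq_len, dim) byte-level hidden states
        p = torch.sigmoid(self.boundary_scorer(x)).squeeze(-1)
        boundaries = (p > self.threshold).nonzero().flatten().tolist()
        chunks, start = [], 0
        for b in boundaries + [x.shape[0]]:
            if b > start:
                chunks.append(x[start:b].mean(dim=0))  # pool one discovered chunk
                start = b
        return torch.stack(chunks), p  # coarse sequence + boundary probabilities
```

In the announced model these decisions are learned end to end with the rest of the network, which is what lets it discover its own units of data rather than relying on a fixed external tokenizer.
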

Albert Gu (@_albertgu) 's Twitter Profile Photo

I'll be giving the first H-Net talk this afternoon at 4:30-5 PT at the ES-FoMo workshop! come support the fight against Big Token 🙏

Cartesia (@cartesia_ai) 's Twitter Profile Photo

Introducing Line by Cartesia: the modern voice agent development platform. Line was built to be code-first, because best-in-class products are built in code. ▶️ Watch us build an advanced voice agent with background reasoning in just minutes.