Rui-Jie (Ridger) Zhu (@ridgerzhu) Twitter Tweets • TwiCopy

Rui-Jie (Ridger) Zhu

@ridgerzhu

Ph.D. student at UC Santa Cruz, intern at ByteDance Seed Team, working on scalable, simple ideas for #LLM.

ID: 1575365180971962368

Joined: 29-09-2022 06:01:50

34 Tweets

196 Followers

85 Following

Rui-Jie (Ridger) Zhu

@ridgerzhu

2 months ago

Thank you for highlighting our work!

9 likes · 0 replies · 0 retweets

Rui-Jie (Ridger) Zhu

@ridgerzhu

2 months ago

Don’t like to just put the paper together too, so trying to deliver some insight🤣

5 likes · 0 replies · 0 retweets

Long texts choke transformers in LLMs, and this study proves that weaving a few full attention layers into mostly linear ones keeps memories sharp without the huge cache. The team trained 72 models up to 1.3B parameters, testing 6 linear designs across several mixing ratios.

15 likes · 0 replies · 4 retweets

Rohan Paul

@rohanpaul_ai

2 months ago

Most current language models think out loud, stuffing every thought into words. A typical token set holds about 40000 choices, which equals roughly 15 bits of data, just under 2 bytes. When a language model must pour every reasoning step through these tiny packets, complex
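As a rough check of the arithmetic in the post above: a single token drawn from a vocabulary of N entries can carry at most log2(N) bits. The sketch below assumes the quoted vocabulary size of 40000; the helper name `bits_per_token` is illustrative and does not come from the post.

```python
import math


def bits_per_token(vocab_size: int) -> float:
    """Upper bound on the information in one token: log2 of the vocabulary size."""
    return math.log2(vocab_size)


bits = bits_per_token(40_000)
# Convert bits to bytes (8 bits per byte) to compare with the "just under 2 bytes" claim.
print(f"{bits:.2f} bits, {bits / 8:.2f} bytes")  # prints "15.29 bits, 1.91 bytes"
```

This is only an upper bound: it is reached when every token is equally likely, which real text does not satisfy.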

41 likes · 0 replies · 9 retweets

Rui-Jie (Ridger) Zhu

@ridgerzhu

18 days ago

Nice work by Jiashuo’s team!

4 likes · 0 replies · 0 retweets

Ge Zhang

@gezhang86038849

18 days ago

Is text-only information enough for LLM/VLM Web Agents? 🤔 Clearly not. 🙅‍♂️ The modern web is a rich tapestry of text, images 🖼️, and videos 🎥. To truly assist us, agents need to understand it all. That's why we built MM-BrowseComp. 🌐 We're introducing MM-BrowseComp 🚀, a new

85 likes · 1 reply · 30 retweets