Rui-Jie (Ridger) Zhu (@ridgerzhu)'s Twitter Profile
Rui-Jie (Ridger) Zhu

@ridgerzhu

Ph.D. student at UC Santa Cruz, Intern at Bytedance Seed Team, working on scalable, simple ideas for #LLM.

ID: 1575365180971962368

29-09-2022 06:01:50

34 Tweets

196 Followers

85 Following

Rohan Paul (@rohanpaul_ai)

Long texts choke transformers in LLMs, and this study proves that weaving a few full attention layers into mostly linear ones keeps memories sharp without the huge cache. The team trained 72 models up to 1.3B parameters, testing 6 linear designs across several mixing ratios.

Rohan Paul (@rohanpaul_ai)

Most current language models think out loud, stuffing every thought into words. A typical token set holds about 40000 choices, which equals roughly 15 bits of data, just under 2 bytes. When a language model must pour every reasoning step through these tiny packets, complex
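
The "roughly 15 bits" figure in the post above can be checked with a short calculation. This is a minimal sketch, assuming every token in the vocabulary is equally likely, which gives an upper bound on the information one token can carry; the function name `bits_per_token` is hypothetical and not from the post.

```python
import math


def bits_per_token(vocab_size: int) -> float:
    """Upper bound on information per token, assuming a uniform choice
    over the vocabulary (an illustrative assumption, not from the post)."""
    return math.log2(vocab_size)


bits = bits_per_token(40_000)
# Convert bits to bytes to compare with the "just under 2 bytes" claim.
print(f"{bits:.2f} bits, {bits / 8:.2f} bytes")  # prints "15.29 bits, 1.91 bytes"
```

Real token distributions are far from uniform, so the average information per generated token is typically lower than this bound.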

Ge Zhang (@gezhang86038849)

Is text-only information enough for LLM/VLM Web Agents? 🤔 Clearly not. 🙅‍♂️ The modern web is a rich tapestry of text, images 🖼️, and videos 🎥. To truly assist us, agents need to understand it all. That's why we built MM-BrowseComp. 🌐 We're introducing MM-BrowseComp 🚀, a new
