Rui-Jie (Ridger) Zhu (@ridgerzhu) 's Twitter Profile
Rui-Jie (Ridger) Zhu

@ridgerzhu

Ph.D. student at UC Santa Cruz, Intern at Bytedance Seed Team, working on scalable, simple ideas for #LLM.

ID: 1575365180971962368

Joined: 29-09-2022 06:01:50

34 Tweets

196 Followers

85 Following

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Long texts choke transformers in LLMs, and this study proves that weaving a few full attention layers into mostly linear ones keeps memories sharp without the huge cache. The team trained 72 models up to 1.3B parameters, testing 6 linear designs across several mixing ratios.

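The layer-mixing idea in the tweet can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual code: the function name, the `"linear"`/`"full"` labels, and the 1-in-8 mixing ratio are all assumptions for the sake of the example.

```python
def make_layer_schedule(n_layers: int, full_attn_ratio: float) -> list[str]:
    """Interleave full softmax-attention layers into a mostly linear stack.

    full_attn_ratio is the fraction of layers that use full attention,
    e.g. 1/8 places one full-attention layer every 8 layers.
    """
    if not 0.0 <= full_attn_ratio <= 1.0:
        raise ValueError("full_attn_ratio must be in [0, 1]")
    stride = round(1 / full_attn_ratio) if full_attn_ratio > 0 else 0
    schedule = []
    for i in range(n_layers):
        # Every `stride`-th layer is full attention; the rest are linear,
        # so the KV cache only grows for the few full-attention layers.
        if stride and (i + 1) % stride == 0:
            schedule.append("full")
        else:
            schedule.append("linear")
    return schedule

print(make_layer_schedule(24, 1 / 8))
```

For a 24-layer model at a 1/8 ratio, this yields 3 full-attention layers among 21 linear ones, which is the kind of mixing-ratio sweep the 72-model study varies.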
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Most current language models think out loud, stuffing every thought into words. A typical token set holds about 40000 choices, which equals roughly 15 bits of data, just under 2 bytes. When a language model must pour every reasoning step through these tiny packets, complex

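The tweet's arithmetic checks out and can be verified directly: the information content of one choice among N equally likely options is log2(N) bits.

```python
import math

# A vocabulary of ~40,000 tokens carries log2(40000) bits per token:
# about 15.3 bits, just under 2 bytes (16 bits), as the tweet says.
vocab_size = 40_000
bits_per_token = math.log2(vocab_size)
print(f"{bits_per_token:.2f} bits per token")
```

This is the "tiny packet" the tweet refers to: each emitted token can convey at most ~15 bits, regardless of how much internal state the model had while producing it.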
Ge Zhang (@gezhang86038849) 's Twitter Profile Photo

Is text-only information enough for LLM/VLM Web Agents? 🤔 Clearly not. 🙅‍♂️ The modern web is a rich tapestry of text, images 🖼️, and videos 🎥. To truly assist us, agents need to understand it all. That's why we built MM-BrowseComp. 🌐 We're introducing MM-BrowseComp 🚀, a new
