Xiaoliu.x (@xiaolgo) 's Twitter Profile
Xiaoliu.x

@xiaolgo

Exploring possibilities in large language model architectures, researcher @RWKV

WIP scholar.google.com/citations?user…

ID: 102969212

Joined: 08-01-2010 12:25:53

89 Tweets

80 Followers

19 Following

leloy! (@leloykun) 's Twitter Profile Photo


(Linear) Attention Mechanisms as Test-Time Regression

v1.1

I've added BlinkDL (@BlinkDL_AI)'s RWKV-7 and fixed the update rule for Vanilla DeltaNet

---

Note that the arrows in the part where we derive linear attention variants don't necessarily indicate generality nor a tech-tree. For
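The tweet mentions fixing the update rule for Vanilla DeltaNet. A minimal sketch of the standard delta-rule (Widrow-Hoff) state update that DeltaNet-style linear attention uses, viewing the state as a linear map fit by online regression from keys to values; the key normalization and the `beta` learning rate here are illustrative choices, not taken from the linked write-up:

```python
import numpy as np

def deltanet_step(S, k, v, beta):
    """One delta-rule update: nudge the state S so that S @ k moves toward v.
    S <- S + beta * (v - S k) k^T, a rank-1 correction (test-time regression).
    """
    k = k / (np.linalg.norm(k) + 1e-8)  # unit-norm key, a common choice
    err = v - S @ k                     # prediction error for this (k, v) pair
    return S + beta * np.outer(err, k)  # rank-1 delta-rule correction

# Repeated updates on a fixed (k, v) pair drive S k toward v.
S = np.zeros((4, 4))
k = np.array([1.0, 0.0, 0.0, 0.0])  # already unit norm
v = np.array([0.5, -1.0, 2.0, 0.0])
for _ in range(50):
    S = deltanet_step(S, k, v, beta=0.5)
print(np.allclose(S @ k, v))  # → True
```

With `beta=0.5` the error shrinks by half per step, so after 50 steps the fit is numerically exact; softmax attention and other linear-attention variants differ mainly in how this per-token regression objective and step size are chosen.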
Xiaoliu.x (@xiaolgo) 's Twitter Profile Photo

ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer youtu.be/GwiWDwgBsNw?si… Our ongoing research focuses on developing enhanced RNN-based attention mechanisms with increased expressivity. huggingface.co/papers/2501.15…

Xiaoliu.x (@xiaolgo) 's Twitter Profile Photo

🚀 Big news! You can now test drive the DeepSeek R1 670B AI model FOR FREE at chatmobius.com! Perfect for devs, researchers, or anyone curious about next-gen AI. Don’t miss out—go break it (politely)! 🤖🎉 #DeepSeekR1 #DeepSeekAI

alphaXiv (@askalphaxiv) 's Twitter Profile Photo

1997: Deep Blue defeats Kasparov at chess
2016: AlphaGo masters the game of Go
2025: Stanford researchers crack Among Us

Trending on alphaXiv 📈 Remarkable new work trains LLMs to master strategic social deduction through multi-agent RL, doubling win rates over standard RL.

Xiaoliu.x (@xiaolgo) 's Twitter Profile Photo

youtu.be/S4P68ID1PzI Large Language Diffusion Models. A delight to read this paper. I've long wondered how to make diffusion work for language, and aha, this masking scheme is a masterpiece! Looking forward to the code release.

Xiaoliu.x (@xiaolgo) 's Twitter Profile Photo

huggingface.co/spaces/RWKV-Re… Thanks to Hugging Face for supporting us. Here you can test the latest RWKV-7 G1 model (0.1B-2.9B xx%), the only RNN-based reasoning large language model in the world.

Xiaoliu.x (@xiaolgo) 's Twitter Profile Photo

Hunyuan Fun fact: RNN architectures have many alternative designs, but stacking layers in hybrid models often dilutes the inductive biases those designs introduce. Many say that aiming merely to compress the KV cache in hybrid models is the wrong direction.

BlinkDL (@blinkdl_ai) 's Twitter Profile Photo

RWKV-8 "Heron" preview (1) - DeepEmbed. Seems Gemma3n is trying similar tricks (Per-Layer Embedding), so I will discuss it first 🪶 It's essentially free performance - lots of params, but can be offloaded to RAM/SSD, and simple to train and deploy🚀

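The RWKV-8 DeepEmbed details aren't spelled out in this preview; a minimal sketch of the general per-layer-embedding idea it alludes to, assuming (hypothetically) that each layer keeps its own token-indexed table whose row gates the layer output channel-wise. Because each step needs only one row per table, the full tables can stay offloaded in RAM/SSD:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, LAYERS = 1000, 8, 2

# One extra embedding table per layer ("per-layer embedding").
# Lots of parameters, but each forward step touches a single row per layer,
# so the tables can live in CPU RAM or on SSD rather than GPU memory.
deep_embed = [rng.standard_normal((VOCAB, DIM)) for _ in range(LAYERS)]

def ffn(x):
    # stand-in for the layer's usual channel-mixing block (toy placeholder)
    return np.tanh(x)

def layer_forward(layer, x, token_id):
    """Gate the layer output channel-wise with the current token's row."""
    gate = deep_embed[layer][token_id]  # cheap row lookup, not a matmul
    return ffn(x) * gate                # elementwise per-channel scaling

x = rng.standard_normal(DIM)
y = layer_forward(0, x, token_id=42)
print(y.shape)  # (8,)
```

The design choice being highlighted: the extra capacity is almost free at inference time, since a row lookup adds no matrix multiplies and the gradient for each table row is sparse during training.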
BlinkDL (@blinkdl_ai) 's Twitter Profile Photo

🧵On Baselines in LLM Architecture Research, a Tale of DeltaNet and RWKV-7 (1) (full essay at github.com/BlinkDL/zoology)

BlinkDL (@blinkdl_ai) 's Twitter Profile Photo

X.M. Agreed. Please use github.com/BlinkDL/RWKV-L… as a reference; the community is building github.com/RWKV-Vibe/RWKV…