Xiaoliu.x (@xiaolgo) 's Twitter Profile
Xiaoliu.x

@xiaolgo

Exploring possibilities in large language model architectures, researcher @RWKV

WIP scholar.google.com/citations?user…

ID: 102969212

Joined: 08-01-2010 12:25:53

89 Tweets

80 Followers

19 Following

leloy! (@leloykun) 's Twitter Profile Photo


(Linear) Attention Mechanisms as Test-Time Regression

v1.1

I've added BlinkDL (@BlinkDL_AI)'s RWKV-7 and fixed the update rule for Vanilla DeltaNet

---

Note that the arrows in the part where we derive linear attention variants don't necessarily indicate generality nor a tech-tree. For
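The tweet mentions fixing the update rule for Vanilla DeltaNet. A minimal sketch of the standard delta-rule (Widrow-Hoff) state update that DeltaNet-style linear attention uses, viewing the state as a linear map fit by online regression from keys to values; the key normalization and the `beta` learning rate here are illustrative choices, not taken from the linked write-up:

```python
import numpy as np

def deltanet_step(S, k, v, beta):
    """One delta-rule update: nudge the state S so that S @ k moves toward v.
    S <- S + beta * (v - S k) k^T, a rank-1 correction (test-time regression).
    """
    k = k / (np.linalg.norm(k) + 1e-8)  # unit-norm key, a common choice
    err = v - S @ k                     # prediction error for this (k, v) pair
    return S + beta * np.outer(err, k)  # rank-1 delta-rule correction

# Repeated updates on a fixed (k, v) pair drive S k toward v.
S = np.zeros((4, 4))
k = np.array([1.0, 0.0, 0.0, 0.0])  # already unit norm
v = np.array([0.5, -1.0, 2.0, 0.0])
for _ in range(50):
    S = deltanet_step(S, k, v, beta=0.5)
print(np.allclose(S @ k, v))  # → True
```

With `beta=0.5` the error shrinks by half per step, so after 50 steps the fit is numerically exact; softmax attention and other linear-attention variants differ mainly in how this per-token regression objective and step size are chosen.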
Xiaoliu.x (@xiaolgo) 's Twitter Profile Photo

ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer youtu.be/GwiWDwgBsNw?si… Our ongoing research focuses on developing enhanced RNN-based attention mechanisms with increased expressivity. huggingface.co/papers/2501.15…

Xiaoliu.x (@xiaolgo) 's Twitter Profile Photo

🚀 Big news! You can now test drive the DeepSeek R1 670B AI model FOR FREE at chatmobius.com! Perfect for devs, researchers, or anyone curious about next-gen AI. Don’t miss out—go break it (politely)! 🤖🎉 #DeepSeekR1 #DeepSeekAI

alphaXiv (@askalphaxiv) 's Twitter Profile Photo

1997: Deep Blue defeats Kasparov at chess
2016: AlphaGo masters the game of Go
2025: Stanford researchers crack Among Us

Trending on alphaXiv 📈 Remarkable new work trains LLMs to master strategic social deduction through multi-agent RL, doubling win rates over standard RL.

Xiaoliu.x (@xiaolgo) 's Twitter Profile Photo

youtu.be/S4P68ID1PzI Large Language Diffusion Models. A delight to read this paper. I've long wondered how to make diffusion work for language, and aha, this masking scheme is a masterpiece! Looking forward to the code release.

Xiaoliu.x (@xiaolgo) 's Twitter Profile Photo

huggingface.co/spaces/RWKV-Re… Thanks to Hugging Face for supporting us. Here you can test the latest RWKV-7 G1 model (0.1B-2.9B xx%), the only RNN-based reasoning large language model in the world.

Xiaoliu.x (@xiaolgo) 's Twitter Profile Photo

Hunyuan Fun fact: RNN architectures have many alternative designs, but stacking layers in hybrid models often dilutes the inductive biases those designs introduce. Many say that aiming merely to compress the KV cache in hybrid models is the wrong direction.

BlinkDL (@blinkdl_ai) 's Twitter Profile Photo

RWKV-8 "Heron" preview (1) - DeepEmbed. Seems Gemma3n is trying similar tricks (Per-Layer Embedding), so I will discuss it first 🪶 It's essentially free performance - lots of params, but can be offloaded to RAM/SSD, and simple to train and deploy🚀

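The RWKV-8 DeepEmbed details aren't spelled out in this preview; a minimal sketch of the general per-layer-embedding idea it alludes to, assuming (hypothetically) that each layer keeps its own token-indexed table whose row gates the layer output channel-wise. Because each step needs only one row per table, the full tables can stay offloaded in RAM/SSD:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, LAYERS = 1000, 8, 2

# One extra embedding table per layer ("per-layer embedding").
# Lots of parameters, but each forward step touches a single row per layer,
# so the tables can live in CPU RAM or on SSD rather than GPU memory.
deep_embed = [rng.standard_normal((VOCAB, DIM)) for _ in range(LAYERS)]

def ffn(x):
    # stand-in for the layer's usual channel-mixing block (toy placeholder)
    return np.tanh(x)

def layer_forward(layer, x, token_id):
    """Gate the layer output channel-wise with the current token's row."""
    gate = deep_embed[layer][token_id]  # cheap row lookup, not a matmul
    return ffn(x) * gate                # elementwise per-channel scaling

x = rng.standard_normal(DIM)
y = layer_forward(0, x, token_id=42)
print(y.shape)  # (8,)
```

The design choice being highlighted: the extra capacity is almost free at inference time, since a row lookup adds no matrix multiplies and the gradient for each table row is sparse during training.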
BlinkDL (@blinkdl_ai) 's Twitter Profile Photo

🧵On Baselines in LLM Architecture Research, a Tale of DeltaNet and RWKV-7 (1) (full essay at github.com/BlinkDL/zoology)

BlinkDL (@blinkdl_ai) 's Twitter Profile Photo

X.M. Agreed. Please use github.com/BlinkDL/RWKV-L… as a reference; the community is building github.com/RWKV-Vibe/RWKV…