Rain (@rainnekoneko)'s Twitter Profile
Rain

@rainnekoneko

I like rain. // Scientist @ Avey AI // pfp: reddit.com/r/cats/comment…

ID: 1923977157904101376

Joined: 18-05-2025 05:41:33

35 Tweets

220 Followers

63 Following

Moe Shop (@korewamoe)

✧ NEW RELEASE ✧

my song "Fluorite" for Gakuen iDOLM@STER is out now everywhere ♡

stream here → nex-tone.link/EwejxAE7K

lyrics by やぎぬまかな (@ygnm_kana)
vocals by 七瀬つむぎ (@tsumugi_nanase)
Muyu He (@hemuyu0327)

Visualizing LLM basics:
How do LLMs store facts in their weights?

Came across a video by Grant Sanderson (@3blue1brown) on how LLMs store facts in their feed-forward layers (ffwd), something that far fewer videos and blogs touch on compared to the famous attention layers.

However, ffwd is the key…
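A minimal sketch of the key-value picture of the feed-forward block that the video describes: a row of the input projection acts as a key that fires on a particular direction in the residual stream, and the matching column of the output projection is the value written back. The dimensions and the single hard-coded "fact" here are purely illustrative.

```python
import torch

d_model, d_ff = 8, 32
W_in = torch.zeros(d_ff, d_model)    # rows = keys, one per hidden neuron
W_out = torch.zeros(d_model, d_ff)   # columns = values, one per hidden neuron

key = torch.randn(d_model)    # direction standing in for a subject
value = torch.randn(d_model)  # direction standing in for the stored fact

W_in[0] = key        # neuron 0 fires when the input points along `key`
W_out[:, 0] = value  # ...and writes `value` into the residual stream

def ffwd(x):
    # the standard two-layer MLP: project up, nonlinearity, project down
    return W_out @ torch.relu(W_in @ x)

x = 0.9 * key + 0.1 * torch.randn(d_model)  # input that "mentions" the subject
out = ffwd(x)
print(torch.cosine_similarity(out, value, dim=0))  # ≈ 1.0: the fact is recalled
```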
Pedro Cuenca (@pcuenq)

Download pre-compiled, optimized kernels from the Kernel Hub! Battle-tested in transformers and TGI; let us know if you use it in other PyTorch projects 🚀 huggingface.co/blog/hello-hf-…
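A minimal usage sketch, following the activation example in the linked blog post (the `kernels` package is installed with `pip install kernels`; the kernel repo name and function below come from that post and may change):

```python
import torch
from kernels import get_kernel

# Fetches a pre-compiled, optimized kernel from the Hugging Face Hub;
# no local CUDA toolchain or build step is needed.
activation = get_kernel("kernels-community/activation")

x = torch.randn(10, 10, dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # the kernel writes its output into `y`
print(y)
```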

Simo Ryu (@cloneofsimo)

Expectation: "When I get a job as ML researcher, I'm going to push infinite-ctx-length diffusion model that faithfully diffuses via second order expansion of fokker-planck equation, with adaptive guidance on control signal and efficient gradient estimation" Your dataset:

Expectation: "When I get a job as ML researcher, I'm going to push infinite-ctx-length diffusion model that faithfully diffuses via second order expansion of fokker-planck equation, with adaptive guidance on control signal and efficient gradient estimation"

Your dataset:
erogol (@erogol)

Did some midnight coding and added Avey to BlaGPT. It is crazy slow and uses way more VRAM during training; I think it needs some iteration to be practical. However, it landed well among the other transformer alternatives. github.com/erogol/BlaGPT

bycloud (@bycloudai)

this basically replaces tokenization by pooling bytes through a U-Net into chunks and predicting the next chunk, which in turn predicts multiple bytes/words at once arxiv.org/abs/2506.14761
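A toy sketch of that pooling idea (not the linked paper's actual architecture): embed raw bytes, average-pool fixed-size groups into chunk vectors, run the expensive model at chunk resolution, then upsample back so every byte position gets a prediction. The chunk size, dimensions, and fixed (rather than learned) chunk boundaries are all assumptions for illustration, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn

class ByteChunkLM(nn.Module):
    def __init__(self, d=128, chunk=4, n_heads=4):
        super().__init__()
        self.chunk = chunk
        self.embed = nn.Embedding(256, d)          # one embedding per byte value
        self.down = nn.AvgPool1d(chunk)            # bytes -> chunks (downsample)
        layer = nn.TransformerEncoderLayer(d, n_heads, 4 * d, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.up = nn.Upsample(scale_factor=chunk)  # chunks -> bytes (upsample)
        self.head = nn.Linear(d, 256)              # next-byte logits

    def forward(self, byte_ids):                   # (B, T) with T % chunk == 0
        h = self.embed(byte_ids)                   # (B, T, d)
        hc = self.down(h.transpose(1, 2))          # (B, d, T/chunk)
        hc = self.backbone(hc.transpose(1, 2))     # (B, T/chunk, d)
        h = self.up(hc.transpose(1, 2)).transpose(1, 2)  # back to (B, T, d)
        return self.head(h)                        # (B, T, 256)

x = torch.randint(0, 256, (2, 64))
logits = ByteChunkLM()(x)  # one chunk-level step predicts several bytes at once
```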