Kilian Haefeli @ ICLR (@khshind)'s Twitter Profile

Training large models at @cohere and Deep Learning @ETH | Previously: @Aleph__Alpha, @Logitech and @UofT

ID: 739381716

Link: https://kilianhae.github.io
Joined: 05-08-2012 22:49:31

174 Tweets

384 Followers

626 Following

Dylan Patel ✈️ ICLR (@dylan522p)

This pic goes so hard with the military and intelligence community in the crowd at the Citadel Military College. This event is sick, so much happening here. I asked for permission to tweet this pic.

tenderizzation (@tenderizzation)

I don't even see the nn.Modules anymore. All I see is memory-bound, memory-bound, compute-bound, memory-bound, host-overhead bound, register spill fiasco, memory-bound, shared memory bank conflict galore, compute-bound

Andrej Karpathy (@karpathy)

A major mistake I made in my undergrad is that I focused way too much on mathematical lens of computing - computability, decidability, asymptotic complexity etc. And too little on physical lens - energy/heat of state change, data locality, parallelism, computer architecture. The

Prime Intellect (@primeintellect)

Releasing INTELLECT-2: We’re open-sourcing the first 32B parameter model trained via globally distributed reinforcement learning:
• Detailed Technical Report
• INTELLECT-2 model checkpoint
primeintellect.ai/blog/intellect…

Hieu Pham (@hyhieu226)

FP4 numbers are like quantum systems. You know that they exist and have one of those 16 values, but the moment you need to observe them, they immediately attach to something (8 bits) and lose their original meaning.

Yulun Du (@yulun_du)

Shaowei from our infra team actually wrote about the decisions we made on the Kimi K2 architecture. zhihu.com/question/19271… I suggest reading it with Kimi K2 as your translator. :)