James Thewlis (@jdthewlis)'s Twitter Profile
James Thewlis

@jdthewlis

Chief Scientist & Co-Founder @UnitaryAI | Former PhD @Oxford_VGG, MEng @imperialcollege, research w/ @FacebookAI | Computer Vision & AI 🖼🤖

ID: 905126113277562880

Link: https://jamesthewlis.com | Joined: 05-09-2017 17:50:56

96 Tweets

471 Followers

314 Following

Nando de Freitas (@nandodf)'s Twitter Profile Photo

There appears to be a mismatch between publishing criteria in AI conferences and "what actually works". It is easy to publish new mathematical constructs (e.g. new models, new layers, new modules, new losses), but as Apple's MM1 paper concludes:

1. Encoder Lesson: Image
James Thewlis (@jdthewlis)'s Twitter Profile Photo

What makes vLLM go brrr? Splitting the KV cache into blocks, enabling efficient batching, better utilisation and higher throughput. I added some hacky visualisation code to the PagedAttention implementation to see it in action for a batch of 4 prompts.
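The idea the tweet describes can be sketched in a few lines. This is an illustrative toy, not vLLM's actual implementation: a paged KV cache keeps a per-sequence block table mapping logical token positions to fixed-size physical blocks, so memory is allocated on demand and finished sequences return their blocks to a shared pool for reuse across the batch (class and method names here are made up for the sketch).

```python
BLOCK_SIZE = 16  # tokens per block (vLLM's default block size is 16)

class PagedKVCache:
    """Toy paged KV cache: sequences own lists of physical block IDs."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))    # pool of physical blocks
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> block IDs

    def append_token(self, seq_id: int, num_tokens: int) -> int:
        """Record one more token for seq_id; return the block it lands in."""
        table = self.block_tables.setdefault(seq_id, [])
        if num_tokens % BLOCK_SIZE == 0:  # all current blocks are full
            table.append(self.free_blocks.pop())  # allocate on demand
        return table[-1]

    def free(self, seq_id: int) -> None:
        """Sequence finished: return its blocks to the pool for reuse."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
```

Because allocation happens one block at a time, short and long prompts in the same batch waste at most one partially filled block each, which is what enables the tighter batching and higher throughput.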

Grant Sanderson (@3blue1brown)'s Twitter Profile Photo

The next chapter about transformers is up on YouTube, digging into the attention mechanism: youtu.be/eMlx5fFNoYc The model works with vectors representing tokens (think words), and this is the mechanism that allows those vectors to take in meaning from context.
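The mechanism the video covers can be written out compactly. A minimal NumPy sketch of scaled dot-product attention: each token's vector is replaced by a weighted mix of all tokens' value vectors, with weights coming from query-key similarity, which is how vectors "take in meaning from context."

```python
import numpy as np

def attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over a sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # pairwise similarity
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)         # softmax over context
    return weights @ V                                # context-aware vectors
```

Each output row is a convex combination of value vectors, so a token whose query aligns with another token's key pulls in that token's information.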

James Thewlis (@jdthewlis)'s Twitter Profile Photo

Important question: Can Face-to-All N64-ize cats? The answer is yes! (As long as you add a human face in the corner to trick the face detector)

James Thewlis (@jdthewlis)'s Twitter Profile Photo

This video is a fascinating dive into all the complexity involved in getting text to render correctly! (Featuring lots of floating point glitches) youtube.com/watch?v=SO83KQ…

James Thewlis (@jdthewlis)'s Twitter Profile Photo

omg the LLaVA code checks if the string "mpt" is in the model name to load a completely different model and I used the word "prompt" in my model name and everything broke 🤦‍♂️🤦‍♂️🤦‍♂️
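A toy reproduction of the pitfall (the real check lives in LLaVA's model-loading code; the function and class names below are illustrative): a bare substring test on the model name silently misroutes any name containing "mpt", including "prompt". Matching "mpt" only as a whole separator-delimited token avoids the false positive.

```python
import re

def load_model_buggy(model_name: str) -> str:
    # Substring check: also matches "my-prompt-model"!
    if "mpt" in model_name:
        return "MPTForCausalLM"
    return "LlamaForCausalLM"

def load_model_safer(model_name: str) -> str:
    # Match "mpt" only as a whole token delimited by -, _, / or string ends.
    if re.search(r"(^|[-_/])mpt([-_/]|$)", model_name.lower()):
        return "MPTForCausalLM"
    return "LlamaForCausalLM"
```

With the buggy check, `"llava-prompt-v1"` is routed to the MPT loader; with the token-delimited check it is not, while `"llava-mpt-7b"` still matches.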

Salman Khan (@khansalmanh)'s Twitter Profile Photo

📈Exciting updates to our recent effort to extend LLaMA3 and Phi3 for *visual* understanding. Enjoy!
💻Online demo: bengal-eminent-wasp.ngrok-free.app
📓 Chat in Google Colab: colab.research.google.com/drive/10Z2HaY5…
🚀LoRA, fully FT and S2 FT models added! github.com/mbzuai-oryx/LL…