James Thewlis (@jdthewlis)'s Twitter Profile
James Thewlis

@jdthewlis

Chief Scientist & Co-Founder @UnitaryAI | Former PhD @Oxford_VGG, MEng @imperialcollege, research w/ @FacebookAI | Computer Vision & AI 🖼🤖

ID: 905126113277562880

Link: https://jamesthewlis.com | Joined: 05-09-2017 17:50:56

96 Tweets

471 Followers

314 Following

Nando de Freitas (@nandodf)'s Twitter Profile Photo

There appears to be a mismatch between publishing criteria in AI conferences and "what actually works". It is easy to publish new mathematical constructs (e.g. new models, new layers, new modules, new losses), but as Apple's MM1 paper concludes:

1. Encoder Lesson: Image
James Thewlis (@jdthewlis)'s Twitter Profile Photo

What makes vLLM go brrr? Splitting the KV cache into blocks, enabling efficient batching, better utilisation and higher throughput. I added some hacky visualisation code to the PagedAttention implementation to see it in action for a batch of 4 prompts.
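The idea the tweet describes can be sketched in a few lines. This is an illustrative toy, not vLLM's actual implementation: a paged KV cache keeps a per-sequence block table mapping logical token positions to fixed-size physical blocks, so memory is allocated on demand and finished sequences return their blocks to a shared pool for reuse across the batch (class and method names here are made up for the sketch).

```python
BLOCK_SIZE = 16  # tokens per block (vLLM's default block size is 16)

class PagedKVCache:
    """Toy paged KV cache: sequences own lists of physical block IDs."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))    # pool of physical blocks
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> block IDs

    def append_token(self, seq_id: int, num_tokens: int) -> int:
        """Record one more token for seq_id; return the block it lands in."""
        table = self.block_tables.setdefault(seq_id, [])
        if num_tokens % BLOCK_SIZE == 0:  # all current blocks are full
            table.append(self.free_blocks.pop())  # allocate on demand
        return table[-1]

    def free(self, seq_id: int) -> None:
        """Sequence finished: return its blocks to the pool for reuse."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
```

Because allocation happens one block at a time, short and long prompts in the same batch waste at most one partially filled block each, which is what enables the tighter batching and higher throughput.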

Grant Sanderson (@3blue1brown)'s Twitter Profile Photo

The next chapter about transformers is up on YouTube, digging into the attention mechanism: youtu.be/eMlx5fFNoYc The model works with vectors representing tokens (think words), and this is the mechanism that allows those vectors to take in meaning from context.
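The mechanism the video covers can be written out compactly. A minimal NumPy sketch of scaled dot-product attention: each token's vector is replaced by a weighted mix of all tokens' value vectors, with weights coming from query-key similarity, which is how vectors "take in meaning from context."

```python
import numpy as np

def attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over a sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # pairwise similarity
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)         # softmax over context
    return weights @ V                                # context-aware vectors
```

Each output row is a convex combination of value vectors, so a token whose query aligns with another token's key pulls in that token's information.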

James Thewlis (@jdthewlis)'s Twitter Profile Photo

Important question: Can Face-to-All N64-ize cats? The answer is yes! (As long as you add a human face in the corner to trick the face detector)

James Thewlis (@jdthewlis)'s Twitter Profile Photo

This video is a fascinating dive into all the complexity involved in getting text to render correctly! (Featuring lots of floating point glitches) youtube.com/watch?v=SO83KQ…

James Thewlis (@jdthewlis)'s Twitter Profile Photo

omg the LLaVA code checks if the string "mpt" is in the model name to load a completely different model and I used the word "prompt" in my model name and everything broke 🤦‍♂️🤦‍♂️🤦‍♂️
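A toy reproduction of the pitfall (the real check lives in LLaVA's model-loading code; the function and class names below are illustrative): a bare substring test on the model name silently misroutes any name containing "mpt", including "prompt". Matching "mpt" only as a whole separator-delimited token avoids the false positive.

```python
import re

def load_model_buggy(model_name: str) -> str:
    # Substring check: also matches "my-prompt-model"!
    if "mpt" in model_name:
        return "MPTForCausalLM"
    return "LlamaForCausalLM"

def load_model_safer(model_name: str) -> str:
    # Match "mpt" only as a whole token delimited by -, _, / or string ends.
    if re.search(r"(^|[-_/])mpt([-_/]|$)", model_name.lower()):
        return "MPTForCausalLM"
    return "LlamaForCausalLM"
```

With the buggy check, `"llava-prompt-v1"` is routed to the MPT loader; with the token-delimited check it is not, while `"llava-mpt-7b"` still matches.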

Salman Khan (@khansalmanh)'s Twitter Profile Photo

📈Exciting updates to our recent effort to extend LLaMA3 and Phi3 for *visual* understanding. Enjoy!
💻Online demo: bengal-eminent-wasp.ngrok-free.app
📓 Chat in Google Colab: colab.research.google.com/drive/10Z2HaY5…
🚀LoRA, fully FT and S2 FT models added! github.com/mbzuai-oryx/LL…