Michael Wise
@weezilla
Code @ CRANT
❤️ Science, SpaceX, Tesla
ID: 2929870486
18-12-2014 16:35:31
22,22K Tweet
1,1K Followers
1,1K Following
Chuck Cook Here is how it works, since I work in this field.
- 3x model size scaling means the model needs 3x the RAM, unless they apply additional quantization or distillation, which I doubt they do at this early stage.
- 3x model context length scaling means slightly more RAM will be
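A rough sketch of the model-size point above, covering only the memory taken by the weights. The 7B parameter count, the 16-bit baseline, and the 4-bit quantized case are hypothetical numbers chosen for illustration, not figures from the post.

```python
def weight_memory_gib(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Ignores activations, caches, and runtime overhead.
    """
    return n_params * bytes_per_param / 2**30


# Hypothetical baseline: 7B parameters stored as 16-bit values (2 bytes each).
base_params = 7e9

base = weight_memory_gib(base_params)
# 3x the parameters at the same precision: weight memory scales linearly.
scaled = weight_memory_gib(3 * base_params)
# Same 3x model with 4-bit weights (0.5 bytes each), an assumed quantization level.
quantized = weight_memory_gib(3 * base_params, bytes_per_param=0.5)

print(f"base: {base:.1f} GiB, 3x: {scaled:.1f} GiB, 3x at 4-bit: {quantized:.1f} GiB")
```

Weight memory is just parameter count times bytes per parameter, so tripling the parameters triples it, and only a change in precision (or a smaller distilled model) breaks that link.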