Pramod Goyal (@goyal__pramod)'s Twitter Profile

Trying to change the world one line at a time

AI engineer @joindimension
Founder @HacktogetherDev

ID: 1728292834271236096

Link: https://goyalpramod.github.io/
Joined: 25-11-2023 06:01:58

1.1K Tweets

3.3K Followers

157 Following

Daniel Han (@danielhanchen)

OpenAI's OSS model possible breakdown:
1. 120B MoE 5B active + 20B text only
2. Trained with Float4 maybe Blackwell chips
3. SwiGLU clip (-7,7) like ReLU6
4. 128K context via YaRN from 4K
5. Sliding window 128 + attention sinks
6. Llama/Mixtral arch + biases

Details:
1. 120B MoE
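
Item 3 of the breakdown above (a clipped SwiGLU, compared to ReLU6) can be illustrated with a minimal sketch. The tweet does not say where the clip is applied, so clamping both the gate and the up-projection inputs to [-7, 7] is an assumption here, and the names `clamp`, `silu`, and `clipped_swiglu` are hypothetical. This is not the model's actual implementation.

```python
import math


def clamp(x: float, limit: float = 7.0) -> float:
    """Clip x to the symmetric range [-limit, limit]."""
    return max(-limit, min(limit, x))


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def clipped_swiglu(gate: list[float], up: list[float], limit: float = 7.0) -> list[float]:
    """Element-wise SwiGLU with clipped inputs (assumed clip placement).

    Plain SwiGLU is silu(gate) * up; here both inputs are clamped first,
    which bounds the output much as ReLU6 bounds ReLU at 6.
    """
    return [silu(clamp(g, limit)) * clamp(u, limit) for g, u in zip(gate, up)]
```

With this clamping, very large activations saturate: an input of 100 yields the same output as an input of 7.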
Jürgen Schmidhuber (@schmidhuberai)

Who invented convolutional neural networks (CNNs)? 

1969: Fukushima had CNN-relevant ReLUs [2].

1979: Fukushima had the basic CNN architecture with convolution layers and downsampling layers [1]. Compute was 100 x more costly than in 1989, and a billion x more costly than
Raja Koduri (@rajaxg)

Been waiting for this day... Oxmiq Labs Inc., the all-new GPU software and IP startup emerges from stealth. We assembled a world-class team of GPU and AI architects with over 500 years of combined experiences. OXMIQ™'s licensable IP rearchitects the GPU from the ground up.

Guangxuan Xiao (@guangxuan_xiao)

The release of GPT-OSS-120B & GPT-OSS-20B models today incorporates my Attention Sink work (github.com/mit-han-lab/st…). 

Exciting to see this come to life! 🎉 Looking forward to more progress in this space. 😁
Graham Neubig (@gneubig)

Summary of GPT-OSS architectural innovations:
1. sliding window attention (ref: arxiv.org/abs/1901.02860)
2. mixture of experts (ref: arxiv.org/abs/2101.03961)
3. RoPE w/ Yarn (ref: arxiv.org/abs/2309.00071)
4. attention sinks (ref: streaming llm arxiv.org/abs/2309.17453)
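
To make the combination of sliding window attention and attention sinks from the summary above concrete, here is a rough sketch of a boolean attention mask. It follows the StreamingLLM-style idea of keeping the first few tokens visible to every query; whether GPT-OSS implements sinks this way is not stated in the tweet. The function name `sliding_window_sink_mask` and its parameters are hypothetical.

```python
def sliding_window_sink_mask(seq_len: int, window: int, num_sinks: int) -> list[list[bool]]:
    """Build a causal mask where mask[q][k] is True if query q may attend to key k.

    A key is visible when it is not in the future and either lies within
    the last `window` positions or is one of the first `num_sinks` tokens.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q              # never attend to future tokens
            in_window = q - k < window   # recent tokens inside the window
            is_sink = k < num_sinks      # always-visible initial tokens
            row.append(causal and (in_window or is_sink))
        mask.append(row)
    return mask
```

For example, with `seq_len=6`, `window=2`, and `num_sinks=1`, the last query sees token 0 (the sink) plus tokens 4 and 5 (the window), and nothing in between.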

Pramod Goyal (@goyal__pramod)

Who would have guessed that building an interactive web book would be so tough?

But it certainly is rewarding. I am learning a lot of web dev that I didn't know. 

I will later release a blog on building this too, so ML ppl who wish to learn web dev have a place to get started.
Pramod Goyal (@goyal__pramod)

Python concepts to make you 10x more efficient

* zip
* static & class methods
* mixin
* list & dict comprehension
* collections library
* pydantic
* decorators & generators
* enumerate
* f-string
* *args, **kwargs, *_list
* partials & functools
* profiling
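
Several of the concepts named in the tweet above can be sketched together using only the standard library. The example data and the names `count_calls`, `power`, `square`, and `running_total` are made up for illustration; pydantic, mixins, and profiling are left out to keep the sketch dependency-free and short.

```python
from collections import Counter
from functools import partial, wraps


def count_calls(func):
    """Decorator that counts how often the wrapped function is called."""
    @wraps(func)
    def wrapper(*args, **kwargs):  # *args/**kwargs forward any call signature
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def power(base, exponent):
    return base ** exponent


# functools.partial pre-fills the exponent argument.
square = partial(power, exponent=2)


def running_total(values):
    """Generator that lazily yields cumulative sums."""
    total = 0
    for value in values:
        total += value
        yield total


names = ["ada", "bob", "cy"]
scores = [3, 1, 2]

# zip pairs the two lists; a dict comprehension builds the mapping.
score_by_name = {name: score for name, score in zip(names, scores)}

# enumerate supplies the index; an f-string formats each label.
labels = [f"{i}: {name}" for i, name in enumerate(names, start=1)]

# collections.Counter tallies letters across all names.
letter_counts = Counter("".join(names))

# Starred unpacking splits the first element from the rest.
first, *rest = scores
```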

Pramod Goyal (@goyal__pramod)

I will be eternally grateful to anyone who can tell me what things I should explore to build such crazy, amazing blogs. Currently on my list

* OpenGL and shaders 
* Three JS 
* P5 JS
* D3 JS 

P.S. I already spent time learning TS, Next, React. Still not great, but getting there
Cephandrius (@proteinengine)

Pramod Goyal okay, fun :) distill has an article about writing distill articles: distill.pub/guide/. they also open-source the code for their articles on their github at github.com/distillpub. they're big on adobe illustrator, since you can animate illustrator vectors with js.

Pramod Goyal (@goyal__pramod)

Okay guys, 4th blog onwards. Stuff is going to get a 10x level up. 

(I have already made significant progress in the other 3 I am writing, I would like to finish them first, then start building something new with this knowledge)