Sushant Kumar (@sushantkumar_23) 's Twitter Profile
Sushant Kumar

@sushantkumar_23

Director of AI, SproutsAI
Prev: Head of AI, SquareYards, Co-founder Azuro, Bank of America, IIT Bombay 2014

ID: 27609592

Link: https://sushant-kumar.com · Joined: 30-03-2009 09:21:06

2.2K Tweets

1.1K Followers

2.2K Following

First Cheque (@firstcheque) 's Twitter Profile Photo

Hi folks! Our newest episode of Portfolio Shorts is now live on YT! youtu.be/rqiwqjpE6RA Listen to Pawan Gupta talk about Fashinza, a new-age supply chain and product development platform for the fashion industry 🚀

Sushant Kumar (@sushantkumar_23) 's Twitter Profile Photo

Major AI labs are hitting a wall with larger models. True AI needs continual learning, agency, and adaptive behavior, not just bigger models and more data. My thoughts on why the path to the prize (superintelligence) lies beyond the comfortable confines of LLMs: sushant-kumar.com/blog/path-to-p…

Sushant Kumar (@sushantkumar_23) 's Twitter Profile Photo


Pre-training SLMs seems to be an accessible way into AI research.

One day on an H100-80GB pretrains a ~1B SLM on roughly 15B GPT-2 tokens if properly optimised (I was able to get up to ~200k tok/sec throughput).

Anyone know of SLMs (sub-1B) that hit 50 MMLU (or, in other words, are useful)?
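The throughput claim can be sanity-checked with quick arithmetic: at ~200k tokens/sec sustained, one H100-day covers about 17B tokens, consistent with the ~15B figure once warmup and checkpointing overhead are allowed for.

```python
# Back-of-envelope check of the numbers in the tweet above.
# Assumption: the ~200k tok/sec peak is sustained for the full day.
throughput_tok_per_sec = 200_000
seconds_per_day = 24 * 60 * 60

tokens_per_day = throughput_tok_per_sec * seconds_per_day
print(f"{tokens_per_day / 1e9:.1f}B tokens/day")  # → 17.3B tokens/day
```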
Sushant Kumar (@sushantkumar_23) 's Twitter Profile Photo


One of the best experiments for getting your hands dirty building foundation models: a ~30M model, so it's cheap to pre-train.

Added RoPE to the GPT-2 architecture. Raised a PR! Pre-trained to a validation loss of 1.224 in less than 1 hour on an H100 (40GB)!

Great work, <a href="/alve_om/">Om Alve</a>! Toy MoE next?
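The RoPE change described above can be sketched roughly as follows — a minimal NumPy illustration of the idea, not the actual PR code; the shapes and the `base` constant are conventional choices, not taken from the experiment:

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embeddings for q/k of shape (seq, head_dim).

    Sketch of the idea: GPT-2's learned absolute position embeddings are
    replaced by rotating each (x1, x2) pair of query/key dimensions by a
    position-dependent angle, which encodes relative position in attention.
    """
    t, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)        # per-pair rotation frequency
    angles = np.arange(t)[:, None] * freqs[None, :]  # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied pairwise; preserves vector norms
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 16)  # 8 positions, head_dim 16 (illustrative sizes)
q_rot = rope(q)
print(q_rot.shape)  # (8, 16)
```

Because each pair is rotated rather than shifted, the transform is norm-preserving, which is one reason RoPE drops into an existing attention block without retuning.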
Richard Sutton (@richardssutton) 's Twitter Profile Photo

awards.acm.org/about/2024-tur… Machines that learn from experience were explored by Alan Turing almost eighty years ago, which makes it particularly gratifying and humbling to receive an award in his name for reviving this essential but still nascent idea.

Sushant Kumar (@sushantkumar_23) 's Twitter Profile Photo


This lecture is a masterpiece from <a href="/RichardSSutton/">Richard Sutton</a> that's moving the conversation forward on what the future with AI could look like.

You can literally feel the clarity that comes when you have spent half-a-century thinking deeply about intelligence.

youtube.com/watch?v=FLOL2f…
Sushant Kumar (@sushantkumar_23) 's Twitter Profile Photo


For me, GPT-OSS was the much-awaited OpenAI architecture reveal, the first since their detailed GPT-2 architecture!

Some key observations:

✅ MoE GPT-2-style Transformer (36 / 24 layers)
✅ 128 / 32 experts with top-4 routing ⇒ only 5.1B / 3.6B active params
✅ RMSNorm
✅ Grouped Query Attention
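The top-4 routing above is what keeps active params at 5.1B / 3.6B: per token, only 4 of the 128 (or 32) expert FFNs actually run. A rough NumPy sketch of top-k routing — illustrative only, not GPT-OSS's actual router code, and the sizes are placeholders:

```python
import numpy as np

def topk_route(logits, k=4):
    """Top-k expert routing: softmax over only the k best experts per token.

    All other experts get weight 0, so their parameters never run for this
    token -- that is why active params are a small fraction of the total.
    """
    idx = np.argsort(logits)[..., -k:]                  # indices of the k best experts
    chosen = np.take_along_axis(logits, idx, axis=-1)
    # softmax over the selected experts only
    e = np.exp(chosen - chosen.max(axis=-1, keepdims=True))
    weights = np.zeros_like(logits)
    np.put_along_axis(weights, idx, e / e.sum(axis=-1, keepdims=True), axis=-1)
    return weights

logits = np.random.randn(2, 128)  # 2 tokens, 128 experts (assumed sizes)
w = topk_route(logits)
print((w > 0).sum(axis=-1))  # [4 4] -- exactly 4 active experts per token
```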