Sushant Kumar (@sushantkumar_23) 's Twitter Profile
Sushant Kumar

@sushantkumar_23

Director of AI, SproutsAI
Prev: Head of AI, SquareYards, Co-founder Azuro, Bank of America, IIT Bombay 2014

ID: 27609592

Link: https://sushant-kumar.com · Joined: 30-03-2009 09:21:06

2.2K Tweets

1.1K Followers

2.2K Following

First Cheque (@firstcheque) 's Twitter Profile Photo

Hi folks! Our newest episode of Portfolio Shorts is now live on YT! youtu.be/rqiwqjpE6RA Listen to Pawan Gupta talk about Fashinza, a new-age supply chain and product development platform for the fashion industry 🚀

Sushant Kumar (@sushantkumar_23) 's Twitter Profile Photo

Major AI labs are hitting a wall with larger models. True AI needs continual learning, agency, and adaptive behavior, not just bigger models and more data. My thoughts on why the path to the prize (superintelligence) lies beyond the comfortable confines of LLMs: sushant-kumar.com/blog/path-to-p…

Sushant Kumar (@sushantkumar_23) 's Twitter Profile Photo


Pre-training SLMs seems to be an accessible way into AI research.

One day on an H100-80GB pretrains a ~1B SLM on roughly 15B GPT-2 tokens if properly optimised (I was able to get up to ~200k tok/sec throughput).

Anyone know of SLMs (sub-1B) that hit 50 MMLU (or, in other words, are useful)?
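The throughput claim can be sanity-checked with quick arithmetic: at ~200k tokens/sec sustained, one H100-day covers about 17B tokens, consistent with the ~15B figure once warmup and checkpointing overhead are allowed for.

```python
# Back-of-envelope check of the numbers in the tweet above.
# Assumption: the ~200k tok/sec peak is sustained for the full day.
throughput_tok_per_sec = 200_000
seconds_per_day = 24 * 60 * 60

tokens_per_day = throughput_tok_per_sec * seconds_per_day
print(f"{tokens_per_day / 1e9:.1f}B tokens/day")  # → 17.3B tokens/day
```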
Sushant Kumar (@sushantkumar_23) 's Twitter Profile Photo


One of the best experiments for getting your hands dirty building foundation models: a ~30M model, so it's cheap to pre-train.

Added RoPE to the GPT-2 architecture. Raised a PR! Pre-trained to a validation loss of 1.224 in less than 1 hour on an H100 (40GB)!

Great work, <a href="/alve_om/">Om Alve</a>! Toy MoE next?
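The RoPE change described above can be sketched roughly as follows — a minimal NumPy illustration of the idea, not the actual PR code; the shapes and the `base` constant are conventional choices, not taken from the experiment:

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embeddings for q/k of shape (seq, head_dim).

    Sketch of the idea: GPT-2's learned absolute position embeddings are
    replaced by rotating each (x1, x2) pair of query/key dimensions by a
    position-dependent angle, which encodes relative position in attention.
    """
    t, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)        # per-pair rotation frequency
    angles = np.arange(t)[:, None] * freqs[None, :]  # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied pairwise; preserves vector norms
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 16)  # 8 positions, head_dim 16 (illustrative sizes)
q_rot = rope(q)
print(q_rot.shape)  # (8, 16)
```

Because each pair is rotated rather than shifted, the transform is norm-preserving, which is one reason RoPE drops into an existing attention block without retuning.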
Richard Sutton (@richardssutton) 's Twitter Profile Photo

awards.acm.org/about/2024-tur… Machines that learn from experience were explored by Alan Turing almost eighty years ago, which makes it particularly gratifying and humbling to receive an award in his name for reviving this essential but still nascent idea.

Sushant Kumar (@sushantkumar_23) 's Twitter Profile Photo


This lecture is a masterpiece from <a href="/RichardSSutton/">Richard Sutton</a> that's moving the conversation forward on what the future with AI could look like.

You can literally feel the clarity that comes when you have spent half-a-century thinking deeply about intelligence.

youtube.com/watch?v=FLOL2f…
Sushant Kumar (@sushantkumar_23) 's Twitter Profile Photo


For me, GPT-OSS was the much-awaited OpenAI architecture reveal, the first since their detailed GPT-2 architecture!

Some key observations:

✅ MoE GPT-2-style Transformer (36 / 24 layers)
✅ 128 / 32 experts with top-4 routing ⇒ only 5.1B / 3.6B active params
✅ RMSNorm
✅ Grouped Query Attention
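The top-4 routing above is what keeps active params at 5.1B / 3.6B: per token, only 4 of the 128 (or 32) expert FFNs actually run. A rough NumPy sketch of top-k routing — illustrative only, not GPT-OSS's actual router code, and the sizes are placeholders:

```python
import numpy as np

def topk_route(logits, k=4):
    """Top-k expert routing: softmax over only the k best experts per token.

    All other experts get weight 0, so their parameters never run for this
    token -- that is why active params are a small fraction of the total.
    """
    idx = np.argsort(logits)[..., -k:]                  # indices of the k best experts
    chosen = np.take_along_axis(logits, idx, axis=-1)
    # softmax over the selected experts only
    e = np.exp(chosen - chosen.max(axis=-1, keepdims=True))
    weights = np.zeros_like(logits)
    np.put_along_axis(weights, idx, e / e.sum(axis=-1, keepdims=True), axis=-1)
    return weights

logits = np.random.randn(2, 128)  # 2 tokens, 128 experts (assumed sizes)
w = topk_route(logits)
print((w > 0).sum(axis=-1))  # [4 4] -- exactly 4 active experts per token
```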