Ishaan (@auto_grad_) 's Twitter Profile
Ishaan

@auto_grad_

intern @sarvamai // ug @iitroorkee

ID: 1458643268795654145

Link: https://darky.bearblog.dev/ · Joined: 11-11-2021 03:50:48

4.4K Tweets

1.1K Followers

501 Following

Ishaan (@auto_grad_) 's Twitter Profile Photo

today i walked by a group of happy children playing football, laughing, teasing each other, smiling at me while i took a few passes. they wanted to grow up and work like me; i wanted to turn back time and play like them. the irony of life.

wh (@nrehiew_) 's Twitter Profile Photo

Please be good
Please not be benchmaxxed
Please be Claude Code compatible
Please be cheap
Please have some soul
Please be good at multiturn
Please have low hallucination rates

Ishaan (@auto_grad_) 's Twitter Profile Photo

i don't even trust llm benchmarking anymore; it's all bluff as of now. model releases show only the benchmarks they're good at, abstracting away all other info

Guangxuan Xiao (@guangxuan_xiao) 's Twitter Profile Photo

I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI's new OSS models.

For those interested in the details:
hanlab.mit.edu/blog/streaming…
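
For orientation, here is a minimal sketch of the StreamingLLM-style cache policy the post describes (the function name and sizes are illustrative, not the authors' code): keep a few initial "sink" tokens plus a sliding window of recent tokens, and evict everything in between.

```python
def evict_kv(kv_cache, n_sink=4, window=1020):
    """Sketch of StreamingLLM-style KV-cache eviction (illustrative names/sizes).

    kv_cache: list of per-token (key, value) pairs, oldest first.
    Keeps the first n_sink "attention sink" tokens plus the most recent
    `window` tokens, dropping the middle of the sequence.
    """
    if len(kv_cache) <= n_sink + window:
        return kv_cache
    return kv_cache[:n_sink] + kv_cache[-window:]
```

The observation behind it: softmax attention dumps excess probability mass on the earliest tokens, so evicting them wrecks perplexity even though they carry little content; keeping them as sinks makes window attention stable over long streams.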
Jinjie Ni @ ICLR'25 🇸🇬 (@nijinjie) 's Twitter Profile Photo

Token crisis: solved. ✅

We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch — up to 8B params, 480B tokens, 480 epochs.

Findings:
>  DLMs beat AR when tokens are limited, with >3× data potential.
>  A 1B DLM trained on just 1B tokens
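
The thread reports results rather than code; as a rough sketch of the masked-diffusion training objective commonly used for DLMs (LLaDA-style 1/t reweighting; the mask id, shapes, and exact loss weighting here are assumptions, not the authors' setup):

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # assumed id of the special [MASK] token

def masked_diffusion_step(model, tokens):
    """One illustrative DLM training step on a (batch, seq) LongTensor.

    Sample a corruption level t ~ U(0, 1), mask each token independently
    with probability t, and train the model to recover the masked tokens;
    the 1/t factor follows common discrete-diffusion loss weightings.
    """
    t = torch.rand(()).clamp_min(1e-3)   # corruption level for this step
    mask = torch.rand(tokens.shape, device=tokens.device) < t
    mask[..., 0] |= ~mask.any()          # ensure at least one masked position
    corrupted = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)
    logits = model(corrupted)            # (batch, seq, vocab)
    loss = F.cross_entropy(logits[mask], tokens[mask])
    return loss / t
```

Unlike an AR step, every pass supervises a different random subset of positions at a different corruption level, which is one intuition for why DLMs can squeeze more signal out of repeated epochs over a fixed token budget.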
Ishaan (@auto_grad_) 's Twitter Profile Photo

the drop i have been waiting on for so long. this probably opens a window for a fresh breeze in pre-training. immense respect for tokenbender for pulling this off, and not to forget Chinmay Kak's contributions too!

Ishaan (@auto_grad_) 's Twitter Profile Photo

the most basic intuition behind ce loss: it minimizes the distance between the "ground truth" distribution over the next word and the "predicted" next-word distribution, given the context we have so far. yes, a direct derivation from KLD
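
A quick illustration of that equivalence (a minimal sketch; the toy vocabulary size and logits are made up for the example): for a one-hot ground-truth distribution p, the entropy H(p) is zero, so KL(p || q) = CE(p, q) - H(p) = CE(p, q), and minimizing cross-entropy is exactly minimizing the KL divergence.

```python
import torch
import torch.nn.functional as F

vocab_size = 8                        # toy vocabulary
logits = torch.randn(1, vocab_size)   # model scores for the next token
target = torch.tensor([3])            # index of the ground-truth next token

# cross-entropy as computed in next-token prediction
ce = F.cross_entropy(logits, target)

# the same quantity by hand: CE(p, q) = -sum_i p_i log q_i,
# and KL(p || q) = CE(p, q) - H(p); for one-hot p, H(p) = 0,
# so the two objectives coincide.
p = F.one_hot(target, vocab_size).float()
log_q = F.log_softmax(logits, dim=-1)
ce_manual = -(p * log_q).sum()

print(ce.item(), ce_manual.item())    # identical up to float error
```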

Skywork (@skywork_ai) 's Twitter Profile Photo

Matrix-Game 2.0 — The FIRST open-source, real-time, long-sequence interactive world model

Last week, DeepMind's Genie 3 shook the AI world with real-time interactive world models. But... it wasn't open-sourced.

Today, Matrix-Game 2.0 changed the game. 🚀 25FPS. Minutes-long

Ishaan (@auto_grad_) 's Twitter Profile Photo

seeing this tweet, i just realized i need to work hella harder to do all of these before 24. be it in a research lab or independently, i need to check everything off before i turn 24 (i have 5 checked already tho). THIS IS MY GOAL FOR THE NEXT YEAR HAHA

Ishaan (@auto_grad_) 's Twitter Profile Photo

pros of living with a blockchain flatmate: you get to hear about blockchain daily
cons of living with a blockchain flatmate: you get to hear about blockchain DAILY

Ishaan (@auto_grad_) 's Twitter Profile Photo

cause you don't just learn pytorch, you learn
> how to write functions and code
> c++/python in a more intuitive manner
> how to actually debug
> how timid you actually are in front of those crazy minds at pytorch