Ishaan (@auto_grad_) 's Twitter Profile
Ishaan

@auto_grad_

intern @sarvamai // ug @iitroorkee

ID: 1458643268795654145

Link: https://darky.bearblog.dev/ · Joined: 11-11-2021 03:50:48

4.4K Tweets

1.1K Followers

501 Following

Ishaan (@auto_grad_) 's Twitter Profile Photo

today i walked by a group of happy children playing football, laughing, teasing each other, smiling at me while i took a few passes. they wanted to grow up and work like me; i wanted to turn back time and play like them. the irony of life.

wh (@nrehiew_) 's Twitter Profile Photo

Please be good
Please not be benchmaxxed
Please be Claude Code compatible
Please be cheap
Please have some soul
Please be good at multiturn
Please have low hallucination rates

Ishaan (@auto_grad_) 's Twitter Profile Photo

i don't even trust llm benchmarking anymore; it's all bluff as of now. model releases show only the benchmarks they're good at, abstracting away all other info

Guangxuan Xiao (@guangxuan_xiao) 's Twitter Profile Photo

I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI's new OSS models.

For those interested in the details:
hanlab.mit.edu/blog/streaming…
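
For orientation, here is a minimal sketch of the StreamingLLM-style cache policy the post describes (the function name and sizes are illustrative, not the authors' code): keep a few initial "sink" tokens plus a sliding window of recent tokens, and evict everything in between.

```python
def evict_kv(kv_cache, n_sink=4, window=1020):
    """Sketch of StreamingLLM-style KV-cache eviction (illustrative names/sizes).

    kv_cache: list of per-token (key, value) pairs, oldest first.
    Keeps the first n_sink "attention sink" tokens plus the most recent
    `window` tokens, dropping the middle of the sequence.
    """
    if len(kv_cache) <= n_sink + window:
        return kv_cache
    return kv_cache[:n_sink] + kv_cache[-window:]
```

The observation behind it: softmax attention dumps excess probability mass on the earliest tokens, so evicting them wrecks perplexity even though they carry little content; keeping them as sinks makes window attention stable over long streams.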
Jinjie Ni @ ICLR'25 🇸🇬 (@nijinjie) 's Twitter Profile Photo

Token crisis: solved. ✅

We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch — up to 8B params, 480B tokens, 480 epochs.

Findings:
>  DLMs beat AR when tokens are limited, with >3× data potential.
>  A 1B DLM trained on just 1B tokens
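
The thread reports results rather than code; as a rough sketch of the masked-diffusion training objective commonly used for DLMs (LLaDA-style 1/t reweighting; the mask id, shapes, and exact loss weighting here are assumptions, not the authors' setup):

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # assumed id of the special [MASK] token

def masked_diffusion_step(model, tokens):
    """One illustrative DLM training step on a (batch, seq) LongTensor.

    Sample a corruption level t ~ U(0, 1), mask each token independently
    with probability t, and train the model to recover the masked tokens;
    the 1/t factor follows common discrete-diffusion loss weightings.
    """
    t = torch.rand(()).clamp_min(1e-3)   # corruption level for this step
    mask = torch.rand(tokens.shape, device=tokens.device) < t
    mask[..., 0] |= ~mask.any()          # ensure at least one masked position
    corrupted = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)
    logits = model(corrupted)            # (batch, seq, vocab)
    loss = F.cross_entropy(logits[mask], tokens[mask])
    return loss / t
```

Unlike an AR step, every pass supervises a different random subset of positions at a different corruption level, which is one intuition for why DLMs can squeeze more signal out of repeated epochs over a fixed token budget.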
Ishaan (@auto_grad_) 's Twitter Profile Photo

the drop i have been waiting on for so long. this probably opens a window for a fresh breeze in pre-training. immense respect for tokenbender for pulling this off, and not to forget Chinmay Kak's contributions too!

Ishaan (@auto_grad_) 's Twitter Profile Photo

the most basic intuition behind ce loss: it minimizes the distance between the "ground truth" distribution over the next word and the "predicted" next-word distribution, given the context we have so far. yes, a direct derivation from KLD
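
A quick illustration of that equivalence (a minimal sketch; the toy vocabulary size and logits are made up for the example): for a one-hot ground-truth distribution p, the entropy H(p) is zero, so KL(p || q) = CE(p, q) - H(p) = CE(p, q), and minimizing cross-entropy is exactly minimizing the KL divergence.

```python
import torch
import torch.nn.functional as F

vocab_size = 8                        # toy vocabulary
logits = torch.randn(1, vocab_size)   # model scores for the next token
target = torch.tensor([3])            # index of the ground-truth next token

# cross-entropy as computed in next-token prediction
ce = F.cross_entropy(logits, target)

# the same quantity by hand: CE(p, q) = -sum_i p_i log q_i,
# and KL(p || q) = CE(p, q) - H(p); for one-hot p, H(p) = 0,
# so the two objectives coincide.
p = F.one_hot(target, vocab_size).float()
log_q = F.log_softmax(logits, dim=-1)
ce_manual = -(p * log_q).sum()

print(ce.item(), ce_manual.item())    # identical up to float error
```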

Skywork (@skywork_ai) 's Twitter Profile Photo

Matrix-Game 2.0 — The FIRST open-source, real-time, long-sequence interactive world model

Last week, DeepMind's Genie 3 shook the AI world with real-time interactive world models. But... it wasn't open-sourced.

Today, Matrix-Game 2.0 changed the game. 🚀 25FPS. Minutes-long

Ishaan (@auto_grad_) 's Twitter Profile Photo

seeing this tweet, i just realized i need to work hella harder to do all of these before 24. be it in a research lab or independently, i need to check everything off before i turn 24 (i have 5 checked already tho). THIS IS MY GOAL FOR THE NEXT YEAR HAHA

Ishaan (@auto_grad_) 's Twitter Profile Photo

pros of living with a blockchain flatmate: you get to hear about blockchain daily
cons of living with a blockchain flatmate: you get to hear about blockchain DAILY

Ishaan (@auto_grad_) 's Twitter Profile Photo

cause you don't just learn pytorch, you learn
> how to write functions and code
> c++/python in a more intuitive manner
> how to actually debug
> how timid you actually are in front of those crazy minds at pytorch