Nikhil Barhate (@nikhilbarhate99)'s Twitter Profile
Nikhil Barhate

@nikhilbarhate99

ML @scale_AI | prev @AMD @mila_quebec

ID: 3245294515

Link: https://nikhilbarhate99.github.io | Joined: 14-06-2015 15:04:57

1.1K Tweets

204 Followers

810 Following

Amandeep Kumar (@amandee59573123):

🚀 Unlocking Standard Diffusion Transformers on Representation Encoders

Why do standard DiTs fail to converge on high-dimensional features like DINOv2? 📉 We found the answer isn't just "more parameters": it's Geometry.

Introducing Riemannian Flow Matching with Jacobi
Yibo Yang (@yiboyang):

We've known that diffusion models are theoretically very good lossy data compressors, but how can we actually implement this idea in practice? I discuss this and related topics in a new review article on diffusion-based generative compression: arxiv.org/abs/2601.18932
Tyler Griggs (@tyler_griggs_):

SkyRL now implements the Tinker API.

Now, training scripts written for Tinker can run on your own GPUs with zero code changes using SkyRL's FSDP2, Megatron, and vLLM backends.

Blog: novasky-ai.notion.site/skyrl-tinker
🧵
Oscar Davis (@osclsd):

You like discrete diffusion, but it's too slow? 🥀 You like test-time inference, but it's for continuous methods? 😩 We fixed it. Introducing Categorical Flow Maps: continuously sample discrete data in a single step 🚀💫 How? 🧵⬇️ 💪 Co-led with Floor Eijkelboom, Daan Roos

Alan Baade (@baadealan):

What's the right space to diffuse in: Raw Data or Latents?

Why not both!

In Latent Forcing, we order a joint diffusion trajectory to reveal Latents before Pixels, leading to improved convergence while being lossless at encoding and end-to-end at inference.

w/ Fei-Fei Li + ...
1/n
Charlie Ruan (@charlie_ruan):

Releasing the official SkyRL + Harbor integration: a standardized way to train terminal-use agents with RL.

From the creators of Terminal-Bench, Harbor is a widely adopted framework for evaluating terminal-use agents on any task expressible as a Dockerfile + instruction + test
Jason Ramapuram (@jramapuram):

Autoregressive models dominate, but what if we treat multimodal generation as discrete, order-agnostic iterative refinement? Excited to share our systematic study on the design space of Tri-Modal Masked Diffusion Models (MDMs). We pre-trained the first Tri-Modal MDM from scratch
Peter Tong (@tongpetersb):

Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision.

We share our exploration: visual representations, data, world modeling, architecture, and
William Shen (@shenbokui):

Excited to introduce Uni-1, our new multimodal model that *unifies* understanding and generation.

TLDR: a team of ~15 researchers is going pound-for-pound with nano banana and gpt image 🧵
Ian Osband (@ianosband):

Something is rotten with policy gradient.

PG has become *the* RL loss for LLMs. But it's not even good at basic RL.

Even on MNIST with bandit feedback, vanilla PG performs far worse than cross-entropy because it wastes gradient budget.

Delightful Policy Gradient:
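To make the MNIST-with-bandit-feedback comparison concrete, here is a toy numpy sketch (my own illustration, not the paper's code) of why vanilla PG is signal-starved where cross-entropy is dense: PG only receives a learning signal for the single class it sampled, and no gradient at all when the sampled guess is wrong.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Classification as a contextual bandit: logits for K classes; the agent
# samples one class and only observes reward 1/0 for that single guess.
K = 10
logits = rng.normal(size=K)
pi = softmax(logits)
label = 3

# Vanilla policy gradient (REINFORCE): sample an action, get bandit reward.
action = rng.choice(K, p=pi)
reward = 1.0 if action == label else 0.0
grad_logpi = -pi.copy()
grad_logpi[action] += 1.0            # gradient of log pi(action) w.r.t. logits
pg_grad = reward * grad_logpi        # identically zero whenever the guess is wrong

# Cross-entropy with the full label: the gradient is always informative.
ce_grad = -pi.copy()
ce_grad[label] += 1.0
```

With K = 10 and a near-uniform policy, roughly 90% of PG samples return reward 0 and contribute no gradient at all, which is one concrete sense in which PG "wastes gradient budget."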
Olga Zaghen @ ICLR 🇸🇬 (@olgazaghen):

🔮 Working on ML on curved manifolds? Don't miss out on Jacobi Fields! 🔮

I wrote a quick, highly visual, and hopefully accessible introduction to the topic: "Jacobi Fields in Machine Learning" 🤠 Check it out here: olgatticus.github.io/blog/jacobi-fi…!
chuyi shang (@chuyishang):

Wrote a deep dive on implementing a language model from scratch in JAX and scaling it with distributed training!

If you're coming from PyTorch and want to see how the same ideas look in JAX, or just want a hands-on intro to distributed training, check out this blog post:
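For readers new to the topic, the core data-parallel pattern that JAX primitives like pmap/shard_map automate can be shown framework-free. This is a numpy sketch over a toy linear model of my own (not code from the post): replicate the parameters, give each "device" a shard of the batch, take one gradient per shard, then all-reduce by averaging, which reproduces the single-device gradient exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = x @ w with squared loss; the hand-written gradient
# keeps the data-parallel pattern visible without any framework machinery.
def grad(w, x, y):
    err = x @ w - y
    return x.T @ err / len(x)            # d/dw of 0.5 * mean(err**2)

n_devices, batch, dim = 4, 32, 8
w = rng.normal(size=dim)
x = rng.normal(size=(batch, dim))
y = rng.normal(size=batch)

# "pmap" by hand: shard the batch across devices, one gradient per shard,
# then all-reduce (here: a plain mean) so every replica applies the same update.
shards_x = np.split(x, n_devices)
shards_y = np.split(y, n_devices)
per_device = [grad(w, xs, ys) for xs, ys in zip(shards_x, shards_y)]
g_parallel = np.mean(per_device, axis=0)

g_single = grad(w, x, y)                 # same result as one big batch
```

Because every shard has equal size, the mean of shard-mean gradients equals the full-batch gradient, which is why data parallelism changes throughput but not (in exact arithmetic) the optimization trajectory.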
Max Fu (@letian_fu):

Robotics: coding agents' next frontier. So how good are they? We introduce CaP-X: an open-source framework and benchmark for coding agents, where they write code for robot perception and control, execute it on sim and real robots, observe the outcomes, and iteratively improve

Hojoon Lee (@hojoon_ai):

We scaled off-policy RL to sim-to-real. To our knowledge, FlashSAC is the fastest and most performant RL algorithm across IsaacLab, MuJoCo Playground, and many more, all with a single set of hyperparameters.

Project page: holiday-robot.github.io/FlashSAC
Paper: arxiv.org/pdf/2604.04539

Anirudh Goyal (@anirudhg9119):

Reasoning doesn't have to mean longer chains of thought:

PDR = draft in parallel → distill into a compact workspace → refine, and shift the Pareto frontier.

arxiv.org/abs/2510.01123
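As a control-flow sketch only: the draft-in-parallel → distill → refine loop might look like the following, where `draft`, `distill`, and `refine` are hypothetical stubs standing in for LLM calls; none of these names, signatures, or defaults come from the paper.

```python
# Stub "LLM" calls for illustration; a real PDR system would query a model.
def draft(prompt, k):
    # Produce k independent drafts in parallel (here: trivially).
    return [f"draft-{i}: {prompt}" for i in range(k)]

def distill(drafts, budget):
    # Compress the parallel drafts into a bounded-size workspace.
    return " | ".join(d[:budget] for d in drafts)[: budget * len(drafts)]

def refine(prompt, workspace):
    # Produce the final answer conditioned on the compact workspace.
    return f"answer({prompt}, using {len(workspace)} chars of context)"

def pdr(prompt, rounds=2, k=4, budget=32):
    workspace = ""
    for _ in range(rounds):
        drafts = draft(f"{prompt}\n{workspace}", k)
        workspace = distill(drafts, budget)
    return refine(prompt, workspace)

out = pdr("example question")
```

The point of the structure is that context grows with the fixed workspace budget rather than with the total number of drafted tokens, which is what moves the accuracy-vs-sequential-compute Pareto frontier.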
Mingchen Zhuge (🇸🇬 ICLR) (@mingchenzhuge):

🫱 Introducing Neural Computers: what if AI does not just use computers better, but begins to become the running computer itself?

Beyond today's conventional computers, agents, and
Yu Lei (@_outofmemory_):

🤖 Co-training is everywhere (sim↔real [e.g. GR00T, LBM], human↔robot [e.g. PI, EgoScale], even non-robot data [e.g. PI, LBM]). But why does it work? How can we improve it further?

Taking sim-and-real imitation learning in diffusion/flow-based models as the test bed, we performed
Chongyi Zheng (@chongyiz1):

1/ Reinforcement learning is usually framed as maximizing rewards. But can we cast it as reaching the right goals?

New blog on bridging RL, goal-conditioned RL, and stochastic shortest path:

iclr-blogposts.github.io/2026/blog/2026…

Also #ICLR2026 Poster: Thu 10:30 AM–1:00 PM, P4 #4611.

🧵⬇️
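The RL ↔ shortest-path bridge can be seen in a toy example (my own, not taken from the blog): with reward -1 per step until an absorbing goal, value iteration converges to exactly the negative shortest-path distance, so maximizing reward and reaching the goal quickly are the same objective.

```python
import numpy as np

# Deterministic 1-D chain: states 0..N-1, goal at state 0. Actions move
# left or right (clipped at the ends); reward is -1 per step until the
# goal, which is absorbing with reward 0.
N = 6
V = np.zeros(N)
for _ in range(100):                      # value iteration to convergence
    V_new = V.copy()
    for s in range(1, N):                 # goal state 0 stays at value 0
        left, right = max(s - 1, 0), min(s + 1, N - 1)
        V_new[s] = -1 + max(V[left], V[right])
    V = V_new

# Shortest-path distance to the goal on this chain is just the index.
dist = np.arange(N)
```

After convergence V[s] = -s, i.e. the optimal value function is the negative graph distance, which is the stochastic-shortest-path view of goal reaching.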
Taco Cohen (@tacocohen):

Apparently it is not well known and not easy to see that this "simple masked loss" is EXACTLY gradient-equivalent to PPO-Clip (at least for one way of computing the mask). Here's how to see this: The standard token-level PPO-Clip objective is the rather unintuitive J_t =
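The tweet is truncated, but the standard token-level PPO-Clip objective it starts to write is J_t = min(r_t A_t, clip(r_t, 1-eps, 1+eps) A_t) with ratio r_t = pi_theta(a_t) / pi_old(a_t). Below is a numpy finite-difference check of the claimed equivalence (my own sketch, not the author's code), taking the mask m_t = 1 exactly when the clip is inactive: wherever the clip is inactive both gradients equal m r A * grad(log pi), and wherever it is active both are zero.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ppo_clip_obj(theta, theta_old, a, A, eps=0.2):
    """Token-level PPO-Clip objective for one sampled action a."""
    r = softmax(theta)[a] / softmax(theta_old)[a]
    return min(r * A, np.clip(r, 1 - eps, 1 + eps) * A)

def masked_grad(theta, theta_old, a, A, eps=0.2):
    """Gradient of the 'simple masked loss' sg(m * r * A) * log pi(a):
    stop-gradient on the mask m, the ratio r, and the advantage A."""
    pi = softmax(theta)
    r = pi[a] / softmax(theta_old)[a]
    m = float(r * A <= np.clip(r, 1 - eps, 1 + eps) * A)  # clip inactive?
    grad_logpi = -pi.copy()
    grad_logpi[a] += 1.0                  # gradient of log softmax at index a
    return m * r * A * grad_logpi

def num_grad(f, theta, h=1e-6):
    """Central-difference gradient of scalar f at theta."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        d = np.zeros_like(theta)
        d[i] = h
        g[i] = (f(theta + d) - f(theta - d)) / (2 * h)
    return g

theta_old = np.zeros(4)
# Clip inactive (r close to 1): PPO-Clip gradient equals the masked gradient.
theta = np.array([0.1, -0.1, 0.05, 0.0])
g_ppo = num_grad(lambda t: ppo_clip_obj(t, theta_old, a=2, A=1.5), theta)
g_masked = masked_grad(theta, theta_old, a=2, A=1.5)
# Clip active (r well above 1 + eps with A > 0): both gradients vanish.
theta_hi = np.array([0.0, 0.0, 2.0, 0.0])
g_ppo_hi = num_grad(lambda t: ppo_clip_obj(t, theta_old, a=2, A=1.0), theta_hi)
g_masked_hi = masked_grad(theta_hi, theta_old, a=2, A=1.0)
```

So the surrogate -sg(m_t r_t A_t) * log pi(a_t) produces, token by token, the same gradient as PPO-Clip, which is the equivalence the tweet is pointing at.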