Brandon Amos (@brandondamos)'s Twitter Profile
Brandon Amos

@brandondamos

research scientist @MetaAI (FAIR) | optimization, machine learning, control, transport | PhD from @SCSatCMU

ID: 2303004390

Link: http://bamos.github.io · Joined: 21-01-2014 12:23:31

4.4K Tweets

18.18K Followers

2.2K Following

Thomas Lew (@thomas__lew)'s Twitter Profile Photo

I'm excited to share new optimality conditions for nonlinear stochastic optimal control, and the first indirect shooting method for solving these problems! 📖 arxiv.org/abs/2502.06726 💡 How? Using rough path theory ⬇️
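For readers new to the jargon: in the deterministic setting, "indirect" methods first write down Pontryagin-style optimality conditions, which turn the control problem into a two-point boundary value problem, and "shooting" solves it by iterating on the unknown initial costate. A minimal sketch of that classical picture (the paper's contribution is the stochastic generalization of these objects via rough paths):

```latex
% Classical deterministic analogue, not the paper's stochastic conditions.
% Problem: minimize \int_0^T \ell(x,u)\,dt + \phi(x(T)) subject to \dot x = f(x,u), \; x(0)=x_0.
% With Hamiltonian H(x,p,u) = \ell(x,u) + p^\top f(x,u):
\begin{aligned}
\dot x &= f(x, u^\star), & x(0) &= x_0, \\
\dot p &= -\partial_x H(x, p, u^\star), & p(T) &= \partial_x \phi(x(T)), \\
u^\star(t) &= \arg\min_u H(x(t), p(t), u).
\end{aligned}
% Indirect shooting: guess p(0), integrate the state/costate ODEs forward,
% and adjust p(0) until the terminal condition on p(T) is met.
```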

Ruilong Li (@ruilong_li)'s Twitter Profile Photo

For everyone interested in precise 📷camera control 📷 in transformers [e.g., video / world model etc]

Stop settling for Plücker raymaps -- use camera-aware relative PE in your attention layers, like RoPE (for LLMs) but for cameras! 

Paper & code: liruilong.cn/prope/
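The actual PRoPE formulation is in the paper; purely as intuition for how a camera-relative encoding can live inside attention, here is a rotation-only toy in the spirit of RoPE (the names, shapes, and simplification to rotations are mine, not the paper's): rotating each token's query and key by its own camera rotation makes the attention logit depend only on the relative rotation between the two cameras.

```python
import torch

def rotation_relative_attention_logits(q, k, R):
    """Toy, rotation-only analogue of camera-aware relative PE (not the paper's PRoPE).

    q, k: (N, 3) per-token query/key features (one 3-d block per token for simplicity)
    R:    (N, 3, 3) camera rotation for the frame each token comes from

    Rotating q_i by R_i and k_j by R_j gives logits
        (R_i q_i)^T (R_j k_j) = q_i^T (R_i^T R_j) k_j,
    which depend only on the *relative* rotation R_i^T R_j -- the same trick RoPE
    uses with 2-d rotations over positions, here applied to camera orientation.
    """
    q_rot = torch.einsum("nij,nj->ni", R, q)  # rotate each query into its camera frame
    k_rot = torch.einsum("nij,nj->ni", R, k)  # rotate each key into its camera frame
    return q_rot @ k_rot.T                    # (N, N) attention logits before softmax

# tiny usage example with random cameras
N = 4
q, k = torch.randn(N, 3), torch.randn(N, 3)
R = torch.linalg.qr(torch.randn(N, 3, 3)).Q   # random orthonormal matrices as stand-ins for rotations
logits = rotation_relative_attention_logits(q, k, R)
```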
Jiaxin Shi (@thjashin)'s Twitter Profile Photo

Autoregressive models are too restrictive by forcing a fixed generation order, while masked diffusion is wasteful as it fits all possible orders. Can our model dynamically decide the next position to generate based on context? Learn more in our ICML paper

arxiv.org/abs/2503.05979
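To make "dynamically decide the next position" concrete, here is a schematic any-order decoding loop (an illustration of the general idea, not the paper's model or training objective): the model scores every still-masked position, one position is chosen, and only that token is sampled before repeating.

```python
import torch

def any_order_decode(model, length, mask_id, device="cpu"):
    """Schematic any-order decoding loop (illustration only, not the ICML paper's method).

    `model(seq)` is assumed to return:
      token_logits:    (length, vocab) -- token distribution at every position
      position_logits: (length,)       -- preference for which position to fill next
    """
    seq = torch.full((length,), mask_id, dtype=torch.long, device=device)
    filled = torch.zeros(length, dtype=torch.bool, device=device)
    for _ in range(length):
        token_logits, position_logits = model(seq)
        position_logits = position_logits.masked_fill(filled, float("-inf"))  # only masked slots
        pos = torch.distributions.Categorical(logits=position_logits).sample()
        tok = torch.distributions.Categorical(logits=token_logits[pos]).sample()
        seq[pos], filled[pos] = tok, True
    return seq
```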
Grigory Bartosh (@grigorybartosh)'s Twitter Profile Photo

📢Presenting SDE Matching🔥🔥🔥 🚀We extend diffusion models to construct a simulation-free framework for training Latent SDEs. It enables sampling from the exact posterior process marginals without any numerical simulations. 📜: arxiv.org/abs/2502.02472 🧵1/8
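For context on what "simulation-free" buys you, this is the standard latent SDE setup in generic notation (not necessarily the paper's): the usual ELBO involves a path-space KL that is typically estimated by numerically integrating the posterior SDE, which is the step SDE Matching removes.

```latex
% Generic latent SDE setup (standard notation, not necessarily the paper's):
\begin{aligned}
\text{prior:}\quad        & dz_t = f_\theta(z_t, t)\,dt + g(z_t, t)\,dW_t, \\
\text{posterior:}\quad    & dz_t = h_\phi(z_t, t)\,dt + g(z_t, t)\,dW_t, \\
\text{observations:}\quad & x_{t_i} \sim p_\theta(x \mid z_{t_i}).
\end{aligned}
% ELBO = E[ \sum_i \log p_\theta(x_{t_i} \mid z_{t_i})
%           - \tfrac12 \int_0^T \| g^{-1}(h_\phi - f_\theta) \|^2 \, dt ],
% where the integral is usually estimated by numerically simulating the posterior SDE.
```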

Alexander Wei (@alexwei_)'s Twitter Profile Photo

1/N I’m excited to share that our latest OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

Laker Newhouse (@lakernewhouse)'s Twitter Profile Photo

[1/9] We created a performant Lipschitz transformer by spectrally regulating the weights—without using activation stability tricks: no layer norm, QK norm, or logit softcapping. We think this may address a “root cause” of unstable training.
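The thread details their actual scheme; as a generic illustration of "spectrally regulating the weights" (not necessarily what they do), one common recipe is to estimate each weight matrix's top singular value with power iteration and rescale the matrix so its spectral norm stays below a target, which caps the Lipschitz constant of each linear layer.

```python
import torch

@torch.no_grad()
def cap_spectral_norm(W: torch.Tensor, max_sigma: float = 1.0, iters: int = 20) -> torch.Tensor:
    """Rescale W so its largest singular value is at most `max_sigma`.

    Generic spectral capping via power iteration -- an illustration of
    "spectrally regulating the weights", not the thread's exact procedure.
    """
    v = torch.randn(W.shape[1], device=W.device)
    for _ in range(iters):                       # power iteration on W^T W
        v = torch.nn.functional.normalize(W.T @ (W @ v), dim=0)
    sigma = torch.linalg.norm(W @ v)             # estimated top singular value
    if sigma > max_sigma:
        W = W * (max_sigma / sigma)
    return W

# applied after each optimizer step, every linear layer then has Lipschitz constant <= max_sigma
W = torch.randn(512, 512) * 0.5
W = cap_spectral_norm(W, max_sigma=1.0)
```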
Mihir Prabhudesai (@mihirp98)'s Twitter Profile Photo

🚨 The era of infinite internet data is ending, so we ask:

👉 What’s the right generative modelling objective when data—not compute—is the bottleneck?

TL;DR:

▶️Compute-constrained? Train Autoregressive models

▶️Data-constrained? Train Diffusion models

Get ready for 🤿  1/n
Qinyuan Ye (👀Jobs) (@qinyuan_ye)'s Twitter Profile Photo

1+1=3
2+2=5
3+3=?

Many language models (e.g., Llama 3 8B, Mistral v0.1 7B) will answer 7. But why?

We dig into the model internals, uncover a function induction mechanism, and find that it’s broadly reused when models encounter surprises during in-context learning. 🧵
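The "7" makes sense once you spot the induced rule: the prompt is consistent with f(a, b) = a + b + 1, i.e. ordinary addition plus one, so a model that picks up this function in context completes 3+3 with 7. A two-line check of the pattern:

```python
# The in-context examples are consistent with "add, then add one":
f = lambda a, b: a + b + 1
assert f(1, 1) == 3 and f(2, 2) == 5
print(f(3, 3))  # 7 -- the answer the thread reports for Llama 3 8B and Mistral v0.1 7B
```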
Michael Black (@michael_j_black)'s Twitter Profile Photo

Here's how my recent papers & reviews are going:
* To solve a vision problem today, the sensible thing is to leverage a pre-trained VLM or video diffusion model. Such models implicitly represent a tremendous amount about the visual world that we can exploit.
* Figure out how to

Anne Ouyang (@anneouyang)'s Twitter Profile Photo

KernelBench v0.1 is out, featuring:
- A guideline on analyzing the validity of results and ruling out physically impossible performance claims.
- Support for randomized testing beyond normal distributions.
- Fixed problem sizes and improved numerics
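As a rough illustration of the "physically impossible performance" check the first bullet describes (my sketch, not KernelBench's actual implementation): a kernel that must move B bytes cannot finish faster than B divided by peak memory bandwidth, so any reported time below that roofline bound can be rejected outright.

```python
def min_time_seconds(bytes_moved: float, peak_bw_gb_per_s: float) -> float:
    """Bandwidth-roofline lower bound on kernel runtime (sketch, not KernelBench's code)."""
    return bytes_moved / (peak_bw_gb_per_s * 1e9)

# Example: an elementwise add over two fp32 inputs and one fp32 output of 10M elements
n = 10_000_000
bytes_moved = 3 * 4 * n                                  # read a, read b, write c, 4 bytes each
bound = min_time_seconds(bytes_moved, peak_bw_gb_per_s=2000)  # ~2 TB/s-class GPU (assumed)
claimed = 20e-6                                          # hypothetical reported runtime of 20 us
if claimed < bound:
    print(f"claim {claimed*1e6:.1f} us is below the {bound*1e6:.1f} us roofline -> impossible")
```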
机器之心 JIQIZHIXIN (@synced_global)'s Twitter Profile Photo

ByteDance is exploring diffusion LLMs too! 👀

Seed Diffusion Preview: a blazing-fast LLM for code, built on discrete-state diffusion.

With 2,146 tokens/sec inference on H20 GPUs, it outpaces Mercury & Gemini Diffusion, while matching their performance on standard code
clem 🤗 (@clementdelangue)'s Twitter Profile Photo

Every tech company can and should train their own deepseek R1, Llama or GPT5, just like every tech company writes their own code (and AI is no more than software 2.0).

This is why we're releasing the Ultra-Scale Playbook. 200 pages to master:
- 5D parallelism (DP, TP, PP, EP,
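The list is cut off after the fourth parallelism axis, but whatever the remaining axes are, the bookkeeping is the same: the degrees of all axes must multiply out to the total GPU count. A tiny sanity check of that factorization (illustrative only, not from the playbook):

```python
from math import prod

def check_parallel_layout(world_size: int, **degrees: int) -> None:
    """Assert that the parallelism degrees exactly tile the available GPUs."""
    assert prod(degrees.values()) == world_size, (
        f"{degrees} multiplies to {prod(degrees.values())}, not {world_size}"
    )

# e.g. 512 GPUs split across the four axes named in the (truncated) tweet
check_parallel_layout(512, dp=8, tp=8, pp=4, ep=2)
```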
Leo Zhang (@leoeleoleo1)'s Twitter Profile Photo

Wrote up some notes providing an introduction to discrete diffusion models, going into the theory of time-inhomogeneous CTMCs via generators/time-evolution systems. What motivated me was the sheer difficulty of finding a useful reference which laid out the theory (e.g.
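For readers without a reference at hand, the central objects the notes refer to are presumably these standard ones: the time-dependent generator of a CTMC and the Kolmogorov forward equation it induces for the marginals and transition operators.

```latex
% Time-inhomogeneous CTMC on a finite state space (standard definitions, not from the notes):
% generator Q_t with Q_t(x,y) \ge 0 for y \ne x (jump rates) and Q_t(x,x) = -\sum_{y \ne x} Q_t(x,y).
% The marginals evolve by the Kolmogorov forward equation
\frac{d}{dt}\, p_t(y) = \sum_x p_t(x)\, Q_t(x, y),
% and the time-evolution (transition) operators P_{s,t} satisfy
% \partial_t P_{s,t} = P_{s,t} Q_t, \qquad P_{s,s} = I.
```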

Jack Parker-Holder (@jparkerholder)'s Twitter Profile Photo

Genie 3 feels like a watershed moment for world models 🌐: we can now generate multi-minute, real-time interactive simulations of any imaginable world. This could be the key missing piece for embodied AGI… and it can also create beautiful beaches with my dog, playable real time

Tim Rocktäschel (@_rockt)'s Twitter Profile Photo

Harder, Better, Faster, Stronger, Real-time! We are excited to reveal Genie 3, our most capable real-time foundational world model. Fantastic cross-team effort led by Jack Parker-Holder and Shlomi Fruchter. Below some interactive worlds and capabilities that were highlights for me

Feng Yao (@fengyao1909)'s Twitter Profile Photo

Failing on large-scale RL with VeRL?

⚠️ Mixing inference backend (vLLM/SGLang) with training backends (FSDP/Megatron) secretly turns your RL into off-policy — even if they share the same weights!

📉 Blog:
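The mechanism behind the warning: even with identical weights, different kernels and numerics mean the inference engine's sampling distribution is not exactly the distribution the trainer computes gradients under, so the rollouts are slightly off-policy. One minimal way to measure, and partially correct for, the gap, sketched here rather than taken from the blog: compare per-token log-probs from the two backends on the same sampled tokens and form an importance ratio.

```python
import torch

def offpolicy_gap(sampler_logprobs: torch.Tensor, trainer_logprobs: torch.Tensor):
    """Quantify the train/inference policy mismatch on the sampled tokens.

    sampler_logprobs: log p_infer(token) reported by the inference backend (e.g. vLLM/SGLang)
    trainer_logprobs: log p_train(token) recomputed by the training backend (e.g. FSDP/Megatron)
    Both are per-token tensors over the same sampled sequence. Sketch only, not the blog's code.
    """
    log_ratio = trainer_logprobs - sampler_logprobs
    importance_weight = log_ratio.sum().exp()   # sequence-level IS weight p_train / p_infer
    approx_kl = (-log_ratio).mean()             # rough per-token estimate of KL(infer || train)
    return importance_weight, approx_kl

# identical weights but different kernels/precision can still give a nonzero gap
w, kl = offpolicy_gap(torch.tensor([-1.02, -0.51]), torch.tensor([-1.00, -0.50]))
```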