Kevin Li (@kevinyli_) 's Twitter Profile
Kevin Li

@kevinyli_

phd @mldcmu
undergrad @georgiatech

ID: 1452403988096188418

Joined: 24-10-2021 22:38:03

126 Tweets

504 Followers

165 Following

Brandon Trabucco @ ICLR (@brandontrabucco) 's Twitter Profile Photo

🌏 Building web-scale agents, and tired of Math and Coding tasks? Come chat with us at ICLR in Singapore. We are presenting InSTA at the DATA-FM workshop in the second Oral session, April 28th 2:30pm. InSTA is the largest environment for training agents, spanning 150k live

Brandon Trabucco @ ICLR (@brandontrabucco) 's Twitter Profile Photo

Building LLM Agents? Come to my talk at the #ICLR DATA-FM workshop today at 2:30pm, Hall 4, Section 4. I'll be presenting InSTA, our work building the largest environment for agents on the live internet. arxiv.org/abs/2502.06776 #Agents #LLM

Yutong (Kelly) He (@electronickale) 's Twitter Profile Photo

✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵

Runtian Zhai (@runtianzhai) 's Twitter Profile Photo

Why can foundation models transfer to so many downstream tasks? Will the scaling law end? Will pretraining end like Ilya Sutskever predicted? My PhD thesis builds the contexture theory to answer the above. Blog: runtianzhai.com/thesis Paper: arxiv.org/abs/2504.19792 🧵1/12

Aviv Bick (@avivbick) 's Twitter Profile Photo

The Transformer–SSM retrieval gap is driven by just a few heads! SSMs lag on tasks like MMLU (multiple-choice) and GSM8K (math) due to in-context retrieval challenges. But here’s the twist: just a handful of heads handle retrieval in both architectures. What we found 👇 1/

Zhengyang Geng (@zhengyanggeng) 's Twitter Profile Photo

Excited to share our work with my amazing collaborators, Goodeat, Xingjian Bai, Zico Kolter, and Kaiming. In a word, we show an “identity learning” approach for generative modeling, by relating the instantaneous/average velocity in an identity. The resulting model,

Songlin Yang (@songlinyang4) 's Twitter Profile Photo

📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381

Anthony Peng (@realanthonypeng) 's Twitter Profile Photo

🚨 New work: We rethink how we finetune safer LLMs — not by filtering after the generation, but by tracking safety risk token by token during training. We repurpose guardrail models like 🛡️ Llama Guard and Granite Guardian to score evolving risk across each response 📉 — giving

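The tweet above describes tracking safety risk token by token by scoring a growing response with a guardrail model. A minimal sketch of that idea, where `guard_score` is a hypothetical stand-in for a real guard such as Llama Guard (any callable mapping a token prefix to a risk score):

```python
def token_risk_trace(tokens, guard_score):
    """Score every growing prefix of a response with a guardrail model.

    `guard_score` is a stand-in for a real guard (e.g. Llama Guard): any
    callable mapping a list of tokens to a risk score. The returned trace
    shows how risk evolves as the response is generated.
    """
    return [guard_score(tokens[:i]) for i in range(1, len(tokens) + 1)]

# Toy guard: fraction of tokens flagged as unsafe so far.
toy_guard = lambda prefix: sum(t == "UNSAFE" for t in prefix) / len(prefix)
trace = token_risk_trace(["ok", "ok", "UNSAFE", "ok"], toy_guard)
```

A real setup would call the guardrail model on each decoded prefix during finetuning; the toy guard here only illustrates the prefix-scoring loop.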
Lili (@lchen915) 's Twitter Profile Photo

One fundamental issue with RL – whether it’s for robots or LLMs – is how hard it is to get rewards. For LLM reasoning, we need ground-truth labels to verify answers. We found that maximizing confidence alone allows LLMs to improve their reasoning with RL!
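The tweet says maximizing confidence alone can serve as the RL reward; it does not spell out the objective, but one common confidence proxy is the (negative) entropy of the model's per-step token distributions. A hedged sketch under that assumption:

```python
import math

def confidence_reward(step_distributions):
    """Label-free reward: mean negative entropy of the model's per-step
    token distributions. Peaked (confident) distributions score closer
    to zero; uniform (unsure) ones score more negative.

    Negative entropy is an illustrative proxy, not necessarily the
    paper's exact objective.
    """
    def neg_entropy(p):
        return sum(x * math.log(x) for x in p if x > 0)
    return sum(neg_entropy(p) for p in step_distributions) / len(step_distributions)

# A peaked answer distribution earns a higher reward than a uniform one.
confident = confidence_reward([[0.9, 0.05, 0.05]])
unsure = confidence_reward([[1 / 3, 1 / 3, 1 / 3]])
```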

Fahim Tajwar (@fahimtajwar10) 's Twitter Profile Photo

RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers? Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training! 🧵 1/n

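The thread introduces Self-Rewarding Training, where the model supplies its own reward without ground-truth answers. One plausible label-free scheme (an illustrative assumption, not necessarily SRT's exact rule) is to treat the majority answer across sampled completions as a pseudo-label:

```python
from collections import Counter

def self_rewards(sampled_answers):
    """Reward each sampled answer 1.0 if it matches the majority answer
    across the batch (the model's own pseudo-label), else 0.0.
    No ground-truth labels are needed."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in sampled_answers]

rewards = self_rewards(["42", "42", "41", "42"])
```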
Tri Dao (@tri_dao) 's Twitter Profile Photo

We've been thinking about what the "ideal" architecture should look like in the era where inference is driving AI progress. GTA & GLA are steps in this direction: attention variants tailored for inference: high arithmetic intensity (make GPUs go brr even during decoding), easy to

Vaishnavh Nagarajan (@_vaishnavh) 's Twitter Profile Photo

📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵

Anthony Peng (@realanthonypeng) 's Twitter Profile Photo

🚨 Sharing our new #ACL2025NLP main paper! 🎥 Deploying video VLMs at scale? Inference compute is your bottleneck. We study how to optimally allocate inference FLOPs across LLM size, frame count, and visual tokens. 💡 Large-scale training sweeps (~100k A100 hrs) 📊 Parametric

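The tweet frames the problem as allocating a fixed inference-FLOPs budget across LLM size, frame count, and visual tokens per frame. Using the standard ~2 × params × tokens approximation for a forward pass (a rough back-of-envelope, not the paper's parametric fit), the tradeoff can be made concrete:

```python
def decode_flops(n_params, n_frames, tokens_per_frame, text_tokens):
    """Rough inference FLOPs for a video VLM forward pass, using the
    common ~2 * params * tokens approximation."""
    total_tokens = n_frames * tokens_per_frame + text_tokens
    return 2 * n_params * total_tokens

# At a similar budget, a ~2x smaller LLM affords roughly 2x the frames.
big_model = decode_flops(7_000_000_000, 16, 256, 100)
small_model_more_frames = decode_flops(3_500_000_000, 32, 256, 100)
```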
Omar Shaikh (@oshaikh13) 's Twitter Profile Photo

What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵

Sabri Eyuboglu (@eyuboglusabri) 's Twitter Profile Photo

When we put lots of text (eg a code repo) into LLM context, cost soars b/c of the KV cache’s size. What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on avg 39x

Avi Schwarzschild (@a_v_i__s) 's Twitter Profile Photo

Big news! 🎉 I’m joining UNC-Chapel Hill as an Assistant Professor in Computer Science starting next year! Before that, I’ll be spending time at OpenAI working on LLM privacy. UNC Computer Science UNC NLP

YixuanEvenXu (@yixuanevenxu) 's Twitter Profile Photo

✨ Did you know that NOT using all generated rollouts in GRPO can boost your reasoning LLM? Meet PODS! We down-sample rollouts and train on just a fraction, delivering notable gains over vanilla GRPO. (1/7)

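The PODS tweet says the method down-samples generated rollouts and trains on just a fraction. The selection rule here, keeping only the highest- and lowest-reward rollouts so the retained group is maximally informative, is one plausible heuristic assumed for illustration:

```python
def downsample_rollouts(rollouts, rewards, k):
    """Keep only k of the generated rollouts for the GRPO update.

    Heuristic (an assumption for illustration): keep the k//2 lowest-
    and the remaining highest-reward rollouts, discarding middling ones,
    so the kept set has high reward variance.
    """
    order = sorted(range(len(rewards)), key=lambda i: rewards[i])
    keep = sorted(order[: k // 2] + order[len(order) - (k - k // 2):])
    return [rollouts[i] for i in keep]

# 6 rollouts, keep 4: the two middling rewards (2 and 3) are dropped.
kept = downsample_rollouts(list("abcdef"), [1, 5, 2, 9, 3, 0], 4)
```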
Ricardo Buitrago (@rbuit_) 's Twitter Profile Photo

Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past the training length. We show a simple and general fix which enables length generalization in up to 256k sequences, with no need to change the architectures!

elie (@eliebakouch) 's Twitter Profile Photo

Super excited to share SmolLM3, a new strong 3B model. SmolLM3 is fully open, we share the recipe, the dataset, the training codebase and much more! > Trained on 11T tokens on 384 H100s for 220k GPU hours > Supports long context up to 128k thanks to NoPE and intra-document masking >

Albert Gu (@_albertgu) 's Twitter Profile Photo

I converted one of my favorite talks I've given over the past year into a blog post. "On the Tradeoffs of SSMs and Transformers" (or: tokens are bullshit) In a few days, we'll release what I believe is the next major advance for architectures.
