Tian Jin @ ICLR (@tjingrant)'s Twitter Profile
Tian Jin @ ICLR

@tjingrant

PhD student @MIT_CSAIL, previously @IBMResearch, @haverfordedu.

ID: 3078864701

Link: http://www.tjin.org · Joined: 08-03-2015 06:52:01

81 Tweets

334 Followers

312 Following

Vaishnavh Nagarajan (@_vaishnavh) 's Twitter Profile Photo

📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue:

→ LLMs are limited in creativity since they learn to predict the next token

→ creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
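One way to picture the "seed-conditioning" idea described above (an illustrative sketch, not the paper's exact recipe): inject noise by prefixing the prompt with a random seed and train the model so that different seeds map to different valid outputs, letting diversity come from the seed rather than from sampling temperature.

```python
import random
import string

def seed_conditioned_prompt(task_prompt: str, seed_len: int = 8) -> str:
    # Noise injection as a random "seed" prefix; the model is trained so that
    # different seeds lead to different valid completions, so even greedy
    # decoding can produce diverse outputs. Format and tags here are hypothetical.
    seed = "".join(random.choices(string.ascii_lowercase, k=seed_len))
    return f"<seed>{seed}</seed>\n{task_prompt}"

print(seed_conditioned_prompt("Write a four-line poem that no one has written before."))
```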
Tianyuan Zhang (@tianyuanzhang99) 's Twitter Profile Photo

Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper “Test-Time Training Done Right” proposes LaCT (Large Chunk Test-Time Training) — a highly efficient, massively scalable nonlinear memory with: 💡 Pure PyTorch

Ziniu Li @ ICLR2025 (@ziniuli) 's Twitter Profile Photo

Haitham Bou Ammar Same thought! To my knowledge, this optimal value was first documented in "Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning" by Greensmith, Bartlett, and Baxter (JMLR 2004). The interesting finding is that this baseline value can achieve global
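For reference, the variance-minimizing constant baseline derived in that paper weights returns by the squared norm of the score function (stated here from memory; worth checking against Greensmith et al., 2004):

```latex
% REINFORCE estimator \hat{g} = \nabla_\theta \log \pi_\theta(\tau)\,(R(\tau) - b).
% The constant baseline minimizing its variance is
b^{*} = \frac{\mathbb{E}\!\left[\|\nabla_\theta \log \pi_\theta(\tau)\|^{2}\, R(\tau)\right]}
             {\mathbb{E}\!\left[\|\nabla_\theta \log \pi_\theta(\tau)\|^{2}\right]}
```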

Subham Sahoo (@ssahoo_) 's Twitter Profile Photo

🚨 [New paper alert] Esoteric Language Models (Eso-LMs)

First Diffusion LM to support KV caching w/o compromising parallel generation.

🔥 Sets new SOTA on the sampling speed–quality Pareto frontier 🔥
🚀 65× faster than MDLM
⚡ 4× faster than Block Diffusion

📜 Paper:
Transluce (@transluceai) 's Twitter Profile Photo

Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸

We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
Jordan Juravsky (@jordanjuravsky) 's Twitter Profile Photo

Happy Throughput Thursday! We’re excited to release Tokasaurus: an LLM inference engine designed from the ground up for high-throughput workloads with large and small models.

(Joint work with Ayush Chakravarthy, Ryan Ehrlich, Sabri Eyuboglu, Bradley Brown, Joseph Shetaye,
Han Guo (@hanguo97) 's Twitter Profile Photo

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
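To see where a log-linear budget can come from (a toy illustration only, not the paper's actual construction or kernels): if the context is summarized by power-of-two-sized buckets, Fenwick-tree style, then any prefix of length T is covered by at most floor(log2 T) + 1 summaries, so each decoding step touches O(log T) states rather than 1 (linear attention) or T (softmax attention).

```python
import math

def num_summaries(t: int) -> int:
    # A Fenwick-tree-style decomposition covers the prefix [1..t] with one bucket
    # per set bit of t, i.e. at most floor(log2(t)) + 1 buckets.
    return bin(t).count("1")

for t in [7, 1000, 10**6]:
    print(t, num_summaries(t), "<=", math.floor(math.log2(t)) + 1)
```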
Stella Biderman (@blancheminerva) 's Twitter Profile Photo

Two years in the making, we finally have 8 TB of openly licensed data with document-level metadata for authorship attribution, licensing details, links to original copies, and more. Hugely proud of the entire team.

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

My sleep scores during recent travel were in the 90s. Now back in SF I am consistently back down to 70s, 80s. I am increasingly convinced that this is due to traffic noise from a nearby road/intersection where I live - every ~10min, a car, truck, bus, or motorcycle with a very

Jiaxin Wen @ICLR2025 (@jiaxinwen22) 's Twitter Profile Photo

New Anthropic research: We elicit capabilities from pretrained models using no external supervision, often competitive or better than using human supervision.

Using this approach, we are able to train a Claude 3.5-based assistant that beats its human-supervised counterpart.
Jeff Dean (@jeffdean) 's Twitter Profile Photo

Check out the 999 open models that Google has released on Hugging Face:

huggingface.co/google

(Comparative numbers: 387 for Microsoft, 33 for OpenAI, 0 for Anthropic).
Zirui Liu (@ziruirayliu) 's Twitter Profile Photo

🔥 Excited to share our new work on reproducibility challenges in reasoning models caused by numerical precision. Ever run the same prompt twice and get completely different answers from your LLM under greedy decoding? You're not alone. Most LLMs today default to BF16 precision,
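The effect is easy to reproduce: BF16 addition is not associative, so the same numbers reduced in a different order (as happens across batch sizes or kernel configurations) give slightly different sums, and when two top logits are nearly tied that is enough to flip the argmax under greedy decoding. A minimal sketch, assuming PyTorch is installed:

```python
import torch

torch.manual_seed(0)
x = torch.randn(4096).to(torch.bfloat16)  # stand-in for per-element contributions to one logit

sequential = x.sum()                     # one reduction order
chunked = x.view(64, 64).sum(1).sum()    # another order, as a different kernel/batching might use

print(sequential.item(), chunked.item(), (sequential - chunked).item())
# In BF16 the two results typically differ; if the top-2 logits are closer than
# this gap, greedy decoding picks different tokens across runs.
```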

Jesse Michel (@jessemmichel) 's Twitter Profile Photo

Pohang, Korea has the best stairs. Every step the whole enclosure sways.

I'm looking forward to attending PLDI this week and giving a talk on the semantics of singular integrals and their derivatives!
Jyo Pari (@jyo_pari) 's Twitter Profile Photo

What if an LLM could update its own weights?

Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs.

Self-editing is learned via RL, using the updated model’s downstream performance as reward.
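A toy sketch of the loop described above (all helpers are illustrative stand-ins, not the SEAL implementation): the model drafts its own training data, a small update applies it, and the updated model's downstream performance becomes the reward that trains the self-edit policy.

```python
import random

# Illustrative stand-ins only: in SEAL these would be an LLM writing text,
# a real gradient-based finetuning step, and a real downstream evaluation.
def generate_self_edit(model, new_input):
    return f"synthetic training example derived from: {new_input}"

def finetune(model, self_edit):
    return {**model, "memory": model["memory"] + [self_edit]}

def downstream_score(model):
    return random.random() + 0.1 * len(model["memory"])

def seal_step(model, new_input):
    self_edit = generate_self_edit(model, new_input)   # model writes its own training data
    updated = finetune(model, self_edit)               # weights updated on the self-edit
    reward = downstream_score(updated)                 # updated model's performance = reward
    # In SEAL this reward trains the self-edit-generating policy via RL
    # (omitted here); this sketch only shows the data flow.
    return updated, reward

model = {"memory": []}
model, reward = seal_step(model, "a new document to absorb")
print(reward)
```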
Infini-AI-Lab (@infiniailab) 's Twitter Profile Photo

We will also give an online talk about Multiverse at ASAP Seminars (asap-seminar.github.io) on June 18th (this Wednesday), 2:00 PM Eastern Time. Please feel free to join us if you are interested!

🧵 12/n
Tian Jin @ ICLR (@tjingrant) 's Twitter Profile Photo

Check out Multiverse -- this amazing model knows when and how to use map-reduce to solve challenging reasoning problems at inference time!

Cohere Labs (@cohere_labs) 's Twitter Profile Photo

We're incredibly excited to announce our latest open science community-led initiative, Papers in the Park! 🌳

This is a great opportunity for those in Toronto, Canada to meet up and discuss a pre-selected research paper while enjoying the Summer weather!