Tian Jin @ ICLR (@tjingrant)'s Twitter Profile
Tian Jin @ ICLR

@tjingrant

PhD student @MIT_CSAIL, previously @IBMResearch, @haverfordedu.

ID: 3078864701

Link: http://www.tjin.org · Joined: 08-03-2015 06:52:01

81 Tweets

334 Followers

312 Following

Vaishnavh Nagarajan (@_vaishnavh) 's Twitter Profile Photo

📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue:

→ LLMs are limited in creativity since they learn to predict the next token

→ creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
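One way to picture the "seed-conditioning" idea described above (an illustrative sketch, not the paper's exact recipe): inject noise by prefixing the prompt with a random seed and train the model so that different seeds map to different valid outputs, letting diversity come from the seed rather than from sampling temperature.

```python
import random
import string

def seed_conditioned_prompt(task_prompt: str, seed_len: int = 8) -> str:
    # Noise injection as a random "seed" prefix; the model is trained so that
    # different seeds lead to different valid completions, so even greedy
    # decoding can produce diverse outputs. Format and tags here are hypothetical.
    seed = "".join(random.choices(string.ascii_lowercase, k=seed_len))
    return f"<seed>{seed}</seed>\n{task_prompt}"

print(seed_conditioned_prompt("Write a four-line poem that no one has written before."))
```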
Tianyuan Zhang (@tianyuanzhang99) 's Twitter Profile Photo

Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper “Test-Time Training Done Right” proposes LaCT (Large Chunk Test-Time Training) — a highly efficient, massively scalable nonlinear memory with: 💡 Pure PyTorch

Ziniu Li @ ICLR2025 (@ziniuli) 's Twitter Profile Photo

Haitham Bou Ammar Same thought! To my knowledge, this optimal value was first documented in "Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning" by Greensmith, Bartlett, and Baxter (JMLR 2004). The interesting finding is that this baseline value can achieve global
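For reference, the variance-minimizing constant baseline derived in that paper weights returns by the squared norm of the score function (stated here from memory; worth checking against Greensmith et al., 2004):

```latex
% REINFORCE estimator \hat{g} = \nabla_\theta \log \pi_\theta(\tau)\,(R(\tau) - b).
% The constant baseline minimizing its variance is
b^{*} = \frac{\mathbb{E}\!\left[\|\nabla_\theta \log \pi_\theta(\tau)\|^{2}\, R(\tau)\right]}
             {\mathbb{E}\!\left[\|\nabla_\theta \log \pi_\theta(\tau)\|^{2}\right]}
```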

Subham Sahoo (@ssahoo_) 's Twitter Profile Photo

🚨 [New paper alert] Esoteric Language Models (Eso-LMs)

First Diffusion LM to support KV caching w/o compromising parallel generation.

🔥 Sets new SOTA on the sampling speed–quality Pareto frontier 🔥
🚀 65× faster than MDLM
⚡ 4× faster than Block Diffusion

📜 Paper:
Transluce (@transluceai) 's Twitter Profile Photo

Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸

We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
Jordan Juravsky (@jordanjuravsky) 's Twitter Profile Photo

Happy Throughput Thursday! We’re excited to release Tokasaurus: an LLM inference engine designed from the ground up for high-throughput workloads with large and small models.

(Joint work with Ayush Chakravarthy, Ryan Ehrlich, Sabri Eyuboglu, Bradley Brown, Joseph Shetaye,
Han Guo (@hanguo97) 's Twitter Profile Photo

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
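To see where a log-linear budget can come from (a toy illustration only, not the paper's actual construction or kernels): if the context is summarized by power-of-two-sized buckets, Fenwick-tree style, then any prefix of length T is covered by at most floor(log2 T) + 1 summaries, so each decoding step touches O(log T) states rather than 1 (linear attention) or T (softmax attention).

```python
import math

def num_summaries(t: int) -> int:
    # A Fenwick-tree-style decomposition covers the prefix [1..t] with one bucket
    # per set bit of t, i.e. at most floor(log2(t)) + 1 buckets.
    return bin(t).count("1")

for t in [7, 1000, 10**6]:
    print(t, num_summaries(t), "<=", math.floor(math.log2(t)) + 1)
```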
Stella Biderman (@blancheminerva) 's Twitter Profile Photo

Two years in the making, we finally have 8 TB of openly licensed data with document-level metadata for authorship attribution, licensing details, links to original copies, and more. Hugely proud of the entire team.

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

My sleep scores during recent travel were in the 90s. Now back in SF I am consistently back down to 70s, 80s. I am increasingly convinced that this is due to traffic noise from a nearby road/intersection where I live - every ~10min, a car, truck, bus, or motorcycle with a very

Jiaxin Wen @ICLR2025 (@jiaxinwen22) 's Twitter Profile Photo

New Anthropic research: We elicit capabilities from pretrained models using no external supervision, often competitive or better than using human supervision.

Using this approach, we are able to train a Claude 3.5-based assistant that beats its human-supervised counterpart.
Jeff Dean (@jeffdean) 's Twitter Profile Photo

Check out the 999 open models that Google has released on Hugging Face:

huggingface.co/google

(Comparative numbers: 387 for Microsoft, 33 for OpenAI, 0 for Anthropic).
Zirui Liu (@ziruirayliu) 's Twitter Profile Photo

🔥 Excited to share our new work on reproducibility challenges in reasoning models caused by numerical precision. Ever run the same prompt twice and get completely different answers from your LLM under greedy decoding? You're not alone. Most LLMs today default to BF16 precision,
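The effect is easy to reproduce: BF16 addition is not associative, so the same numbers reduced in a different order (as happens across batch sizes or kernel configurations) give slightly different sums, and when two top logits are nearly tied that is enough to flip the argmax under greedy decoding. A minimal sketch, assuming PyTorch is installed:

```python
import torch

torch.manual_seed(0)
x = torch.randn(4096).to(torch.bfloat16)  # stand-in for per-element contributions to one logit

sequential = x.sum()                     # one reduction order
chunked = x.view(64, 64).sum(1).sum()    # another order, as a different kernel/batching might use

print(sequential.item(), chunked.item(), (sequential - chunked).item())
# In BF16 the two results typically differ; if the top-2 logits are closer than
# this gap, greedy decoding picks different tokens across runs.
```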

Jesse Michel (@jessemmichel) 's Twitter Profile Photo

Pohang, Korea has the best stairs. Every step the whole enclosure sways.

I'm looking forward to attending PLDI this week and giving a talk on the semantics of singular integrals and their derivatives!
Jyo Pari (@jyo_pari) 's Twitter Profile Photo

What if an LLM could update its own weights?

Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs.

Self-editing is learned via RL, using the updated model’s downstream performance as reward.
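A toy sketch of the loop described above (all helpers are illustrative stand-ins, not the SEAL implementation): the model drafts its own training data, a small update applies it, and the updated model's downstream performance becomes the reward that trains the self-edit policy.

```python
import random

# Illustrative stand-ins only: in SEAL these would be an LLM writing text,
# a real gradient-based finetuning step, and a real downstream evaluation.
def generate_self_edit(model, new_input):
    return f"synthetic training example derived from: {new_input}"

def finetune(model, self_edit):
    return {**model, "memory": model["memory"] + [self_edit]}

def downstream_score(model):
    return random.random() + 0.1 * len(model["memory"])

def seal_step(model, new_input):
    self_edit = generate_self_edit(model, new_input)   # model writes its own training data
    updated = finetune(model, self_edit)               # weights updated on the self-edit
    reward = downstream_score(updated)                 # updated model's performance = reward
    # In SEAL this reward trains the self-edit-generating policy via RL
    # (omitted here); this sketch only shows the data flow.
    return updated, reward

model = {"memory": []}
model, reward = seal_step(model, "a new document to absorb")
print(reward)
```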
Infini-AI-Lab (@infiniailab) 's Twitter Profile Photo

We will also give an online talk about Multiverse at ASAP Seminars (asap-seminar.github.io) on June 18th (this Wednesday), 2:00 PM Eastern Time. Please feel free to join us if you are interested!

🧵 12/n
Tian Jin @ ICLR (@tjingrant) 's Twitter Profile Photo

Check out Multiverse -- this amazing model knows when and how to use map-reduce to solve challenging reasoning problems at inference time!

Cohere Labs (@cohere_labs) 's Twitter Profile Photo

We're incredibly excited to announce our latest open science community-led initiative, Papers in the Park! 🌳

This is a great opportunity for those in Toronto, Canada to meet up and discuss a pre-selected research paper while enjoying the Summer weather!