Luke Metz (@luke_metz)'s Twitter Profile
Luke Metz

@luke_metz

Thinking Machines

Previously: OpenAI, Google Brain

ID: 887992016

Website: lukemetz.com · Joined: 18-10-2012 02:57:44

326 Tweets

16.16K Followers

1.1K Following

Chip Huyen (@chipro)'s Twitter Profile Photo

New post: bringing LLM applications to production! 1. Challenges of LLM engineering & the solutions that I’ve seen 2. How to compose multiple tasks and incorporate tools (e.g. SQL executor, bash, web browsers, third-party APIs) 3. Promising use cases huyenchip.com/2023/04/11/llm…

Chip Huyen (@chipro)'s Twitter Profile Photo

New post: RLHF - Reinforcement Learning from Human Feedback Discussing 3 phases of ChatGPT development, where RLHF fits in, how RLHF works, hypotheses on why it works, and relationship between RLHF and hallucination. huyenchip.com/2023/05/02/rlh…

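The reward-modeling phase that RLHF builds on can be sketched with a pairwise preference loss. This is a hedged illustration of the standard Bradley-Terry objective commonly used in RLHF, not code from the post or from ChatGPT; `preference_loss` is a hypothetical helper name.

```python
import numpy as np

# Hedged sketch of the reward-model phase of RLHF, not code from the post
# or from ChatGPT. The standard objective is a Bradley-Terry pairwise loss:
# the reward model should score the human-preferred response above the
# rejected one. `preference_loss` is a hypothetical helper name.

def preference_loss(r_chosen, r_rejected):
    """-log sigmoid(r_chosen - r_rejected), averaged over comparison pairs.
    log1p(exp(-m)) == -log(sigmoid(m)), written this way for stability."""
    margin = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    return float(np.mean(np.log1p(np.exp(-margin))))
```

Minimizing this pushes the reward margin between preferred and rejected responses up, which is what gives the later RL phase a usable scalar signal.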
Ishaan Gulrajani (@__ishaan)'s Twitter Profile Photo

New paper with Tatsunori Hashimoto! Likelihood-Based Diffusion Language Models: arxiv.org/abs/2305.18619 Likelihood-based training is a key ingredient of current LLMs. Despite this, diffusion LMs haven't shown any nontrivial likelihoods on standard LM benchmarks. We fix this!🧵

Chip Huyen (@chipro)'s Twitter Profile Photo

Open challenges in LLM research The first two challenges, hallucinations and context learning, are probably the most talked about today. I’m the most excited about 3 (multimodality), 5 (new architecture), and 6 (GPU alternatives). Number 5 and number 6, new architectures and

Lucas Beyer (bl16) (@giffmana)'s Twitter Profile Photo

What makes CLIP work? The contrast with negatives via softmax? The more negatives, the better -> large batch-size? We'll answer "no" to both in our ICCV oral🤓 By introducing SigLIP, a simpler CLIP that also works better and is more scalable, we can study the extremes. Hop in🧶

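The softmax-vs-sigmoid contrast the thread teases can be made concrete. A minimal NumPy sketch, not the paper's implementation: the CLIP-style loss normalizes each image's logits over the whole batch, while the SigLIP-style loss treats every (image, text) pair as an independent binary label. Function names and the temperature/bias values are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of the two objectives, not the paper's code; the
# temperature value and function names are hypothetical assumptions.

def softmax_contrastive_loss(img, txt, temperature=0.07):
    """CLIP-style loss: each image must identify its caption against every
    other caption in the batch (batch-wide softmax normalization)."""
    logits = img @ txt.T / temperature              # (B, B) similarity matrix
    idx = np.arange(len(img))                       # matching pairs on the diagonal
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[idx, idx].mean())

def sigmoid_pairwise_loss(img, txt, temperature=0.07, bias=0.0):
    """SigLIP-style loss: every (image, text) pair is an independent binary
    classification (+1 on the diagonal, -1 off it), with no batch-wide
    normalization."""
    logits = img @ txt.T / temperature + bias
    targets = 2.0 * np.eye(len(img)) - 1.0
    return float(np.log1p(np.exp(-targets * logits)).mean())
```

Because the sigmoid loss never normalizes across the batch, each pair's contribution is independent of batch size, which is what makes it possible to probe the batch-size extremes the tweet mentions.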
Chip Huyen (@chipro)'s Twitter Profile Photo

New blog post: Multimodality and Large Multimodal Models (LMMs) Being able to work with data of different modalities -- e.g. text, images, videos, audio, etc. -- is essential for AI to operate in the real world. This post covers multimodal systems in general, including Large

Oscar Li (@oscarli101)'s Twitter Profile Photo

📝Quiz time: when you have an unrolled computation graph (see figure below), how would you compute the unrolling parameters' gradients? If your answer only contains Backprop, now it’s time to add a new method to your gradient estimation toolbox!

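For readers answering the quiz: one family of alternatives to backprop perturbs the unrolling parameters and correlates the resulting loss changes with the perturbations. The toy below is a generic antithetic evolution-strategies estimator, an assumption-laden illustration of that broad idea and NOT the estimator from the paper; both function names and the toy unrolled computation are invented.

```python
import numpy as np

# Generic antithetic evolution-strategies (ES) gradient estimator on a toy
# unrolled computation. Illustrative only -- not the paper's method.

def unrolled_loss(theta, steps=50):
    """Toy unrolled computation graph: apply the update x <- x - theta*x
    `steps` times, then score the final iterate."""
    x = 1.0
    for _ in range(steps):
        x = x - theta * x
    return x ** 2

def es_gradient(theta, sigma=1e-3, n_samples=256, seed=0):
    """Antithetic ES estimate of d(unrolled_loss)/d(theta): perturb theta in
    both directions, difference the losses, correlate with the perturbations."""
    eps = np.random.default_rng(seed).normal(size=n_samples)
    deltas = np.array([
        unrolled_loss(theta + sigma * e) - unrolled_loss(theta - sigma * e)
        for e in eps
    ])
    return float((deltas * eps).mean() / (2.0 * sigma))
```

Unlike backprop through the unroll, memory cost here is independent of the number of steps, at the price of sampling variance; that trade-off is the usual motivation for smoothed estimators on unrolled graphs.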
Luke Metz (@luke_metz)'s Twitter Profile Photo

New gradient estimation technique led by the fantastic Oscar Li! It provides low-variance gradient estimates for unrolled or iterative computation graphs, such as those found in RL, learned optimizers, and meta-optimization. If you’re at NeurIPS, go check out the poster!

Jascha Sohl-Dickstein (@jaschasd)'s Twitter Profile Photo

Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Bluish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.
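The converge/diverge boundary such a sweep reveals can be reproduced on a toy problem. A hedged one-dimensional illustration, not the experiment behind the figure: for gradient descent on f(x) = x²/2, the update converges exactly when the learning rate is in (0, 2), so even a coarse sweep shows a sharp boundary.

```python
import numpy as np

# Toy illustration, not the tweet's experiment: gradient descent on
# f(x) = x^2 / 2 uses the update x <- x - lr*x = (1 - lr)*x, which
# converges iff |1 - lr| < 1, i.e. lr in (0, 2).

def converges(lr, steps=100):
    """Run `steps` gradient-descent updates from x=1 and report whether
    the iterate shrank below its starting magnitude."""
    x = 1.0
    for _ in range(steps):
        x = (1.0 - lr) * x
    return abs(x) < 1.0

grid = np.linspace(0.05, 4.0, 80)        # a 1-D sweep over learning rates
results = [converges(lr) for lr in grid]  # True/False converge map
```

Plotting `results` against `grid` gives a one-dimensional slice of the kind of converge/diverge map the dense 2-D grid search visualizes.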

Chip Huyen (@chipro)'s Twitter Profile Photo

A challenge of building AI applications is choosing which model to use. What if we don’t have to? What if we can predict the best model for any prompt? Predictive human preference aims to predict which model users might prefer for a specific query. One use case is model

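The per-prompt preference prediction described above can be sketched as a supervised routing problem. Everything below (the keyword featurization, the function names, the model names) is a hypothetical illustration, not from the post; a real system would learn a model over richer prompt features.

```python
from collections import Counter, defaultdict

# Hypothetical sketch of "predictive human preference" as routing. All
# names (train_router, route, model-a, ...) are invented. A keyword tally
# is the smallest thing that shows the shape of the idea.

def train_router(votes):
    """votes: (prompt_keyword, winning_model) pairs from logged A/B
    comparisons. Returns keyword -> most-often-preferred model."""
    tallies = defaultdict(Counter)
    for keyword, winner in votes:
        tallies[keyword][winner] += 1
    return {kw: counts.most_common(1)[0][0] for kw, counts in tallies.items()}

def route(router, prompt, default="strongest-model"):
    """Send the prompt to the predicted-preferred model, else a default."""
    for keyword, model in router.items():
        if keyword in prompt.lower():
            return model
    return default
```

The payoff of such a router is cost: cheap models handle the prompts they are predicted to win on, and the default (presumably strongest, priciest) model handles the rest.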
Luke Metz (@luke_metz)'s Twitter Profile Photo

It has been a pleasure working with you John. I am extremely sad to see you go. Best of luck with your new adventures!

Luke Metz (@luke_metz)'s Twitter Profile Photo

I'm leaving OpenAI after over 2 years of a wild ride. Alongside Barret Zoph, William Fedus, John Schulman, and many others, I got to build a “low key research preview” product that became ChatGPT. While we were all excited to work on it, none of us expected it to be where it is

Chip Huyen (@chipro)'s Twitter Profile Photo

It’s done! 150,000 words, 200+ illustrations, 250 footnotes, and over 1200 reference links. My editor just told me the manuscript has been sent to the printers. - The ebook will be coming out later this week. - Paperback copies should be available in a few weeks (hopefully

Nando de Freitas (@nandodf)'s Twitter Profile Photo

Thinking Machines’ Luke Metz giving a clear, beautifully simple, but extremely clever and informative tutorial on post training at khipu.ai — the data is likely the most important factor!
