Luke Metz (@luke_metz)'s Twitter Profile
Luke Metz

@luke_metz

Thinking Machines

Previously: OpenAI, Google Brain

ID: 887992016

Website: lukemetz.com · Joined: 18-10-2012 02:57:44

326 Tweets

16.16K Followers

1.1K Following

Chip Huyen (@chipro)'s Twitter Profile Photo

New post: bringing LLM applications to production! 1. Challenges of LLM engineering & the solutions that I’ve seen 2. How to compose multiple tasks and incorporate tools (e.g. SQL executor, bash, web browsers, third-party APIs) 3. Promising use cases huyenchip.com/2023/04/11/llm…

Chip Huyen (@chipro)'s Twitter Profile Photo

New post: RLHF - Reinforcement Learning from Human Feedback Discussing 3 phases of ChatGPT development, where RLHF fits in, how RLHF works, hypotheses on why it works, and relationship between RLHF and hallucination. huyenchip.com/2023/05/02/rlh…

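The reward-modeling phase that RLHF builds on can be sketched with a pairwise preference loss. This is a hedged illustration of the standard Bradley-Terry objective commonly used in RLHF, not code from the post or from ChatGPT; `preference_loss` is a hypothetical helper name.

```python
import numpy as np

# Hedged sketch of the reward-model phase of RLHF, not code from the post
# or from ChatGPT. The standard objective is a Bradley-Terry pairwise loss:
# the reward model should score the human-preferred response above the
# rejected one. `preference_loss` is a hypothetical helper name.

def preference_loss(r_chosen, r_rejected):
    """-log sigmoid(r_chosen - r_rejected), averaged over comparison pairs.
    log1p(exp(-m)) == -log(sigmoid(m)), written this way for stability."""
    margin = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    return float(np.mean(np.log1p(np.exp(-margin))))
```

Minimizing this pushes the reward margin between preferred and rejected responses up, which is what gives the later RL phase a usable scalar signal.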
Ishaan Gulrajani (@__ishaan)'s Twitter Profile Photo

New paper with Tatsunori Hashimoto! Likelihood-Based Diffusion Language Models: arxiv.org/abs/2305.18619 Likelihood-based training is a key ingredient of current LLMs. Despite this, diffusion LMs haven't shown any nontrivial likelihoods on standard LM benchmarks. We fix this!🧵

Chip Huyen (@chipro)'s Twitter Profile Photo

Open challenges in LLM research The first two challenges, hallucinations and context learning, are probably the most talked about today. I’m the most excited about 3 (multimodality), 5 (new architecture), and 6 (GPU alternatives). Number 5 and number 6, new architectures and

Lucas Beyer (bl16) (@giffmana)'s Twitter Profile Photo

What makes CLIP work? The contrast with negatives via softmax? The more negatives, the better -> large batch-size? We'll answer "no" to both in our ICCV oral🤓 By introducing SigLIP, a simpler CLIP that also works better and is more scalable, we can study the extremes. Hop in🧶

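The softmax-vs-sigmoid contrast the thread teases can be made concrete. A minimal NumPy sketch, not the paper's implementation: the CLIP-style loss normalizes each image's logits over the whole batch, while the SigLIP-style loss treats every (image, text) pair as an independent binary label. Function names and the temperature/bias values are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of the two objectives, not the paper's code; the
# temperature value and function names are hypothetical assumptions.

def softmax_contrastive_loss(img, txt, temperature=0.07):
    """CLIP-style loss: each image must identify its caption against every
    other caption in the batch (batch-wide softmax normalization)."""
    logits = img @ txt.T / temperature              # (B, B) similarity matrix
    idx = np.arange(len(img))                       # matching pairs on the diagonal
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[idx, idx].mean())

def sigmoid_pairwise_loss(img, txt, temperature=0.07, bias=0.0):
    """SigLIP-style loss: every (image, text) pair is an independent binary
    classification (+1 on the diagonal, -1 off it), with no batch-wide
    normalization."""
    logits = img @ txt.T / temperature + bias
    targets = 2.0 * np.eye(len(img)) - 1.0
    return float(np.log1p(np.exp(-targets * logits)).mean())
```

Because the sigmoid loss never normalizes across the batch, each pair's contribution is independent of batch size, which is what makes it possible to probe the batch-size extremes the tweet mentions.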
Chip Huyen (@chipro)'s Twitter Profile Photo

New blog post: Multimodality and Large Multimodal Models (LMMs) Being able to work with data of different modalities -- e.g. text, images, videos, audio, etc. -- is essential for AI to operate in the real world. This post covers multimodal systems in general, including Large

Oscar Li (@oscarli101)'s Twitter Profile Photo

📝Quiz time: when you have an unrolled computation graph (see figure below), how would you compute the unrolling parameters' gradients? If your answer only contains Backprop, now it’s time to add a new method to your gradient estimation toolbox!

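For readers answering the quiz: one family of alternatives to backprop perturbs the unrolling parameters and correlates the resulting loss changes with the perturbations. The toy below is a generic antithetic evolution-strategies estimator, an assumption-laden illustration of that broad idea and NOT the estimator from the paper; both function names and the toy unrolled computation are invented.

```python
import numpy as np

# Generic antithetic evolution-strategies (ES) gradient estimator on a toy
# unrolled computation. Illustrative only -- not the paper's method.

def unrolled_loss(theta, steps=50):
    """Toy unrolled computation graph: apply the update x <- x - theta*x
    `steps` times, then score the final iterate."""
    x = 1.0
    for _ in range(steps):
        x = x - theta * x
    return x ** 2

def es_gradient(theta, sigma=1e-3, n_samples=256, seed=0):
    """Antithetic ES estimate of d(unrolled_loss)/d(theta): perturb theta in
    both directions, difference the losses, correlate with the perturbations."""
    eps = np.random.default_rng(seed).normal(size=n_samples)
    deltas = np.array([
        unrolled_loss(theta + sigma * e) - unrolled_loss(theta - sigma * e)
        for e in eps
    ])
    return float((deltas * eps).mean() / (2.0 * sigma))
```

Unlike backprop through the unroll, memory cost here is independent of the number of steps, at the price of sampling variance; that trade-off is the usual motivation for smoothed estimators on unrolled graphs.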
Luke Metz (@luke_metz)'s Twitter Profile Photo

New gradient estimation technique led by the fantastic Oscar Li! It provides low-variance gradient estimates for unrolled or iterative computation graphs, such as those found in RL, learned optimizers, and meta-optimization. If you’re at NeurIPS, go check out the poster!

Jascha Sohl-Dickstein (@jaschasd)'s Twitter Profile Photo

Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Bluish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.
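The converge/diverge boundary such a sweep reveals can be reproduced on a toy problem. A hedged one-dimensional illustration, not the experiment behind the figure: for gradient descent on f(x) = x²/2, the update converges exactly when the learning rate is in (0, 2), so even a coarse sweep shows a sharp boundary.

```python
import numpy as np

# Toy illustration, not the tweet's experiment: gradient descent on
# f(x) = x^2 / 2 uses the update x <- x - lr*x = (1 - lr)*x, which
# converges iff |1 - lr| < 1, i.e. lr in (0, 2).

def converges(lr, steps=100):
    """Run `steps` gradient-descent updates from x=1 and report whether
    the iterate shrank below its starting magnitude."""
    x = 1.0
    for _ in range(steps):
        x = (1.0 - lr) * x
    return abs(x) < 1.0

grid = np.linspace(0.05, 4.0, 80)        # a 1-D sweep over learning rates
results = [converges(lr) for lr in grid]  # True/False converge map
```

Plotting `results` against `grid` gives a one-dimensional slice of the kind of converge/diverge map the dense 2-D grid search visualizes.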

Chip Huyen (@chipro)'s Twitter Profile Photo

A challenge of building AI applications is choosing which model to use. What if we don’t have to? What if we can predict the best model for any prompt? Predictive human preference aims to predict which model users might prefer for a specific query. One use case is model

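The per-prompt preference prediction described above can be sketched as a supervised routing problem. Everything below (the keyword featurization, the function names, the model names) is a hypothetical illustration, not from the post; a real system would learn a model over richer prompt features.

```python
from collections import Counter, defaultdict

# Hypothetical sketch of "predictive human preference" as routing. All
# names (train_router, route, model-a, ...) are invented. A keyword tally
# is the smallest thing that shows the shape of the idea.

def train_router(votes):
    """votes: (prompt_keyword, winning_model) pairs from logged A/B
    comparisons. Returns keyword -> most-often-preferred model."""
    tallies = defaultdict(Counter)
    for keyword, winner in votes:
        tallies[keyword][winner] += 1
    return {kw: counts.most_common(1)[0][0] for kw, counts in tallies.items()}

def route(router, prompt, default="strongest-model"):
    """Send the prompt to the predicted-preferred model, else a default."""
    for keyword, model in router.items():
        if keyword in prompt.lower():
            return model
    return default
```

The payoff of such a router is cost: cheap models handle the prompts they are predicted to win on, and the default (presumably strongest, priciest) model handles the rest.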
Luke Metz (@luke_metz)'s Twitter Profile Photo

It has been a pleasure working with you John. I am extremely sad to see you go. Best of luck with your new adventures!

Luke Metz (@luke_metz)'s Twitter Profile Photo

I'm leaving OpenAI after over 2 years of a wild ride. Alongside Barret Zoph, William Fedus, John Schulman, and many others, I got to build a “low key research preview” product that became ChatGPT. While we were all excited to work on it, none of us expected it to be where it is

Chip Huyen (@chipro)'s Twitter Profile Photo

It’s done! 150,000 words, 200+ illustrations, 250 footnotes, and over 1200 reference links. My editor just told me the manuscript has been sent to the printers. - The ebook will be coming out later this week. - Paperback copies should be available in a few weeks (hopefully

Nando de Freitas (@nandodf)'s Twitter Profile Photo

Thinking Machines’ Luke Metz giving a clear, beautifully simple, but extremely clever and informative tutorial on post training at khipu.ai — the data is likely the most important factor!
