Richard Pang (@yzpang_) 's Twitter Profile
Richard Pang

@yzpang_

yzpang.me; @AIatMeta Llama research, prev: NYU, Meta FAIR, @uchicago, @googleai; research: llm, text gen, alignment, reasoning, human-lm collab, etc.

ID: 919679161903534081

Link: http://yzpang.me · Joined: 15-10-2017 21:39:33

96 Tweets

477 Followers

390 Following

Sumit (@_reachsumit) 's Twitter Profile Photo

Transformers Struggle to Learn to Search

Demonstrates that transformers can be taught to perform graph search tasks but increasingly struggle on larger graphs, and that this difficulty is not resolved by increased model scale. 📝 arxiv.org/abs/2412.04703

elvis (@omarsar0) 's Twitter Profile Photo

Transformers Struggle to Learn to Search

Finds that transformer-based LLMs struggle to perform search robustly.

Suggests that given the right training distribution, the transformer can learn to search.

Also reports that performing search in-context exploration (i.e.,

Mehran Kazemi (@kazemi_sm) 's Twitter Profile Photo

Search is a core operation for reasoning in large language models. Check out our new work, where we dive deep into the ability of transformer models to learn to search.

Nitish Joshi (@nitishjoshi23) 's Twitter Profile Photo

New work where we show that, with the right training distribution, transformers can learn to search and internally implement an exponential path-merging algorithm. But they struggle to learn to search as the graph size increases, and simple solutions like scaling don't resolve it.
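
The path-merging idea mentioned here can be illustrated with a toy reachability computation: each vertex keeps a set of vertices it can reach, and each merge round unions in the reachable sets of its successors, doubling the covered path length per round (hence "exponential"). This is a minimal sketch of the general idea; the function name, the graph, and the exact formulation are our own illustration, not the paper's implementation:

```python
# Toy sketch of "path merging" for reachability in a directed graph:
# each vertex stores the set of vertices it can currently reach, and
# one merge round unions in the reachable sets of its successors.
# After k rounds, paths of length up to 2**k are covered.

def reachable_sets(edges, num_vertices, rounds):
    reach = {v: set() for v in range(num_vertices)}
    for u, v in edges:
        reach[u].add(v)  # length-1 paths
    for _ in range(rounds):
        new_reach = {v: set(s) for v, s in reach.items()}
        for v in range(num_vertices):
            for w in reach[v]:
                new_reach[v] |= reach[w]  # merge successor's reachable set
        reach = new_reach
    return reach

# Chain 0 -> 1 -> 2 -> 3 -> 4: after 2 merge rounds (covering paths of
# length up to 4), vertex 0 already reaches every other vertex.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(sorted(reachable_sets(edges, 5, 2)[0]))
```

Each round squares the reach of the previous one, which is why only logarithmically many rounds (layers) are needed relative to the longest path.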

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Today is the start of a new era of natively multimodal AI innovation.

Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality.

Llama 4 Scout
• 17B-active-parameter model

Ahmad Al-Dahle (@ahmad_al_dahle) 's Twitter Profile Photo

We're glad to start getting Llama 4 in all your hands. We're already hearing lots of great results people are getting with these models. That said, we're also hearing some reports of mixed quality across different services. Since we dropped the models as soon as they were

Archiki Prasad (@archikiprasad) 's Twitter Profile Photo

🎉 Excited to share that my internship work, ScPO, on self-training LLMs to improve reasoning without human labels, has been accepted to #ICML2025! Many thanks to my awesome collaborators at AI at Meta and @uncnlp🌞Looking forward to presenting ScPO in Vancouver 🇨🇦

Jason Weston (@jaseweston) 's Twitter Profile Photo

We worked on a whole line of research on this:
- Self-Rewarding LMs (use self as a Judge in semi-online DPO): arxiv.org/abs/2401.10020
- Thinking LLMs (learn CoTs with a Judge with semi-online DPO): arxiv.org/abs/2410.10630 *poster at ICML this week!!*
- Mix verifiable &

Archiki Prasad (@archikiprasad) 's Twitter Profile Photo

I’ll be at #ICML2025 this week to present ScPO:
📌 Wednesday, July 16th, 11:00 AM-1:30 PM
📍 East Exhibition Hall A-B, E-2404
Stop by or reach out to chat about improving reasoning in LLMs, self-training, or just tips about being on the job market next cycle! 😃

Richard Pang (@yzpang_) 's Twitter Profile Photo

🚨Prompt Curriculum Learning (PCL)
- Efficient LLM RL training algorithm!
- We investigate factors that affect convergence: batch size, number of prompts, number of generations, prompt selection
- We propose PCL: a lightweight algorithm that *dynamically selects intermediate-difficulty prompts* using a learned value model

Joongwon Kim (@danieljwkim) 's Twitter Profile Photo

Excited to share Prompt Curriculum Learning (PCL) from AI at Meta - we improve performance-efficiency tradeoffs for reasoning RL by predicting prompt difficulty with a value model updated on-policy, and selecting intermediate-difficulty prompts that yield high effective ratios.
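
The selection rule described in these posts can be sketched in a few lines: a value model predicts the policy's success probability on each candidate prompt, and training keeps the prompts whose predicted pass rate is closest to 0.5, i.e., intermediate difficulty. The function name and the scoring rule below are illustrative assumptions about the general idea, not the paper's actual implementation:

```python
# Illustrative sketch of intermediate-difficulty prompt selection:
# rank prompts by how close their predicted success probability is
# to 0.5, and keep the closest ones for the next RL training batch.

def select_intermediate(prompts, predicted_success, batch_size):
    ranked = sorted(zip(prompts, predicted_success),
                    key=lambda pair: abs(pair[1] - 0.5))
    return [prompt for prompt, _ in ranked[:batch_size]]

prompts = ["easy", "medium", "hard", "impossible"]
predicted = [0.95, 0.55, 0.40, 0.02]  # hypothetical value-model outputs
print(select_intermediate(prompts, predicted, 2))
```

The intuition is that prompts the policy always solves or never solves produce near-zero-variance reward signals, so intermediate-difficulty prompts yield the most useful gradient per generation.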

Nicholas Lourie (@nicklourie) 's Twitter Profile Photo

LLMs are expensive—experiments cost a lot, mistakes even more. How do you make experiments cheap and reliable? By using hyperparameters' empirical structure. Kyunghyun Cho, He He, and I show you how in Hyperparameter Loss Surfaces Are Simple Near their Optima at #COLM2025! 🧵1/9
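
The "simple near their optima" claim suggests that, close to the best setting, loss as a function of a hyperparameter is well approximated by a simple low-order surface such as a quadratic. A toy illustration of that kind of fit, using synthetic data rather than anything from the paper:

```python
import numpy as np

# Synthetic loss measurements near an optimal learning rate (log scale);
# the true curve is a parabola with its minimum at log_lr = -3.0.
log_lr = np.linspace(-4.0, -2.0, 9)
loss = 0.8 * (log_lr + 3.0) ** 2 + 1.5

coeffs = np.polyfit(log_lr, loss, 2)    # fit a quadratic a*x^2 + b*x + c
est_opt = -coeffs[1] / (2 * coeffs[0])  # vertex of the fitted parabola
print(round(est_opt, 3))
```

With a fitted local surface like this, a handful of cheap measurements can locate the optimum instead of a dense (and expensive) grid search.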