Manav Singhal (@manavsinghal157)'s Twitter Profile
Manav Singhal

@manavsinghal157

Maxim AI | Previously @MSFTResearch | Undergrad @surathkal_nitk


Link: https://manavsinghal157.github.io/ · Joined: 11-07-2020 06:08:41

43 Tweets

316 Followers

3.3K Following

Lakshya A Agrawal (@lakshyaaagrawal)'s Twitter Profile Photo

How does prompt optimization compare to RL algos like GRPO?

GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't.

Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
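
(A minimal, hypothetical sketch of the reflective loop the tweet describes: run a few rollouts, reflect on them in natural language, rewrite the prompt, keep improvements. This is not GEPA's actual algorithm; the `llm_reflect`, `llm_rewrite`, and `score` helpers below are illustrative stubs.)

```python
import random

def llm_reflect(prompt, traces):
    # Stub: a real version would ask an LLM what worked and what
    # didn't across the collected rollout traces.
    return f"feedback on: {prompt}"

def llm_rewrite(prompt, feedback):
    # Stub: a real version would ask an LLM to revise the prompt
    # in light of the reflection.
    return prompt + " [revised]"

def score(prompt, tasks):
    # Stub: a real version would run each task with the prompt and
    # return the mean success rate.
    return random.random()

def optimize_prompt(seed_prompt, tasks, rounds=10):
    """Reflective prompt optimization: few rollouts, reflect, rewrite."""
    best, best_score = seed_prompt, score(seed_prompt, tasks)
    for _ in range(rounds):
        traces = [(t, score(best, [t])) for t in tasks]  # a few rollouts
        feedback = llm_reflect(best, traces)             # reflect in language
        candidate = llm_rewrite(best, feedback)          # propose a revision
        s = score(candidate, tasks)
        if s > best_score:                               # keep only improvements
            best, best_score = candidate, s
    return best

print(optimize_prompt("Answer step by step.", ["task-1", "task-2"]))
```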
Anirudh Buvanesh (@anirudhbuvanesh)'s Twitter Profile Photo

Zero rewards after tons of RL training? 😞

Before using dense rewards or incentivizing exploration, try changing the data. Adding easier instances of the task can unlock RL training. 🔓📈

To know more, check out our blog post here: spiffy-airbus-472.notion.site/What-Can-You-D…. Keep reading 🧵 (1/n)
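
(Why the data change helps: with group-relative methods like GRPO, each rollout's advantage is its reward minus the group mean, ignoring the usual std normalization. So a batch where every rollout fails yields zero advantages and no gradient. A tiny illustrative snippet, not from the blog post:)

```python
def group_advantages(rewards):
    """GRPO-style group-relative advantage: reward minus the group mean."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Hard-only batch: every rollout fails, advantages vanish, no learning signal.
print(group_advantages([0, 0, 0, 0]))  # [0.0, 0.0, 0.0, 0.0]

# Mixing in easier instances gives some successes and a usable signal.
print(group_advantages([0, 1, 0, 1]))  # [-0.5, 0.5, -0.5, 0.5]
```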

Nathan Lambert (@natolambert)'s Twitter Profile Photo

The first fantastic paper on scaling RL with LLMs just dropped. I strongly recommend taking a look and will be sharing more thoughts on the blog soon.

The Art of Scaling Reinforcement Learning Compute for LLMs
Khatri & Madaan et al.
Saujas Vaduguru (@saujasv)'s Twitter Profile Photo

When we instruct an agent to design something, its first output may not be precisely what we want. Humans collaborating refine their creations iteratively. Can we instruct an agent to refine its output? Is language the best medium for these instructions? We explore this in mrCAD.
John Yang (@jyangballin)'s Twitter Profile Photo

New eval! Code duels for LMs ⚔️

Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users.

Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.
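
(A rough, purely illustrative mental model of such a tournament loop. This is not CodeClash's actual harness; `lm_revise` and `run_arena` are hypothetical stand-ins.)

```python
def lm_revise(codebase, goal, last_result):
    # Stub: a real harness would have an LM edit its codebase toward
    # the goal, conditioned on the previous round's outcome.
    return codebase + [f"patch targeting {goal!r}"]

def run_arena(codebase_a, codebase_b):
    # Stub: a real harness would execute both codebases head-to-head
    # (e.g., competing for revenue or users) and score each one.
    return len(codebase_a), len(codebase_b)

def tournament(goal, rounds=5):
    """Multi-round duel: each LM revises its code, then they compete."""
    a, b, result = [], [], None
    for _ in range(rounds):
        a = lm_revise(a, goal, result)
        b = lm_revise(b, goal, result)
        result = run_arena(a, b)  # scores on the high-level goal
    return result

print(tournament("maximize revenue"))
```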

Manav Singhal (@manavsinghal157)'s Twitter Profile Photo

LLMs have slowly started becoming a driver for daily tasks and questions. But in this shift, will humans eventually overfit to the model distribution?

My latest blog talks about how we all might converge using LLMs.
Travers (@travers00)'s Twitter Profile Photo

I've tried to start writing again, so I wrote an essay called 'Durable Consumables', reflecting on how we've stopped building things to last. Something essential about craft, attachment, and how we inhabit the world is changing in the process. travers.fyi/durables

Pratyush Maini (@pratyushmaini)'s Twitter Profile Photo

I reverse engineered a phase change in GPT's training data... with the seahorse emoji 🌊🐴

My forensic investigation reveals why non-thinking models have started "thinking out loud" & what it reveals about how frontier labs train their latest models

pratyushmaini.substack.com/p/reverse-engi…🧵