Manav Singhal (@manavsinghal157)'s Twitter Profile
Manav Singhal

@manavsinghal157

Maxim AI | Previously @MSFTResearch | Undergrad @surathkal_nitk


Link: https://manavsinghal157.github.io/ · Joined: 11-07-2020 06:08:41

43 Tweets

316 Followers

3.3K Following

Lakshya A Agrawal (@lakshyaaagrawal)'s Twitter Profile Photo

How does prompt optimization compare to RL algos like GRPO?

GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't.

Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
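
(A minimal, hypothetical sketch of the reflective loop the tweet describes: run a few rollouts, reflect on them in natural language, rewrite the prompt, keep improvements. This is not GEPA's actual algorithm; the `llm_reflect`, `llm_rewrite`, and `score` helpers below are illustrative stubs.)

```python
import random

def llm_reflect(prompt, traces):
    # Stub: a real version would ask an LLM what worked and what
    # didn't across the collected rollout traces.
    return f"feedback on: {prompt}"

def llm_rewrite(prompt, feedback):
    # Stub: a real version would ask an LLM to revise the prompt
    # in light of the reflection.
    return prompt + " [revised]"

def score(prompt, tasks):
    # Stub: a real version would run each task with the prompt and
    # return the mean success rate.
    return random.random()

def optimize_prompt(seed_prompt, tasks, rounds=10):
    """Reflective prompt optimization: few rollouts, reflect, rewrite."""
    best, best_score = seed_prompt, score(seed_prompt, tasks)
    for _ in range(rounds):
        traces = [(t, score(best, [t])) for t in tasks]  # a few rollouts
        feedback = llm_reflect(best, traces)             # reflect in language
        candidate = llm_rewrite(best, feedback)          # propose a revision
        s = score(candidate, tasks)
        if s > best_score:                               # keep only improvements
            best, best_score = candidate, s
    return best

print(optimize_prompt("Answer step by step.", ["task-1", "task-2"]))
```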
Anirudh Buvanesh (@anirudhbuvanesh)'s Twitter Profile Photo

Zero rewards after tons of RL training? 😞

Before using dense rewards or incentivizing exploration, try changing the data. Adding easier instances of the task can unlock RL training. 🔓📈

To know more, check out our blog post here: spiffy-airbus-472.notion.site/What-Can-You-D…. Keep reading 🧵 (1/n)
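
(Why the data change helps: with group-relative methods like GRPO, each rollout's advantage is its reward minus the group mean, ignoring the usual std normalization. So a batch where every rollout fails yields zero advantages and no gradient. A tiny illustrative snippet, not from the blog post:)

```python
def group_advantages(rewards):
    """GRPO-style group-relative advantage: reward minus the group mean."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Hard-only batch: every rollout fails, advantages vanish, no learning signal.
print(group_advantages([0, 0, 0, 0]))  # [0.0, 0.0, 0.0, 0.0]

# Mixing in easier instances gives some successes and a usable signal.
print(group_advantages([0, 1, 0, 1]))  # [-0.5, 0.5, -0.5, 0.5]
```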

Nathan Lambert (@natolambert)'s Twitter Profile Photo

The first fantastic paper on scaling RL with LLMs just dropped. I strongly recommend taking a look and will be sharing more thoughts on the blog soon.

The Art of Scaling Reinforcement Learning Compute for LLMs
Khatri & Madaan et al.
Saujas Vaduguru (@saujasv)'s Twitter Profile Photo

When we instruct an agent to design something, its first output may not be precisely what we want. Humans collaborating refine their creations iteratively. Can we instruct an agent to refine its output? Is language the best medium for these instructions? We explore this in mrCAD.
John Yang (@jyangballin)'s Twitter Profile Photo

New eval! Code duels for LMs ⚔️

Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users.

Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.
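
(A rough, purely illustrative mental model of such a tournament loop. This is not CodeClash's actual harness; `lm_revise` and `run_arena` are hypothetical stand-ins.)

```python
def lm_revise(codebase, goal, last_result):
    # Stub: a real harness would have an LM edit its codebase toward
    # the goal, conditioned on the previous round's outcome.
    return codebase + [f"patch targeting {goal!r}"]

def run_arena(codebase_a, codebase_b):
    # Stub: a real harness would execute both codebases head-to-head
    # (e.g., competing for revenue or users) and score each one.
    return len(codebase_a), len(codebase_b)

def tournament(goal, rounds=5):
    """Multi-round duel: each LM revises its code, then they compete."""
    a, b, result = [], [], None
    for _ in range(rounds):
        a = lm_revise(a, goal, result)
        b = lm_revise(b, goal, result)
        result = run_arena(a, b)  # scores on the high-level goal
    return result

print(tournament("maximize revenue"))
```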

Manav Singhal (@manavsinghal157)'s Twitter Profile Photo

LLMs have slowly started becoming a driver for daily tasks and questions. But in this shift, will humans eventually overfit to the model distribution?

My latest blog talks about how we all might converge using LLMs.
Travers (@travers00)'s Twitter Profile Photo

I've tried to start writing again, so I wrote an essay called 'Durable Consumables', reflecting on how we've stopped building things to last. Something essential about craft, attachment, and how we inhabit the world is changing in the process. travers.fyi/durables

Pratyush Maini (@pratyushmaini)'s Twitter Profile Photo

I reverse engineered a phase change in GPT's training data... with the seahorse emoji 🌊🐴

My forensic investigation reveals why non-thinking models have started "thinking out loud" & what it reveals about how frontier labs train their latest models

pratyushmaini.substack.com/p/reverse-engi…🧵