Chirag Nagpal (@nagpalchirag)'s Twitter Profile
Chirag Nagpal

@nagpalchirag

~~ in the @Meta-verse ~~

ID: 107757999

Link: http://cs.cmu.edu/~chiragn | Joined: 23-01-2010 16:43:38

1.1K Tweets

1.1K Followers

738 Following

Han Fang (@han_fang_)

🧠 Tokens for Thoughts is live! Karthik A Sankararaman 🇮🇳🇺🇸 & I are distilling our reading notes into bite-sized LLM notebooks—come read papers with us. Ep. 1: Post-Training 101 (SFT + RL primer) 👉 tokens-for-thoughts.notion.site/post-training-… DMs or replies open for topic requests!

Chirag Nagpal (@nagpalchirag)

Serious question: How do you estimate the average length of responses generated by an LLM when a large number of responses may get truncated before seeing the <eos> token?
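
One possible way to frame the question above, offered only as a sketch and not as the author's answer: treat a truncated response as a right-censored observation and use a Kaplan-Meier survival curve to estimate the mean length. The function name `km_restricted_mean_length` and its arguments are hypothetical, and the estimate is restricted to the largest observed length, so it is still a lower bound whenever many responses hit the cap.

```python
from itertools import groupby


def km_restricted_mean_length(lengths, finished):
    """Kaplan-Meier estimate of mean response length under truncation.

    lengths[i]  : tokens generated for response i
    finished[i] : True if <eos> was emitted, False if the response was cut off
    The mean is restricted to the largest observed length (assumption:
    truncation is unrelated to how long the response would have been).
    """
    data = sorted(zip(lengths, finished))
    at_risk = len(data)   # responses still "running" just before length t
    survival = 1.0        # estimated P(true length > t)
    prev_t = 0
    mean = 0.0
    for t, group in groupby(data, key=lambda pair: pair[0]):
        group = list(group)
        # Area under the survival curve on the interval (prev_t, t].
        mean += survival * (t - prev_t)
        ended = sum(1 for _, done in group if done)
        survival *= 1.0 - ended / at_risk
        at_risk -= len(group)
        prev_t = t
    return mean


if __name__ == "__main__":
    # Three responses: lengths 2 and 6 finished, length 3 was truncated.
    print(km_restricted_mean_length([2, 3, 6], [True, False, True]))
```

In this toy input, averaging only the finished responses gives 4.0, while the censoring-aware estimate is about 4.67, because the truncated response is known to be at least 3 tokens long.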

Chirag Nagpal (@nagpalchirag)

Gemini Post-Training: Please add a regex reward that discounts responses that include the phrase "It's important to note that..."
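
A minimal sketch of what such a regex reward could look like, assuming a scalar base reward per response. The names `HEDGE_PATTERN` and `regex_discounted_reward` and the `penalty` value are hypothetical and not taken from any real training stack.

```python
import re

# Matches the phrase with a straight or curly apostrophe, in any letter case.
HEDGE_PATTERN = re.compile(r"it['’]s important to note that", re.IGNORECASE)


def regex_discounted_reward(response, base_reward, penalty=0.5):
    """Subtract a fixed penalty (an assumed value) when the phrase appears."""
    if HEDGE_PATTERN.search(response):
        return base_reward - penalty
    return base_reward


if __name__ == "__main__":
    print(regex_discounted_reward("It's important to note that cats sleep a lot.", 1.0))
    print(regex_discounted_reward("Cats sleep a lot.", 1.0))
```

Subtracting the penalty instead of multiplying keeps the discount in the same direction when the base reward is negative.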

Jianfeng Chi (@jianfengchi)

[1/N] Check out our new LLM reasoning work! The "aha moment" in math can be elicited through RLVR. Can we do the same for (safety) alignment in RLHF without much modification to the training algorithm? The answer is yes.

Chirag Nagpal (@nagpalchirag)

No way, are you really telling me y'all aligning them AIs and don't even know the freakin' length of your chain of thought 💀

Chirag Nagpal (@nagpalchirag)

I worked very closely with Daniel Machlab on improving our system-level guardrails such as Llama Guard. Most recently, Daniel and I were collaborating on a next-generation reasoning-based safety guardrail. It's unfortunate to see him go, reaching out to my professional…

Chirag Nagpal (@nagpalchirag)

There's a glut of papers on RL for post-training. Some are good, but most don't really solve real post-training challenges at frontier scale. Also, can we please stop obsessing about KL reg already?