Souradip Chakraborty (@souradipchakr18) 's Twitter Profile
Souradip Chakraborty

@souradipchakr18

Student Researcher @Google || PhD @umdcs @ml_umd, working on #LLM #Alignment #RLHF #Reasoning
Prev : #JPMC #Walmart Labs, MS #IndianStatisticalInstitute

ID: 1135038999297314817

Website: https://souradip-umd.github.io/ · Joined: 02-06-2019 04:22:40

1.1K Tweets

1.1K Followers

4.4K Following

Souradip Chakraborty (@souradipchakr18) 's Twitter Profile Photo

Gowthami If the rewards are independent and not conflicting, I think adding them should be OK (with standardization). Otherwise, see MaxMin RLHF: arxiv.org/abs/2402.08925
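The reply above suggests summing standardized rewards when they are independent and non-conflicting. A minimal sketch of what that aggregation could look like, assuming z-score standardization per reward model (the scores and reward names are hypothetical; MaxMin RLHF takes a different route and optimizes the worst-case reward instead):

```python
import statistics

def standardize(scores):
    """Z-score a list of reward values so reward models with
    different scales contribute comparably to the sum."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores) or 1.0  # guard against zero variance
    return [(s - mean) / std for s in scores]

def aggregate_rewards(reward_table):
    """reward_table maps reward_name -> list of scores, one per response.
    Standardize each reward independently, then sum across rewards
    to get one combined score per response."""
    standardized = [standardize(scores) for scores in reward_table.values()]
    return [sum(per_response) for per_response in zip(*standardized)]

# hypothetical scores from two reward models over three responses;
# note the raw scales differ by two orders of magnitude
rewards = {
    "helpfulness": [0.2, 0.8, 0.5],
    "safety": [10.0, 30.0, 20.0],
}
combined = aggregate_rewards(rewards)  # response 1 scores highest
```

With standardization, neither reward dominates just because of its raw scale; without it, "safety" here would swamp "helpfulness" entirely.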

Zaixi Zhang (@zaixizhang) 's Twitter Profile Photo

Due to numerous requests from prospective contributors, we have decided to extend the submission deadline for our workshop Call for Papers by one week. We look forward to receiving your excellent papers!

Zaixi Zhang (@zaixizhang) 's Twitter Profile Photo

At the request of many prospective submitters, we have decided to extend the workshop call for papers deadline by one week. The workshop Best Paper Award will carry a prize of $3,000, and the Runner-up Award will carry a prize of $1,000.

Amrit Singh Bedi (@amritsinghbedi3) 's Twitter Profile Photo

🚀 Your diffusion LLM is secretly a team of semi-autoregressive experts hiding inside. 💡 We uncover this hidden structure and show how to unlock it with HEX (Hidden Semi-Autoregressive Experts), which outperforms GRPO!! 👉 A new dimension of test-time scaling

Aldo Pacchiano (@aldopacchiano) 's Twitter Profile Photo

(1/4) Typical LLM post-training mechanisms have a hard time learning models that can produce diverse responses. To fix this we introduce DQO (Diversity Quality Optimization), a method for post-training LLMs to generate diverse high-quality

Furong Huang (@furongh) 's Twitter Profile Photo

In 2010, I came to the U.S. straight from undergrad for a PhD. Fifteen years later the map looks messy, but the line of best fit is clear. 2010, PhD Year 1: my advisor said, "Take the ML course." I had never heard of ML. With the most supportive, inspiring advisor, I pivoted

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

The paper says diffusion LLMs hide several small experts, and using them together at test time boosts reasoning. A simple voting trick raises math accuracy to 88.10% without extra training. Diffusion LLMs generate by masking and filling chunks of text in steps. Because

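The "simple voting trick" described above can be sketched generically: sample several completions (e.g., under different semi-autoregressive decoding schedules), extract each one's final answer, and keep the majority answer. This is a hedged illustration of test-time majority voting in general, not the paper's actual HEX implementation, and the sampled answers below are hypothetical:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common final answer across sampled completions.
    Counter.most_common breaks ties by insertion order, so the
    earliest-seen answer wins a tie."""
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# hypothetical final answers extracted from completions generated
# under different block-size / decoding schedules
sampled_answers = ["72", "72", "68", "72", "70"]
best = majority_vote(sampled_answers)  # -> "72"
```

The appeal of this style of aggregation is that it needs no extra training: all the cost is at inference time, scaling with the number of sampled completions.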
Amrit Singh Bedi (@amritsinghbedi3) 's Twitter Profile Photo

On diffusion #LLMs, our work provides: interesting insights into how they work; test-time scaling that can outperform the fine-tuned GRPO version; and interesting questions about optimal inference in dLLMs (a lot to gain and explore) x.com/amritsinghbedi…

Csaba Szepesvari (@csabaszepesvari) 's Twitter Profile Photo

Andrej Karpathy I think it would be good to distinguish RL as a problem from the algorithms that people use to address RL problems. This would allow us to discuss if the problem is with the algorithms, or if the problem is with posing a problem as an RL problem. 1/x

Csaba Szepesvari (@csabaszepesvari) 's Twitter Profile Photo

Andrej Karpathy It seems to me that not only you, but too many people talk about RL as if these two things were the same, which prevents a more nuanced discussion. 2/2