Souradip Chakraborty (@souradipchakr18) 's Twitter Profile
Souradip Chakraborty

@souradipchakr18

Student Researcher @Google || PhD @umdcs @ml_umd, working on #LLM #Alignment #RLHF #Reasoning
Prev : #JPMC #Walmart Labs, MS #IndianStatisticalInstitute

ID: 1135038999297314817

Website: https://souradip-umd.github.io/ · Joined: 02-06-2019 04:22:40

1.1K Tweets

1.1K Followers

4.4K Following

Souradip Chakraborty (@souradipchakr18) 's Twitter Profile Photo

Gowthami If the rewards are independent and not conflicting, I think adding them should be OK (with standardization). Otherwise, see MaxMin RLHF: arxiv.org/abs/2402.08925
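The reply above suggests summing standardized rewards when they are independent and non-conflicting. A minimal sketch of what that aggregation could look like, assuming z-score standardization per reward model (the scores and reward names are hypothetical; MaxMin RLHF takes a different route and optimizes the worst-case reward instead):

```python
import statistics

def standardize(scores):
    """Z-score a list of reward values so reward models with
    different scales contribute comparably to the sum."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores) or 1.0  # guard against zero variance
    return [(s - mean) / std for s in scores]

def aggregate_rewards(reward_table):
    """reward_table maps reward_name -> list of scores, one per response.
    Standardize each reward independently, then sum across rewards
    to get one combined score per response."""
    standardized = [standardize(scores) for scores in reward_table.values()]
    return [sum(per_response) for per_response in zip(*standardized)]

# hypothetical scores from two reward models over three responses;
# note the raw scales differ by two orders of magnitude
rewards = {
    "helpfulness": [0.2, 0.8, 0.5],
    "safety": [10.0, 30.0, 20.0],
}
combined = aggregate_rewards(rewards)  # response 1 scores highest
```

With standardization, neither reward dominates just because of its raw scale; without it, "safety" here would swamp "helpfulness" entirely.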

Zaixi Zhang (@zaixizhang) 's Twitter Profile Photo

Due to numerous requests from prospective contributors, we have decided to extend the submission deadline for our workshop Call for Papers by one week. We look forward to receiving your excellent papers!

Zaixi Zhang (@zaixizhang) 's Twitter Profile Photo

At the request of many prospective submitters, we have decided to extend the workshop call for papers deadline by one week. The workshop Best Paper Award will carry a prize of $3,000, and the Runner-up Award will carry a prize of $1,000.

Amrit Singh Bedi (@amritsinghbedi3) 's Twitter Profile Photo

🚀 Your diffusion LLM is secretly a team of semi-autoregressive experts hiding inside. 💡 We uncover this hidden structure and show how to unlock it with HEX (Hidden Semi-Autoregressive Experts), which outperforms GRPO!! 👉 A new dimension of test-time scaling

Aldo Pacchiano (@aldopacchiano) 's Twitter Profile Photo

(1/4) Typical LLM post-training mechanisms have a hard time learning models that can produce diverse responses. To fix this we introduce DQO (Diversity Quality Optimization), a method for post-training LLMs to generate diverse high-quality

Furong Huang (@furongh) 's Twitter Profile Photo

In 2010, I came to the U.S. straight from undergrad for a PhD. Fifteen years later the map looks messy, but the line of best fit is clear. 2010, PhD Year 1: my advisor said, "Take the ML course." I had never heard of ML. With the most supportive, inspiring advisor, I pivoted

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

The paper says diffusion LLMs hide several small experts, and using them together at test time boosts reasoning. A simple voting trick raises math accuracy to 88.10% without extra training. Diffusion LLMs generate by masking and filling chunks of text in steps. Because

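The "simple voting trick" described above can be sketched generically: sample several completions (e.g., under different semi-autoregressive decoding schedules), extract each one's final answer, and keep the majority answer. This is a hedged illustration of test-time majority voting in general, not the paper's actual HEX implementation, and the sampled answers below are hypothetical:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common final answer across sampled completions.
    Counter.most_common breaks ties by insertion order, so the
    earliest-seen answer wins a tie."""
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# hypothetical final answers extracted from completions generated
# under different block-size / decoding schedules
sampled_answers = ["72", "72", "68", "72", "70"]
best = majority_vote(sampled_answers)  # -> "72"
```

The appeal of this style of aggregation is that it needs no extra training: all the cost is at inference time, scaling with the number of sampled completions.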
Amrit Singh Bedi (@amritsinghbedi3) 's Twitter Profile Photo

On diffusion #LLMs, our work provides: interesting insights into how they work; test-time scaling that can outperform the fine-tuned GRPO version; and interesting questions about optimal inference in dLLMs (a lot to gain and explore) x.com/amritsinghbedi…

Csaba Szepesvari (@csabaszepesvari) 's Twitter Profile Photo

Andrej Karpathy I think it would be good to distinguish RL as a problem from the algorithms that people use to address RL problems. This would allow us to discuss if the problem is with the algorithms, or if the problem is with posing a problem as an RL problem. 1/x

Csaba Szepesvari (@csabaszepesvari) 's Twitter Profile Photo

Andrej Karpathy It seems to me that not only you, but too many people talk about RL as if these two things were the same, which prevents a more nuanced discussion. 2/2