Raj Ghugare (@ghugareraj) Twitter Tweets • TwiCopy

Raj Ghugare

@ghugareraj

PhD student in Princeton CS.

ID: 1302134735141961729

Link: https://rajghugare19.github.io/

Joined: 05-09-2020 06:41:42

59 Tweets

252 Followers

238 Following

Siddarth Venkatraman

@siddarthv66

2 months ago

NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA) — the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! 📈📈 🧵below!

488 likes · 12 replies · 55 retweets

Seohong Park

@seohong_park

a month ago

Introducing *dual representations*! tl;dr: We represent a state by the "set of similarities" to all other states. This dual perspective has lots of nice properties and practical benefits in RL. Blog post: seohong.me/blog/dual-repr… Paper: arxiv.org/abs/2510.06714 ↓

786 likes · 14 replies · 96 retweets

Michał Bortkiewicz @ICLR

@m_bortkiewicz

24 days ago

📜Is Temporal Difference (TD) learning the gold standard for stitching in RL? 🪡 Conventional wisdom suggests that TD methods are crucial for piecing together short-term behaviors to solve long-horizon tasks. But does it hold when using function approximation?

4 likes · 1 reply · 1 retweet