Raj Ghugare (@ghugareraj)'s Twitter Profile
Raj Ghugare

@ghugareraj

PhD student in Princeton CS.

ID: 1302134735141961729

Link: https://rajghugare19.github.io/

Joined: 05-09-2020 06:41:42

59 Tweets

252 Followers

238 Following

Siddarth Venkatraman (@siddarthv66)

NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA) — the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! 📈📈 🧵below!

Seohong Park (@seohong_park)

Introducing *dual representations*! tl;dr: We represent a state by the "set of similarities" to all other states. This dual perspective has lots of nice properties and practical benefits in RL. Blog post: seohong.me/blog/dual-repr… Paper: arxiv.org/abs/2510.06714 ↓

Michał Bortkiewicz @ICLR (@m_bortkiewicz)

📜Is Temporal Difference (TD) learning the gold standard for stitching in RL? 🪡 Conventional wisdom suggests that TD methods are crucial for piecing together short-term behaviors to solve long-horizon tasks. But does it hold when using function approximation?