Sunghwan Kim (@seonghwan_57)'s Twitter Profile
Sunghwan Kim

@seonghwan_57

M.S. student, Yonsei University

ID: 1749046906557444096

Website: https://kimsh0507.github.io/ · Joined: 21-01-2024 12:31:25

26 Tweets

21 Followers

410 Following

Wenlong Huang (@wenlong_huang)'s Twitter Profile Photo

What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
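
The pipeline described above (keypoints labeled by a vision model, constraints written as costs over those keypoints, then solved by optimization) can be illustrated with a toy example. Everything below is an illustrative assumption, not ReKep code: one "constraint" is a cost on 3D keypoint positions, minimized by finite-difference gradient descent.

```python
# Toy sketch: a "keypoint constraint" is a cost function over 3D keypoint
# positions, and the action is whatever minimizes total cost. Here a
# hypothetical gripper keypoint must reach a cup-handle keypoint; we
# optimize with finite-difference gradient descent. Names and numbers
# are illustrative, not from the paper.

def constraint_cost(gripper, handle):
    """Cost = squared distance between gripper and handle keypoints."""
    return sum((g - h) ** 2 for g, h in zip(gripper, handle))

def solve(gripper, handle, lr=0.1, eps=1e-4, steps=200):
    """Minimize the keypoint cost by finite-difference gradient descent."""
    g = list(gripper)
    for _ in range(steps):
        grad = []
        for i in range(3):
            g_hi = g[:]; g_hi[i] += eps
            g_lo = g[:]; g_lo[i] -= eps
            grad.append((constraint_cost(g_hi, handle) -
                         constraint_cost(g_lo, handle)) / (2 * eps))
        g = [gi - lr * d for gi, d in zip(g, grad)]
    return g

handle = [0.5, 0.2, 0.3]
final = solve([0.0, 0.0, 0.0], handle)
print(constraint_cost(final, handle))  # near zero: constraint satisfied
```

In the real system the constraint functions are written by a VLM rather than by hand, and the optimizer acts over robot end-effector poses instead of a free point.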

Allen Z. Ren (@allenzren)'s Twitter Profile Photo

👇Introducing DPPO, Diffusion Policy Policy Optimization. DPPO optimizes a pre-trained Diffusion Policy using policy gradients from RL, showing 𝘀𝘂𝗿𝗽𝗿𝗶𝘀𝗶𝗻𝗴 𝗶𝗺𝗽𝗿𝗼𝘃𝗲𝗺𝗲𝗻𝘁𝘀 over a variety of baselines across benchmarks and in sim2real transfer: diffusion-ppo.github.io
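
The core trick behind DPPO is that each denoising step of a diffusion policy is itself a Gaussian action, so the whole chain can be trained with an ordinary policy gradient. The toy below is an illustrative sketch under stated assumptions, not the DPPO code: 1-D actions, a 4-step Gaussian denoising chain, and plain REINFORCE in place of PPO.

```python
import random

# Toy sketch of the denoising-as-MDP idea (assumptions, not DPPO itself):
# the "policy" is a chain of K Gaussian denoising steps with learnable
# per-step shifts mu[k]; each step is an action, so REINFORCE applies.
# Reward = -(final_action - TARGET)^2.

random.seed(0)
K, SIGMA, LR, TARGET = 4, 0.2, 0.01, 2.0
mu = [0.0] * K                    # learnable per-step denoising shifts

def rollout():
    x, grads = 0.0, []
    for k in range(K):
        mean = x + mu[k]          # denoising step k shifts toward data
        x = random.gauss(mean, SIGMA)
        grads.append((x - mean) / SIGMA ** 2)  # d log N(x; mean) / d mu[k]
    return x, grads

rewards, baseline = [], 0.0
for _ in range(3000):
    x, grads = rollout()
    r = -(x - TARGET) ** 2
    rewards.append(r)
    adv = r - baseline
    baseline += 0.05 * (r - baseline)          # running-mean baseline
    for k in range(K):
        mu[k] += LR * adv * grads[k]           # REINFORCE ascent

early = sum(rewards[:300]) / 300
late = sum(rewards[-300:]) / 300
print(late > early)
```

The per-step shifts learn to sum to roughly TARGET; the paper replaces REINFORCE with PPO and starts from a behavior-cloned diffusion policy rather than from scratch.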

Tianyuan Dai (@rogerdai1217)'s Twitter Profile Photo

Why hand-engineer digital twins when digital cousins are free? Check out ACDC: Automated Creation of Digital Cousins 👭 for Robust Policy Learning, accepted at @corl2024! 🎉 📸 Single image -> 🏡 Interactive scene ⏩ Fully automatic (no annotations needed!) 🦾 Robot policies

Will Liang (@willjhliang)'s Twitter Profile Photo

Introducing Eurekaverse 🌎, a path toward training robots in infinite simulated worlds! Eurekaverse is a framework for automatic environment and curriculum design using LLMs. This iterative method creates useful environments designed to progressively challenge the policy during

Jason Weston (@jaseweston)'s Twitter Profile Photo

🚨 Adaptive Decoding via Latent Preference Optimization 🚨
- New layer added to Transformer, selects decoding params automatically *per token*
- Learnt via new method, Latent Preference Optimization
- Outperforms any fixed temperature decoding method, choosing creativity or

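
The per-token idea above can be sketched with a toy decoder. This is a hand-written stand-in, not the paper's method: the real selector is a learned layer trained with Latent Preference Optimization, whereas the stub below picks a temperature from the entropy of the logits (peaked logits suggest a factual continuation and get a low temperature; flat logits get a higher one).

```python
import math, random

# Toy stand-in for a learned per-token decoding selector (illustrative
# only): before sampling each token, choose a temperature for *that*
# token from a small candidate set based on how peaked the logits are.

TEMPS = [0.2, 0.6, 1.0]

def softmax(logits, temp):
    m = max(logits)
    exps = [math.exp((l - m) / temp) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def pick_temperature(logits):
    """Hand-written stand-in for the learned per-token selector."""
    h = entropy(softmax(logits, 1.0))
    max_h = math.log(len(logits))
    return TEMPS[min(int(h / max_h * len(TEMPS)), len(TEMPS) - 1)]

def sample_token(logits):
    t = pick_temperature(logits)
    probs = softmax(logits, t)
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i, t
    return len(probs) - 1, t

peaked = [8.0, 0.1, 0.0, -1.0]   # one clear answer -> low temperature
flat   = [1.0, 0.9, 1.1, 1.0]    # many plausible tokens -> higher temperature
print(sample_token(peaked)[1], sample_token(flat)[1])  # -> 0.2 1.0
```

The sampled token index is stochastic, but the chosen temperature is deterministic given the logits, which is the property the learned layer selects per token.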
Xidong Feng (@xidong_feng)'s Twitter Profile Photo

Happy to share our new exploration "Natural Language Reinforcement Learning" (NLRL), the last dance of my PhD 🛎️(1/n):

Paper: arxiv.org/abs/2411.14251
Code: github.com/waterhorse1/Na… (released soon)

NLRL reframes core RL concepts—policy, value function, Bellman equation, MC, TD,

Google DeepMind (@googledeepmind)'s Twitter Profile Photo

Introducing Genie 2: our AI model that can create an endless variety of playable 3D worlds - all from a single image. 🖼️ These types of large-scale foundation world models could enable future agents to be trained and evaluated in an endless number of virtual environments. →

Zhou Xian (@zhou_xian_)'s Twitter Profile Photo

Everything you love about generative models — now powered by real physics! Announcing the Genesis project — after a 24-month large-scale research collaboration involving over 20 research labs — a generative physics engine able to generate 4D dynamical worlds powered by a physics

Zhengyao Jiang (@zhengyaojiang)'s Twitter Profile Photo

As an RL researcher myself, I once doubted Reinforcement Learning (RL) because massive self-supervised LLMs were dominating.
But now I see how RL can bring us closer to super-intelligent (ASI) systems—far beyond board games.
Here's what changed my mind: (1/5)

Sergey Levine (@svlevine)'s Twitter Profile Photo

Scaling laws in deep RL? Turns out that batch size, learning rate, and UTD (update-to-data ratio) have predictable relationships that yield the most efficient and scalable deep RL. Check out the analysis in new work by Oleg Rybkin & collaborators: arxiv.org/abs/2502.04327
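
For readers unfamiliar with UTD: it is the number of gradient updates performed per environment step collected. The toy below is an illustrative assumption, not the paper's setup; it just shows the knob in a minimal replay loop, estimating a bandit's mean reward by SGD on replayed samples.

```python
import random

# Toy illustration of the UTD (update-to-data) ratio: for every
# environment step added to the replay buffer, do `utd` gradient
# updates on replayed data. Higher UTD extracts more from the same
# data per step. Everything here is illustrative, not from the paper.

def train(utd, env_steps=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    replay, estimate = [], 0.0
    for _ in range(env_steps):
        reward = rng.gauss(1.0, 0.5)   # one "environment step"
        replay.append(reward)
        for _ in range(utd):           # UTD gradient updates per step
            sample = rng.choice(replay)
            estimate -= lr * (estimate - sample)  # SGD on squared error
    return estimate

for utd in (1, 4):
    print(utd, round(train(utd), 3))   # both estimates approach 1.0
```

The cited analysis studies how this knob interacts with batch size and learning rate at scale; the sketch only shows what the ratio means.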

MatthewBerman (@matthewberman)'s Twitter Profile Photo

Major AI breakthrough: Diffusion Large Language Models are here! They're 10x faster and 10x cheaper than traditional LLMs. Here's everything you need to know:

Marianne Arriola @ ICLR’25 (@mariannearr)'s Twitter Profile Photo

🚨Announcing our #ICLR2025 Oral! 🔥Diffusion LMs are on the rise for parallel text generation! But unlike autoregressive LMs, they struggle with quality, fixed-length constraints & lack of KV caching. 🚀Introducing Block Diffusion—combining autoregressive and diffusion models

Zihan Wang - on RAGEN (@wzihanw)'s Twitter Profile Photo

In the last two months, RAGEN has powered Agent RL training frameworks for over 300,000 people.
Now, we’re introducing VAGEN—the first open-source framework that trains *Visual* Agents using multi-turn Reinforcement Learning! 🚀(1/n)

Sunghwan Kim (@seonghwan_57)'s Twitter Profile Photo

Would you like to enhance your web agent? Check out our work! Web-Shepherd is a (process) reward model designed for interactive web environments beyond single-turn tasks. Huge thanks to Hyungjoo Chae for the amazing collaboration!

Wooseok Seo (@just1nseo)'s Twitter Profile Photo

🚀New Paper!
arxiv.org/abs/2506.13342

While fact verification is essential to ensure the reliability of LLMs, detailed analysis of fact verifiers remains understudied.

We present several findings based on our revised dataset, along with practical guidance to improve the models.

Thinking Machines (@thinkymachines)'s Twitter Profile Photo

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”

We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to