Sunghwan Kim (@seonghwan_57)'s Twitter Profile
Sunghwan Kim

@seonghwan_57

M.S. student, Yonsei University

ID: 1749046906557444096

Website: https://kimsh0507.github.io/ · Joined: 21-01-2024 12:31:25

26 Tweets

21 Followers

410 Following

Wenlong Huang (@wenlong_huang)'s Twitter Profile Photo

What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
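
The pipeline described above (keypoints labeled by a vision model, constraints written as costs over those keypoints, then solved by optimization) can be illustrated with a toy example. Everything below is an illustrative assumption, not ReKep code: one "constraint" is a cost on 3D keypoint positions, minimized by finite-difference gradient descent.

```python
# Toy sketch: a "keypoint constraint" is a cost function over 3D keypoint
# positions, and the action is whatever minimizes total cost. Here a
# hypothetical gripper keypoint must reach a cup-handle keypoint; we
# optimize with finite-difference gradient descent. Names and numbers
# are illustrative, not from the paper.

def constraint_cost(gripper, handle):
    """Cost = squared distance between gripper and handle keypoints."""
    return sum((g - h) ** 2 for g, h in zip(gripper, handle))

def solve(gripper, handle, lr=0.1, eps=1e-4, steps=200):
    """Minimize the keypoint cost by finite-difference gradient descent."""
    g = list(gripper)
    for _ in range(steps):
        grad = []
        for i in range(3):
            g_hi = g[:]; g_hi[i] += eps
            g_lo = g[:]; g_lo[i] -= eps
            grad.append((constraint_cost(g_hi, handle) -
                         constraint_cost(g_lo, handle)) / (2 * eps))
        g = [gi - lr * d for gi, d in zip(g, grad)]
    return g

handle = [0.5, 0.2, 0.3]
final = solve([0.0, 0.0, 0.0], handle)
print(constraint_cost(final, handle))  # near zero: constraint satisfied
```

In the real system the constraint functions are written by a VLM rather than by hand, and the optimizer acts over robot end-effector poses instead of a free point.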

Allen Z. Ren (@allenzren)'s Twitter Profile Photo

👇Introducing DPPO, Diffusion Policy Policy Optimization. DPPO optimizes a pre-trained Diffusion Policy using policy gradients from RL, showing 𝘀𝘂𝗿𝗽𝗿𝗶𝘀𝗶𝗻𝗴 𝗶𝗺𝗽𝗿𝗼𝘃𝗲𝗺𝗲𝗻𝘁𝘀 over a variety of baselines across benchmarks and in sim2real transfer: diffusion-ppo.github.io
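
The core trick behind DPPO is that each denoising step of a diffusion policy is itself a Gaussian action, so the whole chain can be trained with an ordinary policy gradient. The toy below is an illustrative sketch under stated assumptions, not the DPPO code: 1-D actions, a 4-step Gaussian denoising chain, and plain REINFORCE in place of PPO.

```python
import random

# Toy sketch of the denoising-as-MDP idea (assumptions, not DPPO itself):
# the "policy" is a chain of K Gaussian denoising steps with learnable
# per-step shifts mu[k]; each step is an action, so REINFORCE applies.
# Reward = -(final_action - TARGET)^2.

random.seed(0)
K, SIGMA, LR, TARGET = 4, 0.2, 0.01, 2.0
mu = [0.0] * K                    # learnable per-step denoising shifts

def rollout():
    x, grads = 0.0, []
    for k in range(K):
        mean = x + mu[k]          # denoising step k shifts toward data
        x = random.gauss(mean, SIGMA)
        grads.append((x - mean) / SIGMA ** 2)  # d log N(x; mean) / d mu[k]
    return x, grads

rewards, baseline = [], 0.0
for _ in range(3000):
    x, grads = rollout()
    r = -(x - TARGET) ** 2
    rewards.append(r)
    adv = r - baseline
    baseline += 0.05 * (r - baseline)          # running-mean baseline
    for k in range(K):
        mu[k] += LR * adv * grads[k]           # REINFORCE ascent

early = sum(rewards[:300]) / 300
late = sum(rewards[-300:]) / 300
print(late > early)
```

The per-step shifts learn to sum to roughly TARGET; the paper replaces REINFORCE with PPO and starts from a behavior-cloned diffusion policy rather than from scratch.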

Tianyuan Dai (@rogerdai1217)'s Twitter Profile Photo

Why hand-engineer digital twins when digital cousins are free? Check out ACDC: Automated Creation of Digital Cousins 👭 for Robust Policy Learning, accepted at @corl2024! 🎉 📸 Single image -> 🏡 Interactive scene ⏩ Fully automatic (no annotations needed!) 🦾 Robot policies

Will Liang (@willjhliang)'s Twitter Profile Photo

Introducing Eurekaverse 🌎, a path toward training robots in infinite simulated worlds! Eurekaverse is a framework for automatic environment and curriculum design using LLMs. This iterative method creates useful environments designed to progressively challenge the policy during

Jason Weston (@jaseweston)'s Twitter Profile Photo

🚨 Adaptive Decoding via Latent Preference Optimization 🚨
- New layer added to Transformer, selects decoding params automatically *per token*
- Learnt via new method, Latent Preference Optimization
- Outperforms any fixed temperature decoding method, choosing creativity or

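
The per-token idea above can be sketched with a toy decoder. This is a hand-written stand-in, not the paper's method: the real selector is a learned layer trained with Latent Preference Optimization, whereas the stub below picks a temperature from the entropy of the logits (peaked logits suggest a factual continuation and get a low temperature; flat logits get a higher one).

```python
import math, random

# Toy stand-in for a learned per-token decoding selector (illustrative
# only): before sampling each token, choose a temperature for *that*
# token from a small candidate set based on how peaked the logits are.

TEMPS = [0.2, 0.6, 1.0]

def softmax(logits, temp):
    m = max(logits)
    exps = [math.exp((l - m) / temp) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def pick_temperature(logits):
    """Hand-written stand-in for the learned per-token selector."""
    h = entropy(softmax(logits, 1.0))
    max_h = math.log(len(logits))
    return TEMPS[min(int(h / max_h * len(TEMPS)), len(TEMPS) - 1)]

def sample_token(logits):
    t = pick_temperature(logits)
    probs = softmax(logits, t)
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i, t
    return len(probs) - 1, t

peaked = [8.0, 0.1, 0.0, -1.0]   # one clear answer -> low temperature
flat   = [1.0, 0.9, 1.1, 1.0]    # many plausible tokens -> higher temperature
print(sample_token(peaked)[1], sample_token(flat)[1])  # -> 0.2 1.0
```

The sampled token index is stochastic, but the chosen temperature is deterministic given the logits, which is the property the learned layer selects per token.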
Xidong Feng (@xidong_feng)'s Twitter Profile Photo

Happy to share our new exploration "Natural Language Reinforcement Learning" (NLRL), the last dance of my PhD 🛎️(1/n):

Paper: arxiv.org/abs/2411.14251
Code: github.com/waterhorse1/Na… (released soon)

NLRL reframes core RL concepts—policy, value function, Bellman equation, MC, TD,

Google DeepMind (@googledeepmind)'s Twitter Profile Photo

Introducing Genie 2: our AI model that can create an endless variety of playable 3D worlds - all from a single image. 🖼️ These types of large-scale foundation world models could enable future agents to be trained and evaluated in an endless number of virtual environments. →

Zhou Xian (@zhou_xian_)'s Twitter Profile Photo

Everything you love about generative models — now powered by real physics! Announcing the Genesis project — after a 24-month large-scale research collaboration involving over 20 research labs — a generative physics engine able to generate 4D dynamical worlds powered by a physics

Zhengyao Jiang (@zhengyaojiang)'s Twitter Profile Photo

As an RL researcher myself, I once doubted Reinforcement Learning (RL) because massive self-supervised LLMs were dominating.
But now I see how RL can bring us closer to super-intelligent (ASI) systems—far beyond board games.
Here's what changed my mind: (1/5)

Sergey Levine (@svlevine)'s Twitter Profile Photo

Scaling laws in deep RL? Turns out that batch size, learning rate, and UTD (update-to-data ratio) have predictable relationships that yield the most efficient and scalable deep RL. Check out the analysis in new work by Oleg Rybkin & collaborators: arxiv.org/abs/2502.04327
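
For readers unfamiliar with UTD: it is the number of gradient updates performed per environment step collected. The toy below is an illustrative assumption, not the paper's setup; it just shows the knob in a minimal replay loop, estimating a bandit's mean reward by SGD on replayed samples.

```python
import random

# Toy illustration of the UTD (update-to-data) ratio: for every
# environment step added to the replay buffer, do `utd` gradient
# updates on replayed data. Higher UTD extracts more from the same
# data per step. Everything here is illustrative, not from the paper.

def train(utd, env_steps=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    replay, estimate = [], 0.0
    for _ in range(env_steps):
        reward = rng.gauss(1.0, 0.5)   # one "environment step"
        replay.append(reward)
        for _ in range(utd):           # UTD gradient updates per step
            sample = rng.choice(replay)
            estimate -= lr * (estimate - sample)  # SGD on squared error
    return estimate

for utd in (1, 4):
    print(utd, round(train(utd), 3))   # both estimates approach 1.0
```

The cited analysis studies how this knob interacts with batch size and learning rate at scale; the sketch only shows what the ratio means.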

MatthewBerman (@matthewberman)'s Twitter Profile Photo

Major AI breakthrough: Diffusion Large Language Models are here! They're 10x faster and 10x cheaper than traditional LLMs. Here's everything you need to know:

Marianne Arriola @ ICLR’25 (@mariannearr)'s Twitter Profile Photo

🚨Announcing our #ICLR2025 Oral! 🔥Diffusion LMs are on the rise for parallel text generation! But unlike autoregressive LMs, they struggle with quality, fixed-length constraints & lack of KV caching. 🚀Introducing Block Diffusion—combining autoregressive and diffusion models

Zihan Wang - on RAGEN (@wzihanw)'s Twitter Profile Photo

In the last two months, RAGEN has powered Agent RL training frameworks for over 300,000 people.
Now, we’re introducing VAGEN—the first open-source framework that trains *Visual* Agents using multi-turn Reinforcement Learning! 🚀(1/n)

Sunghwan Kim (@seonghwan_57)'s Twitter Profile Photo

Would you like to enhance your web agent? Check out our work! Web-Shepherd is a (process) reward model designed for interactive web environments beyond single-turn tasks. Huge thanks to Hyungjoo Chae for the amazing collaboration!

Wooseok Seo (@just1nseo)'s Twitter Profile Photo

🚀New Paper!
arxiv.org/abs/2506.13342

While fact verification is essential to ensure the reliability of LLMs, detailed analysis of fact verifiers remains understudied.

We present several findings based on our revised dataset, along with practical guidance to improve the models.

Thinking Machines (@thinkymachines)'s Twitter Profile Photo

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”

We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to