Haotian Fu (@haotiannnnnnnnn)'s Twitter Profile
Haotian Fu

@haotiannnnnnnnn

PhD student @BrownUniversity. Reinforcement Learning/Embodied Agents. Intern at The Boston Dynamics AI Institute. Prev Intern @MSFTResearch.

ID: 1110780990278885376

Link: https://haotianfu.me/ · Joined: 27-03-2019 05:50:00

66 Tweets

266 Followers

399 Following

Haotian Fu (@haotiannnnnnnnn)'s Twitter Profile Photo

We will present this work at ICML in person next Thursday. Feel free to drop by if you have any questions or just want to say hello! #ICML2024

Xidong Feng (@xidong_feng)'s Twitter Profile Photo

I had an amazing internship experience working on the Discovery team! Don't hesitate to reach out/apply if you are passionate about RL, Planning, and LLMs!

Paul Zhou (@zhiyuan_zhou_)'s Twitter Profile Photo

Can robots self-improve by collecting data autonomously 🤖? Introducing SOAR: a system for large-scale autonomous data collection 🚀 and autonomous improvement 📈 of a multi-task language-conditioned policy in diverse scenes, without human intervention. auto-improvement.github.io
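
As a rough intuition for the kind of system described above, here is a hedged Python sketch of an autonomous collect-then-improve loop. Every class, method, and environment call below is a hypothetical stand-in, not the actual SOAR implementation (see auto-improvement.github.io for that).

```python
# Hypothetical sketch of an autonomous collect-and-improve loop in the spirit
# of SOAR as summarized in the tweet; none of these names come from the SOAR code.
import random

class LanguageConditionedPolicy:
    """Placeholder multi-task policy: maps (observation, instruction) -> action."""
    def act(self, observation, instruction):
        return random.choice(["reach", "grasp", "place"])  # stand-in action

    def update(self, dataset):
        pass  # in practice: behavior cloning / RL on the autonomously collected data

def autonomous_improvement(policy, env, instructions, rounds=3, episodes_per_round=10):
    """Alternate between autonomous data collection and policy improvement."""
    dataset = []
    for _ in range(rounds):
        # 1) Autonomous data collection: no human resets or labels.
        for _ in range(episodes_per_round):
            instruction = random.choice(instructions)
            obs, done = env.reset(), False
            while not done:
                action = policy.act(obs, instruction)
                next_obs, reward, done, _ = env.step(action)  # gym-style env, assumed
                dataset.append((obs, instruction, action, reward))
                obs = next_obs
        # 2) Autonomous improvement: retrain on everything collected so far.
        policy.update(dataset)
    return policy
```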

Yilun Du (@du_yilun)'s Twitter Profile Photo

I'm recruiting PhD students this year with interest in machine learning, embodied AI, or AI for science! If you are interested in constructing fundamental tools to improve Generative AI and exploring how these tools can be used for intelligent embodied agents and science,

Xidong Feng (@xidong_feng)'s Twitter Profile Photo

Thrilled to share that I’ll be joining Google DeepMind as a Research Scientist with the Discovery Team! It’s a dream come true—8 years ago, I watched AlphaGo live in high school, and now I am lucky enough to be part of this incredible journey. Can’t wait to discover what’s next!

Haotian Fu (@haotiannnnnnnnn)'s Twitter Profile Photo

We will present EPO: Hierarchical LLM Agents with Environment Preference Optimization at #EMNLP2024 next week, which achieves SOTA performance on ALFRED by finetuning a hierarchical Llama2-based agent and modeling the environment's preference signals. Check out the thread below!
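
The tweet does not spell out EPO's training objective, so the snippet below is only a generic, DPO-style preference loss over preferred vs. dispreferred trajectories, shown as a hedged illustration of what "preference optimization" usually means; it is not claimed to be the actual EPO objective, and the tensors are dummy per-trajectory log-probabilities.

```python
# Hedged sketch: a generic DPO-style preference loss. The actual EPO objective
# and its environment preference signals may differ from this illustration.
import torch
import torch.nn.functional as F

def preference_loss(logp_chosen, logp_rejected,
                    ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Push the policy toward trajectories the preference signal favors,
    measured relative to a frozen reference model."""
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Dummy log-probabilities, one value per trajectory in the batch:
loss = preference_loss(torch.tensor([-5.0]), torch.tensor([-7.0]),
                       torch.tensor([-5.5]), torch.tensor([-6.5]))
print(float(loss))
```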

TuringPost (@theturingpost)'s Twitter Profile Photo

Natural Language Reinforcement Learning (NLRL) redefines Reinforcement Learning (RL).

The main idea:
In NLRL, the core parts of RL like goals, strategies, and evaluation methods are reimagined using natural language instead of rigid math.

What are the benefits?

- NLRL uses not
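
As a rough illustration of "evaluation in natural language," here is a minimal sketch of a language-based state evaluation; `query_llm` is a hypothetical placeholder for any LLM API call and is not part of the NLRL codebase.

```python
# Illustration only: evaluate a state with words rather than a scalar value.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM API call here")  # hypothetical stub

def language_value(state_description: str, goal: str) -> str:
    """Return a natural-language assessment of a state instead of a number."""
    prompt = (
        f"Goal: {goal}\n"
        f"Current state: {state_description}\n"
        "Assess how promising this state is for reaching the goal, and explain "
        "the main risks and opportunities in a few sentences."
    )
    return query_llm(prompt)
```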

Calvin Luo (@calvinyluo)'s Twitter Profile Photo

Frozen diffusion models pretrained on web images and videos can teach embodied agents to accomplish novel behaviors and goals in diverse physical environments! We present TADPoLe🐸, a way to “distill” large-scale pretraining into text-conditioned policy learning! #NeurIPS2024
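
One hedged sketch of how a frozen, text-conditioned diffusion model could be turned into a reward signal is below; the `add_noise` / `predict_noise` interface is hypothetical, and the actual TADPoLe formulation may differ in its details.

```python
# Hedged sketch: score a rendered observation by how well a frozen
# text-conditioned diffusion model denoises it under a text prompt.
# `diffusion_model.add_noise` / `predict_noise` are hypothetical APIs.
import torch

def diffusion_text_reward(frame, prompt, diffusion_model, timestep=200):
    noise = torch.randn_like(frame)
    noisy_frame = diffusion_model.add_noise(frame, noise, timestep)
    predicted = diffusion_model.predict_noise(noisy_frame, timestep, prompt)
    # Lower denoising error under the text condition -> higher reward.
    return -torch.mean((predicted - noise) ** 2).item()
```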

Zilai Zeng (@zilaizeng)'s Twitter Profile Photo

Want to know how web-scale pretrained diffusion models can provide text-conditioned reward signals for learning novel behaviors across different robotic configurations and environments? Check our poster at #NeurIPS2024 on Thursday, December 12th at 4:30 pm (Poster #6706)!

Xidong Feng (@xidong_feng)'s Twitter Profile Photo

NLRL is now open-sourced at github.com/waterhorse1/Na…! Check it out, and feel free to try NLRL on other LLM agent, reasoning, and coding tasks!

TuringPost (@theturingpost)'s Twitter Profile Photo

NLRL, or Natural Language Reinforcement Learning, is about adapting RL methods to work in the natural language field.

Traditional RL aims to learn a policy (strategy) guiding the agent to the best action in each state.

Instead of this, NLRL integrates a Chain-of-Thought
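
In that spirit, a minimal chain-of-thought "language policy" might look like the sketch below; again, `query_llm` is a hypothetical stand-in for an LLM call, not the NLRL implementation.

```python
# Illustration only: reason in text (chain of thought), then pick an action.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM API call here")  # hypothetical stub

def cot_language_policy(state_description: str, legal_actions: list[str]) -> str:
    prompt = (
        f"State: {state_description}\n"
        f"Legal actions: {', '.join(legal_actions)}\n"
        "Think step by step about the consequences of each action, "
        "then answer with the single best action on the last line."
    )
    response = query_llm(prompt)
    return response.strip().splitlines()[-1]  # final line holds the chosen action
```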

DeepSeek (@deepseek_ai)'s Twitter Profile Photo

🚀 DeepSeek-R1 is here!

⚡ Performance on par with OpenAI-o1
📖 Fully open-source model & technical report
🏆 MIT licensed: Distill & commercialize freely!

🌐 Website & API are live now! Try DeepThink at chat.deepseek.com today!

🐋 1/n

RAI Institute (@rai_inst)'s Twitter Profile Photo

In this demo, the Ultra Mobile Vehicle (UMV) drives, turns, jumps, does tricks, and comes to a sudden stop called a track-stand. All of the driving, landings, balancing, and track-stands are done using reinforcement learning.

Paul Zhou (@zhiyuan_zhou_)'s Twitter Profile Photo

Can we make robot policy evaluation easier and less time-consuming? Introducing AutoEval, a system that *autonomously* evaluates generalist policies 24/7 and closely matches human results. We make 4 tasks 💫publicly available💫. Submit your policy at auto-eval.github.io! 🧵👇

Zilai Zeng (@zilaizeng)'s Twitter Profile Photo

Internet-scale videos capture rich knowledge about motions and behaviors, but how can we effectively adapt them for domain-specific robotic tasks and further unlock behavior generalization? Come to our poster (#418) at #ICLR2025 and chat with Calvin Luo on April 25th at 3:30pm!

Omar Khattab (@lateinteraction)'s Twitter Profile Photo

Sigh, it's a bit of a mess. Let me just give you guys the full nuance in one stream of consciousness since I think we'll continue to get partial interpretations that confuse everyone. All the little things I post need to always be put together in one place. First, I have long