Scott Niekum (@scottniekum) 's Twitter Profile
Scott Niekum

@scottniekum

Associate professor at UMass Amherst CICS. AI alignment, safety, reinforcement learning, imitation learning, and robotics.

ID: 1091815542334337024

Link: https://people.cs.umass.edu/~sniekum/ · Joined: 02-02-2019 21:48:05

572 Tweets

3.3K Followers

369 Following

Greg Durrett (@gregd_nlp) 's Twitter Profile Photo

This project started with us annoyed at papers evaluating CoT "reasoning" with only GSM8k & MATH. We didn't expect to find such strong evidence that these are the only types of problems where CoT helps! Credit to Juan Diego Rodríguez (he/him) & Kyle Mahowald for driving the rigorous meta-analysis!

Harshit Sikchi (@harshit_sikchi) 's Twitter Profile Photo

Our cross-university collaborative work on "Scaling laws for Reward Model Overoptimization in Direct Alignment Algorithms" has been accepted at NeurIPS!

David Krueger (@davidskrueger) 's Twitter Profile Photo

"Predicting Future Actions of Reinforcement Learning Agents" - Chung et al. We introduce the problem of predicting RL agents' behavior, which could have important safety implications. We find that RL agents that perform explicit (or implicit) planning can be more predictable.

Marlos C. Machado (@marloscmachado) 's Twitter Profile Photo

For those interested, the keynotes of the RL_Conference 2024 are now available online: youtube.com/@RL-conference… Unfortunately, Doina Precup's talk was not recorded, but we have: Andy Barto, Emma Brunskill, Finale Doshi-Velez, Sergey Levine, David Silver, and Peter Stone.

Eugene Vinitsky 🍒🦋 (@eugenevinitsky) 's Twitter Profile Photo

In our new paper, we find that LLMs can efficiently do RLHF in-context! Our method, in-context preference learning (ICPL), iterates between LLMs writing reward functions, training agents, and putting preferences into context. We see a 30x boost in query efficiency over baseline RLHF!

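For readers skimming, here is a minimal sketch of the loop this tweet describes. Every name in it (`llm_propose_rewards`, `train_agent`, `render_rollout`, `human_prefers`) is a hypothetical stand-in for an LLM call, an RL trainer, a renderer, and a human preference query, not the paper's code or API:

```python
# Minimal sketch of an ICPL-style loop as described in the tweet above.
# llm_propose_rewards, train_agent, render_rollout, and human_prefers are
# hypothetical stand-ins -- none of them come from the paper.

def icpl_loop(task_description, n_rounds=5, n_candidates=4):
    context = [f"Task: {task_description}"]
    chosen = None
    for _ in range(n_rounds):
        # 1. The LLM writes candidate reward functions, conditioned on the
        #    preference feedback accumulated in its context.
        candidates = llm_propose_rewards(context, n=n_candidates)

        # 2. Train an agent with each candidate reward function.
        policies = [train_agent(reward_fn) for reward_fn in candidates]

        # 3. A human compares the resulting behaviors and picks a favorite.
        rollouts = [render_rollout(pi) for pi in policies]
        preferred = human_prefers(rollouts)  # index of the preferred rollout
        chosen = candidates[preferred]

        # 4. Put the preference back into the LLM's context for the next round.
        context.append(
            f"Round result: candidate {preferred} was preferred over the others. "
            f"Its reward code:\n{chosen.source}"
        )
    return chosen
```

The claimed query-efficiency gain presumably comes from each human comparison steering a whole new batch of LLM-written rewards, rather than providing a single pairwise label.
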
Zizhao Wang (@duke_zzwang) 's Twitter Profile Photo

In multi-object environments, why do most unsupervised skill discovery methods fail to learn complex skills like tool use? Because they simply maximize state coverage. Introducing our solution, SkiLD: Skill Discovery Guided by Factor Interactions (NeurIPS 2024). wangzizhao.github.io/SkiLD/
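As a rough illustration of the idea (my reading of the abstract, not the authors' code): a skill is tied to a target interaction pattern between state factors, and the intrinsic reward pays off only when a rollout actually induces that interaction, rather than for visiting new states. `detect_interactions` and the graph encoding below are hypothetical stand-ins for SkiLD's learned factored dynamics model:

```python
import numpy as np

def interaction_reward(transition, target_graph, detect_interactions):
    """+1 when the factors that influenced each other in this transition match
    the interaction graph the current skill asks for (hypothetical encoding)."""
    induced_graph = detect_interactions(transition)  # e.g. a binary adjacency matrix
    return float(np.array_equal(induced_graph, target_graph))

def rollout_skill(env, policy, skill_graph, detect_interactions, horizon=200):
    """Roll out a skill-conditioned policy and score it by induced interactions,
    not by state coverage."""
    obs = env.reset()
    total = 0.0
    for _ in range(horizon):
        action = policy(obs, skill_graph)         # policy conditioned on the skill
        next_obs, _, done, _ = env.step(action)   # old-style gym API for brevity
        total += interaction_reward((obs, action, next_obs),
                                    skill_graph, detect_interactions)
        obs = next_obs
        if done:
            break
    return total
```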

Meghan E. Huber (@meghanehuber) 's Twitter Profile Photo

Come join our team at UMass Robotics!! We are hiring at the Associate/Full level for a joint appointment in engineering and computer science. Feel free to reach out if you have any questions. RTs appreciated :) careers.umass.edu/amherst/en-us/…

RLDM (@rldmdublin2025) 's Twitter Profile Photo

Save the date! RLDM 2025, The Multi-disciplinary Conference on Reinforcement Learning and Decision Making, is just around the corner. Visit our website to keep an eye on our submission deadlines 👀 rldm.org

brendan o'connor (@brendan642) 's Twitter Profile Photo

We're hiring new #nlproc faculty this year! Asst or Assoc Professors in NLP at UMass CICS -- careers.umass.edu/amherst/en-us/…

RL_Conference (@rl_conference) 's Twitter Profile Photo

The call for papers for RLC is now up! Abstract deadline of 2/14, submission deadline of 2/21! Please help us spread the word. rl-conference.cc/callforpapers.…

Scott Niekum (@scottniekum) 's Twitter Profile Photo

I'm quite excited about this and still a bit shocked that it works as well as it does. Imitation via distribution matching has always felt like a clunky, brittle way to teach agents. Language + zero-shot RL is natural and scales well, due to the unsupervised nature of RL Zero.

Greg Durrett (@gregd_nlp) 's Twitter Profile Photo

Huge congrats to Prasann Singhal for being one of the 8 CRA Outstanding Undergraduate Researcher Award winners! It has been an absolute privilege to work with Prasann during his time at UT. (And he's applying for PhD programs this year...hint hint...) Prasann's work... 🧵

Gokul Swamy (@g_k_swamy) 's Twitter Profile Photo

1.5 yrs ago, we set out to answer a seemingly simple question: what are we *actually* getting out of RL in fine-tuning? I'm thrilled to share a pearl we found on the deepest dive of my PhD: the value of RL in RLHF seems to come from *generation-verification gaps*. Get ready to🤿!

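A toy way to picture a generation-verification gap (my own illustration, not the paper's method): if a verifier such as a reward model can reliably rank candidate responses, then best-of-N sampling already captures much of what RL fine-tuning later amortizes into the policy. `generate` and `verifier_score` are hypothetical stand-ins:

```python
def best_of_n(prompt, generate, verifier_score, n=16):
    """Sample n candidate responses and return the one the verifier scores highest.
    If verifying is easier than generating, this simple search closes much of the
    quality gap usually credited to RLHF."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=verifier_score)
```

On this reading, RLHF trains the policy so that a single sample lands close to what this verification-driven search would have picked.
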
Scott Niekum (@scottniekum) 's Twitter Profile Photo

I'm extremely proud of the work that Harshit has done and looking forward to seeing what he does next. Congratulations, Harshit!

RL_Conference (@rl_conference) 's Twitter Profile Photo

Reminder that early registration for RLC closes on the 30th! Please register early to save yourself some money and help us get the word out.

Harshit Sikchi (@harshit_sikchi) 's Twitter Profile Photo

Behavioral Foundation Models (BFMs) trained with RL are secretly more powerful than we think. BFMs directly output a policy believed to be near-optimal given any reward function. Our new work shows that they can actually do much better:
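For context, the zero-shot interface a BFM exposes looks roughly like the successor-feature sketch below (illustrative names, not the paper's API): the pretrained model supplies features ψ(s, a, z), a new reward is projected onto that feature space as a weight vector w, and the greedy policy with respect to Q(s, a) = ψ(s, a, w) · w is returned with no further training:

```python
import numpy as np

def zero_shot_policy(bfm, reward_weights):
    """Return a policy that is (approximately) optimal for the given reward,
    using only the pretrained behavioral foundation model -- no extra RL.
    bfm.successor_features is a hypothetical accessor for psi(s, a, z)."""
    def policy(state, actions):
        q_values = [
            bfm.successor_features(state, a, z=reward_weights) @ reward_weights
            for a in actions
        ]
        return actions[int(np.argmax(q_values))]
    return policy
```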