Rui Yang (@ruiyang70669025) Twitter Tweets • TwiCopy

Rui Yang

@ruiyang70669025

10 months ago

DPO is demonstrated to be a promising reward model in the new benchmark, while fine-tuning sequence classifiers can be specialized with smaller model size for similar performance.

Likes: 3 · Replies: 0 · Retweets: 0

Hanze Dong

Wanna train a SOTA reward model? 🌟New Blog Alert: "Reward Modeling for RLHF" (with Wei Xiong & Rui Yang) is live this weekend! 🌐✨ We delve into the insights behind achieving groundbreaking performance on the RewardBench (by Nathan Lambert). efficient-unicorn-451.notion.site/Reward-Modelin…

Likes: 56 · Replies: 3 · Retweets: 15

Rui

@rui4research

10 months ago

Excited to share LISA, which enables
- 7B tuning on a 24GB GPU
- 70B tuning on 4x80GB GPUs

and obtains better performance than LoRA in ~50% less time 🚀

Likes: 549 · Replies: 8 · Retweets: 114

Rafael Rafailov

@rm_rafailov

9 months ago

We have a new preprint out - your language model is not a reward, it’s a Q function!
1. The likelihood of the preferred answer must go down - it’s a policy divergence
2. MCTS guided decoding on language is equivalent to likelihood search on DPO
3. DPO learns credit assignment

Likes: 958 · Replies: 15 · Retweets: 155

fly51fly

@fly51fly

9 months ago

[LG] DPO Meets PPO: Reinforced Token Optimization for RLHF arxiv.org/abs/2404.18922 - This paper models RLHF as an MDP, offering a token-wise characterization of LLM's generation process. It theoretically demonstrates advantages of token-wise MDP over sentence-wise bandit

Likes: 107 · Replies: 0 · Retweets: 37

Haoran Xu

@ryanxhr

9 months ago

I will attend #ICLR2024 next week, hoping to meet old and new friends in Vienna!🇦🇹 I will present "ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update" ✨spotlight✨ A simple modification (<20 line code) to DICE that makes it work!

Likes: 40 · Replies: 4 · Retweets: 2

Rui Yang

@ruiyang70669025

9 months ago

I will present our #ICLR2024 spotlight paper Robust IQL next week in Vienna! Looking forward to discussing RL and RL for LLMs!

Likes: 19 · Replies: 0 · Retweets: 3

Stefano Albrecht (UoE Agents Group)

@uoe_agents

8 months ago

[1/6] After fantastic visits recently to London and Edmonton, Canada, I am excited to announce my next stop is China! 🌏 From June 10-21 I will present to universities, businesses, and UK embassy in Beijing, Shanghai, Shenzhen, and Hong Kong. See thread for schedule + details⬇️

Likes: 18 · Replies: 6 · Retweets: 7

Shizhe Diao

@shizhediao

7 months ago

🥰Happy to share LMFlow got accepted to #NAACL2024 demo track! arxiv.org/abs/2306.12420 now hosts camera-ready hot takes on:
- One-stop lightweight toolkit for LLM fine-tuning
- Support SOTA techniques like LISA
- Streamlining scientific LLM development like AstroLLaMA-Chat, MarineGPT

Likes: 15 · Replies: 1 · Retweets: 5

Seohong Park

@seohong_park

7 months ago

This excellent lecture from Nan Jiang's RL theory class is really informative! mediaspace.illinois.edu/media/t/1_pb42… It covers Bellman completeness, the "double-sampling" issue with the Bellman operator, and "virtual" stochasticity caused by a limited function class.

Likes: 136 · Replies: 0 · Retweets: 22

Zhihui Xie

@_zhihuixie

7 months ago

Why are aligned LLMs so vulnerable to adversarial attacks? Our work attributes this vulnerability to reward misspecification during the alignment process. By exploiting this loophole, we find fundamentally misaligned prompts, leading to more effective automated red teaming. 🧵

Likes: 32 · Replies: 1 · Retweets: 11

renjie pi

@renjiepi

7 months ago

🔥Introducing Image Textualization (IT), an automatic framework for generating detailed and accurate image descriptions. We release 220K high-quality image descriptions using IT.
⭐️Paper: arxiv.org/pdf/2406.07502
⭐️Code: github.com/sterzhang/imag…
⭐️Data: huggingface.co/datasets/Sterz…

Likes: 110 · Replies: 3 · Retweets: 35
