Violet X. (@ziyux)'s Twitter Profile
Violet X.

@ziyux

PhD student @Stanford. Working on LLM-based agents

ID: 394052579

https://violetxi.github.io
Joined: 19-10-2011 14:09:15

77 Tweets

161 Followers

323 Following

Kanishk Gandhi (@gandhikanishk):

Language models struggle to search, not due to an architecture problem, but a data one! They rarely see how to search or backtrack. We show how LLMs can be taught to search by representing the process of search in language as a flattened string, a stream of search (SoS)!
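
A minimal sketch of the "search in language as a flattened string" idea: run a toy depth-first search and serialize every expansion and backtrack into one string that a language model could be trained on. The trace format and helper names here are illustrative assumptions, not the paper's actual SoS format.

    # Toy DFS whose full trace, including dead ends and backtracking,
    # is flattened into a single training string.
    def dfs_trace(graph, node, goal, trace):
        trace.append(f"visit {node}")
        if node == goal:
            trace.append(f"goal {node}")
            return True
        for child in graph.get(node, []):
            trace.append(f"try {node}->{child}")
            if dfs_trace(graph, child, goal, trace):
                return True
            trace.append(f"backtrack to {node}")  # the failed branch stays in the trace
        return False

    graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
    trace = []
    dfs_trace(graph, "A", "E", trace)
    stream_of_search = " | ".join(trace)  # flattened string, dead ends included
    print(stream_of_search)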

Rafael Rafailov @ NeurIPS (@rm_rafailov):

We have a new preprint out - your language model is not a reward, it’s a Q function!
1. The likelihood of the preferred answer must go down - it’s a policy divergence
2. MCTS guided decoding on language is equivalent to likelihood search on DPO
3. DPO learns credit assignment
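
For reference, the standard DPO objective this thread builds on, together with the implicit reward it induces. These are the original DPO formulas, not the preprint's new results; the Q-function reading reinterprets the log-ratio terms at the token level.

    \mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
      = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
        \left[ \log \sigma\!\left(
            \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
          - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
        \right) \right],
    \qquad
    \hat{r}(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

Point 1 refers to the observation that this loss constrains only the margin between the two log-ratios, so the absolute likelihood of the preferred answer is free to decrease during training.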

Philipp (@jphilipp95):

Constitutional AI showed LMs can learn to follow constitutions by labeling their own outputs. But why can't we just tell a base model the principles of desired behavior and rely on it to act appropriately? Introducing SAMI: Self-Supervised Alignment with Mutual Information!
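
As a rough sketch of what "alignment with mutual information" can look like: one standard way to lower-bound the mutual information between a written principle c and a response y (given a prompt x) is an InfoNCE-style contrastive bound over a batch of N principle-response pairs, where each y_i was sampled under its own principle c_i. This is a generic estimator written from the tweet's description, not necessarily SAMI's exact objective:

    I(C; Y \mid X) \;\geq\; \mathbb{E}\left[ \log
        \frac{\pi_\theta(y_i \mid x, c_i)}
             {\tfrac{1}{N} \sum_{j=1}^{N} \pi_\theta(y_i \mid x, c_j)} \right]

Maximizing a bound of this shape pushes the model to make each response more likely under the principle it was written for than under the other principles in the batch.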

Violet X. (@ziyux):

Excited about our new paper - Hypothetical Minds! The hypothesis-search-based approach shows a lot of promise in adapting to diverse agents in multi-agent settings. Check out the full paper for more!
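
A hedged sketch of what a hypothesis-search loop for opponent modeling can look like, based only on the description above: an LLM proposes natural-language hypotheses about another agent's strategy, each hypothesis is scored by how well it predicts that agent's observed moves, and the best-scoring one conditions the next action. The `llm` callable and the prompts are placeholders, not the paper's implementation.

    def propose_hypotheses(llm, history, n=5):
        prompt = f"Observed opponent moves: {history}. List {n} hypotheses about their strategy, one per line."
        return llm(prompt).splitlines()[:n]

    def score_hypothesis(llm, hypothesis, history):
        # Fraction of past moves the hypothesis would have predicted correctly.
        correct = 0
        for t in range(1, len(history)):
            pred = llm(f"Hypothesis: {hypothesis}. Moves so far: {history[:t]}. Predict the next move.")
            correct += (pred.strip() == history[t])
        return correct / max(len(history) - 1, 1)

    def act(llm, history):
        hypotheses = propose_hypotheses(llm, history)
        best = max(hypotheses, key=lambda h: score_hypothesis(llm, h, history))
        return llm(f"Opponent strategy: {best}. Moves so far: {history}. Choose our best response.")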

Fan-Yun Sun (@sunfanyun):

Training RL/robot policies requires extensive experience in the target environment, which is often difficult to obtain. How can we “distill” embodied policies from foundational models? Introducing FactorSim! #NeurIPS2024 We show that by generating prompt-aligned simulations and

Rafael Rafailov @ NeurIPS (@rm_rafailov):

We have a new position paper on "inference time compute" and what we have been working on over the last few months! We present some theory on why it is necessary, how it works, why we need it, and what it means for "super" intelligence.

SynthLabs (@synth_labs):

Ever watched someone solve a hard math problem? Their first attempt is rarely perfect. They sketch ideas, cross things out, and try new angles. This process of exploration is key to human reasoning and our latest research formalizes this as Meta Chain-of-Thought (1/8) 🧵👇

Jiayi Pan (@jiayi_pirate):

We reproduced DeepSeek R1-Zero in the CountDown game, and it just works

Through RL, the 3B base LM develops self-verification and search abilities all on its own

You can experience the Aha moment yourself for < $30
Code: github.com/Jiayi-Pan/Tiny…

Here's what we learned 🧵
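
For context, the Countdown reward is verifiable: a completion earns reward only if it contains an arithmetic expression that uses exactly the given numbers and evaluates to the target. A minimal sketch of such a checker is below; the answer-tag format and the 0/1 reward are assumptions, and TinyZero's actual implementation may differ.

    import re
    from collections import Counter

    def countdown_reward(completion, numbers, target):
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        if match is None:
            return 0.0
        expr = match.group(1).strip()
        if not re.fullmatch(r"[\d\s\+\-\*/\(\)\.]+", expr):  # arithmetic tokens only
            return 0.0
        used = [int(n) for n in re.findall(r"\d+", expr)]
        if Counter(used) != Counter(numbers):                # each given number used exactly once
            return 0.0
        try:
            value = eval(expr)                               # safe here: expr is digits/operators only
        except (ZeroDivisionError, SyntaxError):
            return 0.0
        return 1.0 if abs(value - target) < 1e-6 else 0.0

    print(countdown_reward("... <answer>(6*5)-(4+1)</answer>", [6, 5, 4, 1], 25))  # -> 1.0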

Andrew Ng (@andrewyng):

Introducing Agentic Object Detection! Given a text prompt like “unripe strawberries” or “Kellogg’s branded cereal” and an image, we use an agentic workflow to reason at length and detect the specified objects. No need to label any training data. Watch the video for details.
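
The tweet describes a training-free, agentic workflow rather than a specific API. One plausible shape for such a loop, written purely as an assumption: a vision-language model proposes candidate regions for the text prompt, then each crop is re-examined and kept only if the model confirms the match. `vlm_propose_boxes` and `vlm_verify_crop` are hypothetical helpers, and the image is assumed to be a PIL-style object.

    def agentic_detect(image, text_prompt, vlm_propose_boxes, vlm_verify_crop):
        """Propose-then-verify loop; both VLM calls are placeholder callables."""
        detections = []
        # 1. Ask the VLM for candidate bounding boxes matching the text prompt.
        for box in vlm_propose_boxes(image, text_prompt):
            # 2. Re-examine each crop and keep it only if the VLM confirms the match.
            crop = image.crop(box)
            if vlm_verify_crop(crop, text_prompt):
                detections.append(box)
        return detections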

SynthLabs (@synth_labs):

Releasing Big-MATH—the first heavily curated & verifiable dataset designed specifically for large-scale RL training & LLM reasoning!

📝 250,000+ problems, 47k NEW Q's
✅ 10x larger than existing datasets like MATH
🧑‍⚖️ Verifiable—we eliminated 400k+ problems

Details below! 🧵👇
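
"Verifiable" here means each problem keeps a single, automatically checkable final answer. A rough sketch of that kind of filter, with purely illustrative criteria rather than the released curation pipeline:

    import re

    def is_verifiable(problem, answer):
        if re.search(r"\bprove\b|\bshow that\b", problem, re.IGNORECASE):
            return False                 # proof-style problems can't be string-checked
        if ";" in answer or " and " in answer:
            return False                 # multi-part answers are ambiguous to grade
        # Accept a single number, fraction, or simple closed-form expression.
        return bool(re.fullmatch(r"[-\d\./\^\*\+\(\)a-zA-Z\\{}\s]+", answer.strip()))

    problems = [
        ("Compute 3^4 - 5.", "76"),
        ("Prove that sqrt(2) is irrational.", "N/A"),
    ]
    kept = [(p, a) for p, a in problems if is_verifiable(p, a)]
    print(kept)   # only the first problem survives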

Anikait Singh (@anikait_singh_):

Personalization in LLMs is crucial for meeting diverse user needs, yet collecting real-world preferences at scale remains a significant challenge. Introducing FSPO, a simple framework leveraging synthetic preference data to adapt to new users with meta-learning for open-ended QA! 🧵
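
A hedged sketch of the setup as described above: sample a synthetic user, put a few of that user's labeled preferences in the context as few-shot examples, and apply a DPO-style preference loss on a held-out pair so the model meta-learns to adapt from a handful of preferences. The data layout and the `policy_logprob` / `ref_logprob` callables are placeholder assumptions.

    import math, random

    def dpo_loss(policy_logprob, ref_logprob, prompt, chosen, rejected, beta=0.1):
        margin = beta * ((policy_logprob(prompt, chosen) - ref_logprob(prompt, chosen))
                         - (policy_logprob(prompt, rejected) - ref_logprob(prompt, rejected)))
        return -math.log(1.0 / (1.0 + math.exp(-margin)))      # -log sigmoid(margin)

    def fspo_step(users, policy_logprob, ref_logprob, k_shot=4):
        user = random.choice(users)                             # one synthetic user per step
        shots = random.sample(user["prefs"], k_shot)             # few-shot preference examples
        held_out = random.choice([p for p in user["prefs"] if p not in shots])
        context = "\n".join(f"Q: {p['q']} preferred: {p['chosen']}" for p in shots)
        prompt = context + "\nQ: " + held_out["q"]
        return dpo_loss(policy_logprob, ref_logprob, prompt,
                        held_out["chosen"], held_out["rejected"])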

Rafael Rafailov @ NeurIPS (@rm_rafailov):

This is the dataset we curated for our own reasoning experiments. There is a lot of reasoning data coming out now, but we spent extra time on this to make sure all the problems are high-quality and suitable for RL training!

Percy Liang (@percyliang):

1/🧵How do we know if AI is actually ready for healthcare? We built a benchmark, MedHELM, that tests LMs on real clinical tasks instead of just medical exams. #AIinHealthcare Blog, GitHub, and link to leaderboard in thread!

Kanishk Gandhi (@gandhikanishk):

New Paper!! We try to understand why some LMs self-improve their reasoning while others hit a wall. The key? Cognitive behaviors! Read our paper on how the right cognitive behaviors can make all the difference in a model's ability to improve with RL! 🧵1/13

Rishabh Agarwal (@agarwl_):

Going beyond verifiable domains, we still need reward models, which will likely be generative verifiers! Recent papers along this direction:

1. Scaling RL with RMs on "synthetic" prompts @ ICML25
2. Step by Step Verifiers That Think -- better perf than PRM800K with 1K labels
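
A hedged sketch of what a generative verifier can look like, inferred from the general idea rather than either paper's exact recipe: the verifier LM first writes a free-form check of the candidate solution, then answers Yes/No, and the probability placed on "Yes" becomes the scalar reward. `lm_generate` and `lm_next_token_probs` are placeholder model calls.

    def generative_verifier_reward(lm_generate, lm_next_token_probs, problem, solution):
        prompt = (f"Problem: {problem}\nCandidate solution: {solution}\n"
                  "Check the solution step by step.\nCritique:")
        critique = lm_generate(prompt, max_tokens=256)            # free-form verification CoT
        verdict_prompt = prompt + critique + "\nIs the solution correct? Answer Yes or No:"
        probs = lm_next_token_probs(verdict_prompt)               # dict: next token -> probability
        p_yes, p_no = probs.get("Yes", 0.0), probs.get("No", 0.0)
        return p_yes / (p_yes + p_no) if (p_yes + p_no) > 0 else 0.5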

SynthLabs (@synth_labs):

Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training. Result? Models learn an implicit difficulty estimator—allocating 5x more tokens to hard vs easy problems, cutting overall usage by 50% 🧵👇1/10
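
A rough sketch of the mechanism as the tweet describes it: estimate each prompt's solve rate from its rollouts in the current batch, then subtract a length penalty scaled by that solve rate, so easy prompts (high solve rate) pay a larger per-token cost than hard ones. The penalty form and coefficient are assumptions, not ALP's exact formulation.

    def alp_rewards(rollouts, alpha=1e-3):
        """rollouts: list of dicts with 'prompt_id', 'correct' (0/1), 'n_tokens'."""
        # 1. Per-prompt solve rate across this batch of rollouts.
        by_prompt = {}
        for r in rollouts:
            by_prompt.setdefault(r["prompt_id"], []).append(r["correct"])
        solve_rate = {p: sum(v) / len(v) for p, v in by_prompt.items()}

        # 2. Reward = correctness minus a length penalty scaled by the solve rate.
        rewards = []
        for r in rollouts:
            penalty = alpha * solve_rate[r["prompt_id"]] * r["n_tokens"]
            rewards.append(r["correct"] - penalty)
        return rewards

    batch = [
        {"prompt_id": 0, "correct": 1, "n_tokens": 120},   # easy prompt, mostly solved
        {"prompt_id": 0, "correct": 1, "n_tokens": 400},
        {"prompt_id": 1, "correct": 0, "n_tokens": 900},   # hard prompt, rarely solved
        {"prompt_id": 1, "correct": 1, "n_tokens": 800},
    ]
    print(alp_rewards(batch))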

Rylan Schaeffer (@rylanschaeffer):

Third #ICML2025 paper! What effect will web-scale synthetic data have on future deep generative models? Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World 🔄 With Joshua Kazdan, Apratim Dey, Matthias Gerstgrasser, Rafael Rafailov, and Sanmi Koyejo. 1/7
