Violet X. (@ziyux)'s Twitter Profile
Violet X.

@ziyux

PhD student @Stanford. Working on LLM-based agents

ID: 394052579

https://violetxi.github.io
Joined: 19-10-2011 14:09:15

77 Tweets

161 Followers

323 Following

Kanishk Gandhi (@gandhikanishk):

Language models struggle to search, not due to an architecture problem, but a data one! They rarely see how to search or backtrack. We show how LLMs can be taught to search by representing the process of search in language as a flattened string, a stream of search (SoS)!
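
A minimal sketch of the "search in language as a flattened string" idea: run a toy depth-first search and serialize every expansion and backtrack into one string that a language model could be trained on. The trace format and helper names here are illustrative assumptions, not the paper's actual SoS format.

    # Toy DFS whose full trace, including dead ends and backtracking,
    # is flattened into a single training string.
    def dfs_trace(graph, node, goal, trace):
        trace.append(f"visit {node}")
        if node == goal:
            trace.append(f"goal {node}")
            return True
        for child in graph.get(node, []):
            trace.append(f"try {node}->{child}")
            if dfs_trace(graph, child, goal, trace):
                return True
            trace.append(f"backtrack to {node}")  # the failed branch stays in the trace
        return False

    graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
    trace = []
    dfs_trace(graph, "A", "E", trace)
    stream_of_search = " | ".join(trace)  # flattened string, dead ends included
    print(stream_of_search)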

Rafael Rafailov @ NeurIPS (@rm_rafailov):

We have a new preprint out - your language model is not a reward, it’s a Q function!
1. The likelihood of the preferred answer must go down - it’s a policy divergence
2. MCTS guided decoding on language is equivalent to likelihood search on DPO
3. DPO learns credit assignment
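
For reference, the standard DPO objective this thread builds on, together with the implicit reward it induces. These are the original DPO formulas, not the preprint's new results; the Q-function reading reinterprets the log-ratio terms at the token level.

    \mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
      = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
        \left[ \log \sigma\!\left(
            \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
          - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
        \right) \right],
    \qquad
    \hat{r}(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

Point 1 refers to the observation that this loss constrains only the margin between the two log-ratios, so the absolute likelihood of the preferred answer is free to decrease during training.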

Philipp (@jphilipp95):

Constitutional AI showed LMs can learn to follow constitutions by labeling their own outputs. But why can't we just tell a base model the principles of desired behavior and rely on it to act appropriately? Introducing SAMI: Self-Supervised Alignment with Mutual Information!
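
As a rough sketch of what "alignment with mutual information" can look like: one standard way to lower-bound the mutual information between a written principle c and a response y (given a prompt x) is an InfoNCE-style contrastive bound over a batch of N principle-response pairs, where each y_i was sampled under its own principle c_i. This is a generic estimator written from the tweet's description, not necessarily SAMI's exact objective:

    I(C; Y \mid X) \;\geq\; \mathbb{E}\left[ \log
        \frac{\pi_\theta(y_i \mid x, c_i)}
             {\tfrac{1}{N} \sum_{j=1}^{N} \pi_\theta(y_i \mid x, c_j)} \right]

Maximizing a bound of this shape pushes the model to make each response more likely under the principle it was written for than under the other principles in the batch.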

Violet X. (@ziyux):

Excited about our new paper - Hypothetical Minds! The hypothesis-search-based approach shows a lot of promise in adapting to diverse agents in multi-agent settings. Check out the full paper for more!
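
A hedged sketch of what a hypothesis-search loop for opponent modeling can look like, based only on the description above: an LLM proposes natural-language hypotheses about another agent's strategy, each hypothesis is scored by how well it predicts that agent's observed moves, and the best-scoring one conditions the next action. The `llm` callable and the prompts are placeholders, not the paper's implementation.

    def propose_hypotheses(llm, history, n=5):
        prompt = f"Observed opponent moves: {history}. List {n} hypotheses about their strategy, one per line."
        return llm(prompt).splitlines()[:n]

    def score_hypothesis(llm, hypothesis, history):
        # Fraction of past moves the hypothesis would have predicted correctly.
        correct = 0
        for t in range(1, len(history)):
            pred = llm(f"Hypothesis: {hypothesis}. Moves so far: {history[:t]}. Predict the next move.")
            correct += (pred.strip() == history[t])
        return correct / max(len(history) - 1, 1)

    def act(llm, history):
        hypotheses = propose_hypotheses(llm, history)
        best = max(hypotheses, key=lambda h: score_hypothesis(llm, h, history))
        return llm(f"Opponent strategy: {best}. Moves so far: {history}. Choose our best response.")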

Fan-Yun Sun (@sunfanyun):

Training RL/robot policies requires extensive experience in the target environment, which is often difficult to obtain. How can we “distill” embodied policies from foundational models? Introducing FactorSim! #NeurIPS2024 We show that by generating prompt-aligned simulations and

Rafael Rafailov @ NeurIPS (@rm_rafailov):

We have a new position paper on "inference time compute" and what we have been working on over the last few months! We present some theory on why it is necessary, how it works, why we need it, and what it means for "super" intelligence.

SynthLabs (@synth_labs):

Ever watched someone solve a hard math problem? Their first attempt is rarely perfect. They sketch ideas, cross things out, and try new angles. This process of exploration is key to human reasoning and our latest research formalizes this as Meta Chain-of-Thought (1/8) 🧵👇

Jiayi Pan (@jiayi_pirate):

We reproduced DeepSeek R1-Zero in the CountDown game, and it just works

Through RL, the 3B base LM develops self-verification and search abilities all on its own

You can experience the Aha moment yourself for < $30
Code: github.com/Jiayi-Pan/Tiny…

Here's what we learned 🧵
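
For context, the Countdown reward is verifiable: a completion earns reward only if it contains an arithmetic expression that uses exactly the given numbers and evaluates to the target. A minimal sketch of such a checker is below; the answer-tag format and the 0/1 reward are assumptions, and TinyZero's actual implementation may differ.

    import re
    from collections import Counter

    def countdown_reward(completion, numbers, target):
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        if match is None:
            return 0.0
        expr = match.group(1).strip()
        if not re.fullmatch(r"[\d\s\+\-\*/\(\)\.]+", expr):  # arithmetic tokens only
            return 0.0
        used = [int(n) for n in re.findall(r"\d+", expr)]
        if Counter(used) != Counter(numbers):                # each given number used exactly once
            return 0.0
        try:
            value = eval(expr)                               # safe here: expr is digits/operators only
        except (ZeroDivisionError, SyntaxError):
            return 0.0
        return 1.0 if abs(value - target) < 1e-6 else 0.0

    print(countdown_reward("... <answer>(6*5)-(4+1)</answer>", [6, 5, 4, 1], 25))  # -> 1.0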

Andrew Ng (@andrewyng):

Introducing Agentic Object Detection! Given a text prompt like “unripe strawberries” or “Kellogg’s branded cereal” and an image, we use an agentic workflow to reason at length and detect the specified objects. No need to label any training data. Watch the video for details.
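
The tweet describes a training-free, agentic workflow rather than a specific API. One plausible shape for such a loop, written purely as an assumption: a vision-language model proposes candidate regions for the text prompt, then each crop is re-examined and kept only if the model confirms the match. `vlm_propose_boxes` and `vlm_verify_crop` are hypothetical helpers, and the image is assumed to be a PIL-style object.

    def agentic_detect(image, text_prompt, vlm_propose_boxes, vlm_verify_crop):
        """Propose-then-verify loop; both VLM calls are placeholder callables."""
        detections = []
        # 1. Ask the VLM for candidate bounding boxes matching the text prompt.
        for box in vlm_propose_boxes(image, text_prompt):
            # 2. Re-examine each crop and keep it only if the VLM confirms the match.
            crop = image.crop(box)
            if vlm_verify_crop(crop, text_prompt):
                detections.append(box)
        return detections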

SynthLabs (@synth_labs):

Releasing Big-MATH—the first heavily curated & verifiable dataset designed specifically for large-scale RL training & LLM reasoning!

📝 250,000+ problems, 47k NEW Q's
✅ 10x larger than existing datasets like MATH
🧑‍⚖️ Verifiable—we eliminated 400k+ problems

Details below! 🧵👇
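
"Verifiable" here means each problem keeps a single, automatically checkable final answer. A rough sketch of that kind of filter, with purely illustrative criteria rather than the released curation pipeline:

    import re

    def is_verifiable(problem, answer):
        if re.search(r"\bprove\b|\bshow that\b", problem, re.IGNORECASE):
            return False                 # proof-style problems can't be string-checked
        if ";" in answer or " and " in answer:
            return False                 # multi-part answers are ambiguous to grade
        # Accept a single number, fraction, or simple closed-form expression.
        return bool(re.fullmatch(r"[-\d\./\^\*\+\(\)a-zA-Z\\{}\s]+", answer.strip()))

    problems = [
        ("Compute 3^4 - 5.", "76"),
        ("Prove that sqrt(2) is irrational.", "N/A"),
    ]
    kept = [(p, a) for p, a in problems if is_verifiable(p, a)]
    print(kept)   # only the first problem survives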

Anikait Singh (@anikait_singh_):

Personalization in LLMs is crucial for meeting diverse user needs, yet collecting real-world preferences at scale remains a significant challenge. Introducing FSPO, a simple framework leveraging synthetic preference data to adapt to new users with meta-learning for open-ended QA! 🧵
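
A hedged sketch of the setup as described above: sample a synthetic user, put a few of that user's labeled preferences in the context as few-shot examples, and apply a DPO-style preference loss on a held-out pair so the model meta-learns to adapt from a handful of preferences. The data layout and the `policy_logprob` / `ref_logprob` callables are placeholder assumptions.

    import math, random

    def dpo_loss(policy_logprob, ref_logprob, prompt, chosen, rejected, beta=0.1):
        margin = beta * ((policy_logprob(prompt, chosen) - ref_logprob(prompt, chosen))
                         - (policy_logprob(prompt, rejected) - ref_logprob(prompt, rejected)))
        return -math.log(1.0 / (1.0 + math.exp(-margin)))      # -log sigmoid(margin)

    def fspo_step(users, policy_logprob, ref_logprob, k_shot=4):
        user = random.choice(users)                             # one synthetic user per step
        shots = random.sample(user["prefs"], k_shot)             # few-shot preference examples
        held_out = random.choice([p for p in user["prefs"] if p not in shots])
        context = "\n".join(f"Q: {p['q']} preferred: {p['chosen']}" for p in shots)
        prompt = context + "\nQ: " + held_out["q"]
        return dpo_loss(policy_logprob, ref_logprob, prompt,
                        held_out["chosen"], held_out["rejected"])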

Rafael Rafailov @ NeurIPS (@rm_rafailov):

This is the dataset we curated for our own reasoning experiments. There is a lot of reasoning data coming out now, but we spent extra time on this to make sure all the problems are high-quality and suitable for RL training!

Percy Liang (@percyliang):

1/🧵How do we know if AI is actually ready for healthcare? We built a benchmark, MedHELM, that tests LMs on real clinical tasks instead of just medical exams. #AIinHealthcare Blog, GitHub, and link to leaderboard in thread!

Kanishk Gandhi (@gandhikanishk):

New Paper!! We try to understand why some LMs self-improve their reasoning while others hit a wall. The key? Cognitive behaviors! Read our paper on how the right cognitive behaviors can make all the difference in a model's ability to improve with RL! 🧵1/13

Rishabh Agarwal (@agarwl_):

Going beyond verifiable domains, we still need reward models, which will likely be generative verifiers! Recent papers along this direction:

1. Scaling RL with RMs on "synthetic" prompts @ ICML25
2. Step by Step Verifiers That Think -- better perf than PRM800K with 1K labels
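
A hedged sketch of what a generative verifier can look like, inferred from the general idea rather than either paper's exact recipe: the verifier LM first writes a free-form check of the candidate solution, then answers Yes/No, and the probability placed on "Yes" becomes the scalar reward. `lm_generate` and `lm_next_token_probs` are placeholder model calls.

    def generative_verifier_reward(lm_generate, lm_next_token_probs, problem, solution):
        prompt = (f"Problem: {problem}\nCandidate solution: {solution}\n"
                  "Check the solution step by step.\nCritique:")
        critique = lm_generate(prompt, max_tokens=256)            # free-form verification CoT
        verdict_prompt = prompt + critique + "\nIs the solution correct? Answer Yes or No:"
        probs = lm_next_token_probs(verdict_prompt)               # dict: next token -> probability
        p_yes, p_no = probs.get("Yes", 0.0), probs.get("No", 0.0)
        return p_yes / (p_yes + p_no) if (p_yes + p_no) > 0 else 0.5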

SynthLabs (@synth_labs):

Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training. Result? Models learn an implicit difficulty estimator—allocating 5x more tokens to hard vs easy problems, cutting overall usage by 50% 🧵👇1/10
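
A rough sketch of the mechanism as the tweet describes it: estimate each prompt's solve rate from its rollouts in the current batch, then subtract a length penalty scaled by that solve rate, so easy prompts (high solve rate) pay a larger per-token cost than hard ones. The penalty form and coefficient are assumptions, not ALP's exact formulation.

    def alp_rewards(rollouts, alpha=1e-3):
        """rollouts: list of dicts with 'prompt_id', 'correct' (0/1), 'n_tokens'."""
        # 1. Per-prompt solve rate across this batch of rollouts.
        by_prompt = {}
        for r in rollouts:
            by_prompt.setdefault(r["prompt_id"], []).append(r["correct"])
        solve_rate = {p: sum(v) / len(v) for p, v in by_prompt.items()}

        # 2. Reward = correctness minus a length penalty scaled by the solve rate.
        rewards = []
        for r in rollouts:
            penalty = alpha * solve_rate[r["prompt_id"]] * r["n_tokens"]
            rewards.append(r["correct"] - penalty)
        return rewards

    batch = [
        {"prompt_id": 0, "correct": 1, "n_tokens": 120},   # easy prompt, mostly solved
        {"prompt_id": 0, "correct": 1, "n_tokens": 400},
        {"prompt_id": 1, "correct": 0, "n_tokens": 900},   # hard prompt, rarely solved
        {"prompt_id": 1, "correct": 1, "n_tokens": 800},
    ]
    print(alp_rewards(batch))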

Rylan Schaeffer (@rylanschaeffer):

Third #ICML2025 paper! What effect will web-scale synthetic data have on future deep generative models? Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World 🔄 With Joshua Kazdan, Apratim Dey, Matthias Gerstgrasser, Rafael Rafailov, and Sanmi Koyejo. 1/7
