Paria Rashidinejad (@paria_rd) Twitter Tweets • TwiCopy

Paria Rashidinejad

@paria_rd

Incoming Assistant Professor of ECE @USC | Research Scientist at FAIR, GenAI @AIatMeta | PhD @Berkeley_EECS @berkeley_ai @CHAI_Berkeley

ID: 1634327660007669760

Joined: 10-03-2023 22:57:55

10 Tweets

104 Followers

191 Following

Max Simchowitz

@max_simchowitz

8 months ago

Hey Everyone!! I will be giving a lecture at the RL Theory Virtual Seminar tomorrow, on my new paper about the "Pitfalls of Imitation Learning" in continuous action spaces. 🧵 below; please read because the time is somewhat TBD..... 🧐

Likes: 90 | Replies: 4 | Reposts: 9

Simon Shaolei Du

@simonshaoleidu

8 months ago

Excited to share our work led by Yiping Wang: RLVR with only ONE training example can boost 37% accuracy on MATH500.

Likes: 49 | Replies: 2 | Reposts: 5

Reyhane Askari

@reyhaneaskari

8 months ago

Deliberate practice is accepted to #ICML2025 as a spotlight (top 2.6%!) 🚀

Likes: 145 | Replies: 1 | Reposts: 17

Jason Lee

@jasondeanlee

7 months ago

Our new work on scaling laws that includes compute, model size, and number of samples. The analysis involves an extremely fine-grained analysis of online SGD built up over the last 8 years of understanding SGD on simple toy models (tensors, single-index models, multi-index models)

Likes: 149 | Replies: 1 | Reposts: 15

Nived Rajaraman

@nived_rajaraman

7 months ago

Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025! 📝 Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models! 🗓️ Deadline: May 19, 2025

Likes: 74 | Replies: 1 | Reposts: 24

Yuandong Tian

@tydsh

7 months ago

📢 Our travel planner solver (arxiv.org/abs/2410.16456, published in EMNLP Demo Track'24, and arxiv.org/abs/2411.13904) is now open sourced in github.com/facebookresear… 🚀🚀 In these works, we build LLM-equipped agent that can take user inputs in natural language, in either the

Likes: 34 | Replies: 1 | Reposts: 7

Simon Shaolei Du

@simonshaoleidu

7 months ago

PPO vs. DPO? 🤔 Our new paper proves that it depends on whether your models can represent the optimal policy and/or reward. Paper: arxiv.org/abs/2505.19770 Led by Ruizhe Shi and Minhak Song

Likes: 97 | Replies: 0 | Reposts: 18

Xian Li

@xl_nlp

7 months ago

Scaling RL is 🔥 what's the fuel? We propose a new paradigm to generate "synthetic environments", unlimited tasks with verifiable rewards for self-improvement. Welcome to the era of experience 🙌

Likes: 130 | Replies: 5 | Reposts: 21

Mehrdad Farajtabar

@mfarajtabar

6 months ago

🧵 1/8 The Illusion of Thinking: Are reasoning models like o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet really "thinking"? 🤔 Or are they just throwing more compute towards pattern matching? The new Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks,

Likes: 2.2K | Replies: 101 | Reposts: 532

Amrith Setlur

@setlur_amrith

6 months ago

Introducing e3 🔥 Best <2B model on math 💪 Are LLMs implementing algos ⚒️ OR is thinking an illusion 🎩? Is RL only sharpening the base LLM distrib. 🤔 OR discovering novel strategies outside base LLM 💡? We answer these ⤵️ 🚨 arxiv.org/abs/2506.09026 🚨 matthewyryang.github.io/e3/

Likes: 86 | Replies: 1 | Reposts: 20

Jiawei Zhao

@jiawzhao

6 months ago

You can skip prompts that aren’t useful for the current policy during training! 🔍 Efficient prompt selection is key to scaling RL training for LLM reasoning. We are actively building algos for efficient and scalable RL training system. Stay tuned!

Likes: 15 | Replies: 1 | Reposts: 3

Xian Li

@xl_nlp

6 months ago

The future of pretraining is synthetically organic ♻️♻️🌱🌱 We show that grounded synthetic data outperforms high-quality human-written web data such as curated DCLM.

Likes: 52 | Replies: 2 | Reposts: 7

Csaba Szepesvari

@csabaszepesvari

5 months ago

First position paper I ever wrote. "Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence" arxiv.org/abs/2506.23908 Background: I'd like LLMs to help me do math, but statistical learning seems inadequate to make this happen. What do you all think?

Likes: 431 | Replies: 19 | Reposts: 67

Brandon Amos

@brandondamos

5 months ago

Excited to release AlgoTune!! It's a benchmark and coding agent for optimizing the runtime of numerical code 🚀 algotune.io 📚 algotune.io/paper.pdf 🤖 github.com/oripress/AlgoT… with Ofir Press, Ori Press, Patrick Kidger, Bartolomeo Stellato, Arman Zharmagambetov & many others 🧵

Likes: 129 | Replies: 2 | Reposts: 26