Arnav Garg (@grg_arnav) Twitter Tweets • TwiCopy

Predibase

10 months ago

🚀 #RFT vs. #SFT: When to Use Each for Maximum Impact

#DeepSeek-R1 made #Reinforcement #FineTuning (RFT) the hot new thing, but is it better than #Supervised Fine-Tuning (SFT)? 🤔

Here’s when RFT wins:

✅ No labeled data? If you can verify correctness, RFT works.

✅ <100

Likes: 9 · Replies: 0 · Reposts: 2

Travis Addair

@travisaddair

9 months ago

Train your own DeepSeek-R1 with GRPO in Predibase: the first fully-managed serverless Reinforcement Fine-Tuning platform.

Likes: 7 · Replies: 1 · Reposts: 1

Predibase

@predibase

9 months ago

Today we're thrilled to announce the first end-to-end platform for Reinforcement Fine-Tuning. With just a dozen labeled data points, you can outperform #OpenAI o1 and #DeepSeekR1 on complex tasks. Built on the #GRPO methodology that DeepSeek-R1 popularized, our platform delivers

Likes: 513 · Replies: 20 · Reposts: 73

Arnav Garg

@grg_arnav

9 months ago

🚀 Launching Reinforcement Fine-Tuning (RFT) at Predibase - the first platform to fine-tune LLMs with just a few prompts & reward functions. No massive datasets needed, just performance that beats GPT-4o and o1, made simple.

Likes: 7 · Replies: 0 · Reposts: 1

Saam Motamedi

@saammotamedi

9 months ago

Huge release from @Predibase today -- the first end-to-end platform for Reinforcement Fine-Tuning. Bringing the techniques that power DeepSeekR1 to any open source model and data

Likes: 24 · Replies: 2 · Reposts: 8

Piero Molino

@w4nderlus7

9 months ago

Fine-tuning is great for adapting LLMs to specific tasks, but what if you don’t have much data? Starting today, you can use the world’s first end-to-end Reinforcement Fine-Tuning (RFT) Platform within Predibase and train models with zero data! We’ve enhanced GRPO, the

Likes: 28 · Replies: 5 · Reposts: 7

Sebastian Raschka

@rasbt

8 months ago

As we all know by now, reasoning models often generate longer responses, which raises compute costs. Now, this new paper (arxiv.org/abs/2504.05185) shows that this behavior comes from the RL training process, not from an actual need for long answers for better accuracy. The RL

Likes: 1.1K · Replies: 33 · Reposts: 191

Predibase

@predibase

8 months ago

🐳 AI teams are testing DeepSeek, but nobody agrees on when to use it

In our recent survey of 500+ AI professionals, DeepSeek-R1 is getting serious attention, but it's far from mainstream. Here’s what we uncovered:

📊 57% of teams have experimented with DeepSeek-R1

⚠️ Only 3%

Likes: 8 · Replies: 1 · Reposts: 2

Avi Chawla

@_avichawla

8 months ago

Supervised & Reinforcement fine-tuning in LLMs, clearly explained (with visuals):

Likes: 549 · Replies: 5 · Reposts: 65

Predibase

@predibase

7 months ago

🚀 Serve and fine-tune #Qwen3 — in your cloud or ours with blazing fast #inference speeds! No need to share your data. 🚀 Qwen 3 is the latest #opensource LLM dominating the leaderboards. Don't get left behind! Now you can serve and customize the latest Qwen models instantly

Likes: 6 · Replies: 0 · Reposts: 3

Andrew Ng

@andrewyng

7 months ago

New Course: Reinforcement Fine-Tuning LLMs with GRPO! Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with @Predibase, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and

Likes: 1.1K · Replies: 25 · Reposts: 174

Travis Addair

@travisaddair

7 months ago

It was an honor getting to work together with the DeepLearning.AI team and my colleague Arnav Garg on this course covering all things Reinforcement Fine-Tuning and GRPO. Similar to our last course on efficient LLM inference, we wanted to really drill into the intuition

Likes: 9 · Replies: 0 · Reposts: 2

Arnav Garg

@grg_arnav

7 months ago

I had a blast working with the DeepLearning.AI team and my colleague Travis Addair over the last few months to put this course together on Reinforcement Fine-Tuning with GRPO! We’ve tried to make this course as practical as possible and help you build intuition. Hope you enjoy!

Likes: 5 · Replies: 0 · Reposts: 0

Predibase

@predibase

6 months ago

🚀 Fresh off our hit DeepLearning.AI course on RFT + #GRPO, we’re going live! 🎙️ Let’s Talk Tokens: Live #AMA on Reinforcement Fine-Tuning with the Experts Who Built the Definitive Course! #RFT isn’t just research any more—it’s driving real-world GenAI with tighter feedback

Likes: 10 · Replies: 0 · Reposts: 3

Geoffrey Angus

@geoffreyangus

6 months ago

Struggling with context management? Wish you could just stick it all in your model? We’ve integrated Cartridges, a new method of leveraging sleep-time compute for learning long contexts, into Tokasaurus, an inference engine optimized for high-throughput 🧵

Likes: 39 · Replies: 1 · Reposts: 10

Predibase

@predibase

5 months ago

Big news! We will be joining @RubrikInc to accelerate agentic AI adoption from pilot to production at scale! ⚡️ Together, we can deliver radical simplicity in models and data. This is an exciting next step in our journey. More from Dev Rishi here: pbase.ai/45yUL2O

Likes: 16 · Replies: 0 · Reposts: 3

Predibase

@predibase

5 months ago

🧠 Join the 10k developers supercharging their #LLM skills with Reinforcement Fine-tuning—and it's free! 🧠 Reinforcement Fine-Tuning (#RFT) and #GRPO are fast becoming popular techniques to teach LLMs how to reason. We teamed up with DeepLearning.AI to build the definitive

Likes: 38 · Replies: 0 · Reposts: 9