Noetic (@noetic_labs)'s Twitter Profile
Noetic

@noetic_labs

The experiential learning company

ID: 1925778757492625408

Website: https://www.noeticlabs.co · Joined: 23-05-2025 05:00:28

17 Tweets

99 Followers

0 Following

Lienid (@0xlienid):

We built a way for models to learn from arbitrary experience, moving past naive SFT and scalar rewards.

The results:
- +17% on a holdout HumanEval set from just seeing printouts
- +33% on the GSM8K training set from natural language feedback

No labels. No reward. Just experience.
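The thread does not publish a training recipe, so the following is only a sketch of what a loop that learns from printouts and natural-language feedback, with no labels or rewards, could look like. Every name in it (Experience, collect_experiences, experiential_update, model.generate, environment.run, model.finetune) is a hypothetical placeholder, not Noetic's API:

```python
# A minimal sketch of an experiential-learning loop, assuming "learning from
# arbitrary experience" means training on transcripts that embed the raw
# feedback (test printouts, natural-language critique) rather than on any
# scalar reward. All names are hypothetical placeholders; this is not
# Noetic's published method.

from dataclasses import dataclass

@dataclass
class Experience:
    prompt: str    # the task the model attempted
    attempt: str   # what the model produced
    feedback: str  # raw experience: stdout of tests, NL critique, etc.

def collect_experiences(model, tasks, environment):
    """Roll the model through tasks, recording unstructured feedback."""
    experiences = []
    for task in tasks:
        attempt = model.generate(task)             # hypothetical call
        feedback = environment.run(task, attempt)  # e.g. test printouts
        experiences.append(Experience(task, attempt, feedback))
    return experiences

def experiential_update(model, experiences):
    """Update the model from transcripts alone: no labels, no reward.

    One plausible reading of the thread: render each experience as a
    transcript and train with an ordinary LM loss, so the model learns
    what the feedback implies about its own attempts.
    """
    transcripts = [
        f"Task:\n{e.prompt}\n\nAttempt:\n{e.attempt}\n\n"
        f"Feedback:\n{e.feedback}\n"
        for e in experiences
    ]
    model.finetune(transcripts)  # hypothetical call
    return model
```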
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxestex):

Noetic is developing an «experiential learning framework», with the goal of moving beyond RLVR or VR-CLI and their dependence on gold-standard answers, making proper use of rich feedback in the general case, and ushering in Richard Sutton's Era of Experience.

Lienid (@0xlienid):

Update:

We've now matched GRPO performance. Again, no scalar rewards, no trajectory filtering for SFT.

Now +36% over the GSM8K baseline with Experiential Learning.
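For contrast, the GRPO baseline being matched does depend on scalar rewards: a group of completions is sampled per prompt, each is scored, and scores are normalized within the group. A minimal sketch of that group-relative advantage (the 0/1 grading in the example is a hypothetical illustration):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantage at the core of GRPO.

    For a group of completions sampled from one prompt, each scalar
    reward r_i is normalized against the group:

        A_i = (r_i - mean(r)) / std(r)

    This is exactly the scalar-reward machinery that the experiential
    setup above claims to do without.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Hypothetical example: four sampled solutions to one GSM8K problem,
# graded 0/1 on final-answer correctness.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # -> [1.0, -1.0, -1.0, 1.0]
```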
Lienid (@0xlienid):

I very, very strongly agree. It is incredibly inefficient, and not at all how intelligent creatures learn. Intelligence scales with converting unstructured experience into knowledge and behaviors. noeticlabs.co/el

Noetic (@noetic_labs):

RL is not good for continual learning, because continual learning is not only about forgetting.

Continual learning in any meaningful way requires:
- Learning from outcomes
- In an unstructured output space and environment
- With minimal forgetting

RL is bad at the second.