Ziran Yang (@__zrrr__)'s Twitter Profile
Ziran Yang

@__zrrr__

Incoming PhD student @Princeton, BS@Peking Univ.

ID: 1553685423096221697

Link: https://ziranyang0.github.io/
Joined: 31-07-2022 10:14:21

7 Tweets

30 Followers

435 Following

Yong Lin (@yong18850571)

(1/4)🚨 Introducing Goedel-Prover V2 🚨
🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems—with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B

Yong Lin (@yong18850571)

🔥Our Goedel-Prover-V2-32B topped the PutnamBench Leaderboard by solving 86 problems, nearly 2× more than the previous SOTA DeepSeek-Prover-V2-671B (solved 47), while using:
* 1/20 the model size (32B vs. 671B)
* 1/5 the passes (184 vs. 1024)
Meanwhile, we also release
*

Chi Jin (@chijinml)

🚀With early access to Tinker, we matched the full-parameter SFT performance of Goedel-Prover V2 (32B) using LoRA, with both trained on the same 20% of the data. 📊MiniF2F Pass@32 ≈ 81 (20% SFT). Next: full-scale training + RL. This is something that previously took a lot more effort

Ziran Yang (@__zrrr__)

We released Goedel-Prover-V2, a model for formal theorem proving that was state-of-the-art at launch. Remarkably, it has remained at the top of the open-source formal theorem proving leaderboard for over six months. We have been excited to see so many folks cooking with our models.