Noam Razin (@noamrazin)'s Twitter Profile
Noam Razin

@noamrazin

Postdoctoral Fellow at @PrincetonPLI | Past: Computer Science PhD @TelAvivUni & Apple Scholar in AI/ML | Interested in the foundations of deep learning

ID: 1261252348669767680

Link: https://noamrazin.github.io/ | Joined: 15-05-2020 11:09:34

126 Tweets

543 Followers

276 Following

Zixuan Wang (@zzzixuanwang)

LLMs can solve complex tasks that require combining multiple reasoning steps. But when are such capabilities learnable via gradient-based training?

In our new COLT 2025 paper, we show that easy-to-hard data is necessary and sufficient!

arxiv.org/abs/2505.23683

🧵 below (1/10)
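
To make the "easy-to-hard" distinction concrete, here is a small Python sketch of the two kinds of training data being contrasted, on a toy multi-step task. The task, the rule, and all function names are illustrative inventions, not the paper's construction:

```python
import random

def make_example(x0, steps):
    """Toy multi-step task: the target is `steps` applications of a rule."""
    x = x0
    for _ in range(steps):
        x = (3 * x + 1) % 97  # stand-in for one reasoning step
    return {"input": x0, "steps": steps, "target": x}

def easy_to_hard_dataset(n, max_steps):
    """Cover every difficulty from 1 step up to max_steps."""
    return [make_example(random.randrange(97), random.randint(1, max_steps))
            for _ in range(n)]

def hard_only_dataset(n, max_steps):
    """Only the hardest examples; per the thread's claim, this coverage
    alone is not enough for gradient-based training to succeed."""
    return [make_example(random.randrange(97), max_steps) for _ in range(n)]
```

Roughly, the thread's claim is that data of the first kind is both necessary and sufficient for a gradient-trained model to reach the hardest difficulty.
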
Yoni Slutzky (@yonislutzky)

Do neural nets really need gradient descent to generalize?🚨

We dive into matrix factorization and find a sharp split: wide nets rely on GD, while deep nets can thrive with any low-training-error weights!

arxiv.org/abs/2506.03931

🧵
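
A minimal numpy sketch of the kind of setting the abstract describes, under my reading (dimensions, initialization scale, learning rate, and step count are illustrative and may need tuning): fit the observed entries of a low-rank matrix with a depth-`depth` factorization via gradient descent, then measure error on the unobserved entries as a proxy for generalization:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rank, depth, hidden = 20, 2, 3, 30
lr, steps = 0.01, 5000

# Low-rank ground truth; observe roughly half the entries.
W_star = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, n))
mask = rng.random((n, n)) < 0.5

dims = [n] + [hidden] * (depth - 1) + [n]
F = [rng.normal(scale=0.1, size=(dims[i + 1], dims[i])) for i in range(depth)]

def prod(mats):
    """W = mats[-1] @ ... @ mats[0]; identity for an empty list."""
    W = np.eye(n) if not mats else mats[0]
    for M in mats[1:]:
        W = M @ W
    return W

for _ in range(steps):
    R = np.where(mask, prod(F) - W_star, 0.0)  # residual on observed entries
    # Gradient of the observed-entry loss w.r.t. each factor F_i is
    # A^T R B^T, where A and B are the products above and below F_i.
    grads = [prod(F[i + 1:]).T @ R @ prod(F[:i]).T for i in range(depth)]
    for i in range(depth):
        F[i] -= lr * grads[i]

W = prod(F)
obs = np.linalg.norm(np.where(mask, W - W_star, 0.0))
unobs = np.linalg.norm(np.where(~mask, W - W_star, 0.0))
print(f"observed-entry error {obs:.3f}, unobserved-entry error {unobs:.3f}")
```

The paper's contrast, as stated in the tweet, is between reaching low training error via GD like this versus via any other means.
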
Yong Lin (@yong18850571)

(1/4)🚨 Introducing Goedel-Prover V2 🚨
🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems—with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B
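
For readers unfamiliar with the metric: Pass@32 counts a problem as solved if any of 32 sampled proof attempts passes the proof checker. Assuming the standard unbiased pass@k estimator of Chen et al. (2021) is used when more than k samples are drawn per problem, it can be computed as:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator (Chen et al., 2021): probability that at
    least one of k samples, drawn from n attempts of which c are correct,
    solves the problem."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical numbers: 100 sampled proofs per theorem, 40 check out.
print(pass_at_k(n=100, c=40, k=32))
```
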
Pierfrancesco Beneventano (@pierbeneventano)

New extended version of the preprint “Edge of Stochastic Stability (EoSS)” is out! w/ Arseniy Andreyev
👉 arxiv.org/pdf/2412.20553
🗓️ Tomorrow (Wed July 23, 12 PM EDT) I'll talk about it at OWML: sfu.zoom.us/j/89334355925
I've never explained what it's about, so I'll do it here:
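
Background added for context, not a claim from the preprint: the classical "edge of stability" observation is that full-batch gradient descent tends to drive the sharpness (the largest Hessian eigenvalue of the training loss) to roughly 2/(learning rate), and EoSS studies the mini-batch analogue. Sharpness is commonly tracked with power iteration on Hessian-vector products; a PyTorch sketch:

```python
import torch

def sharpness(loss, params, iters=20):
    """Estimate the top Hessian eigenvalue of `loss` w.r.t. `params`
    via power iteration on Hessian-vector products."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_g = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_g)
    v /= v.norm()
    lam = 0.0
    for _ in range(iters):
        # d(g . v)/dtheta = H v, reusing the gradient graph each iteration.
        hv = torch.autograd.grad(flat_g @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        lam = (v @ hv).item()  # Rayleigh quotient with the current v
        v = hv / hv.norm()
    return lam
```

During full-batch training at learning rate lr, sharpness hovering near 2 / lr is the edge-of-stability signature; what replaces that picture under mini-batch SGD is the question the preprint's title points at.
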

Kilian Lieret @ICLR (@klieret)

Releasing mini, a radically simple SWE-agent: 100 lines of code, 0 special tools, and gets 65% on SWE-bench verified!
Made for benchmarking, fine-tuning, RL, or just for use from your terminal.
It’s open source, simple to hack, and compatible with any LM! Link in 🧵
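
As a sketch of what a command-only agent loop like this can look like (illustrative, not mini's actual source; the stop convention and the one-command-per-reply assumption are mine): the LM proposes a shell command, the harness runs it, and the output is appended to the conversation:

```python
import subprocess

def run_agent(lm, task, max_turns=30):
    """`lm` is any callable mapping a message history to a reply string."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = lm(history)
        history.append({"role": "assistant", "content": reply})
        if "DONE" in reply:  # assumed stop convention
            break
        # Assume the reply is a single shell command; run it and feed
        # stdout/stderr back to the model as the next user message.
        result = subprocess.run(reply, shell=True, capture_output=True,
                                text=True, timeout=60)
        history.append({"role": "user",
                        "content": result.stdout + result.stderr})
    return history
```

The appeal of keeping the loop this small is that the same harness works for benchmarking, fine-tuning, or RL: everything the agent does is an ordinary shell command.
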
Yong Lin (@yong18850571)

The report of Goedel-Prover-V2 is now on arXiv: arxiv.org/pdf/2508.03613. Check out the details on self-correction, the large-scale scaffolded data synthesis framework, and the magical model averaging.
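
On the last point: "model averaging" commonly refers to uniformly averaging the weights of several checkpoints of the same architecture (a "model soup"); the report has the actual recipe. A minimal PyTorch sketch of the uniform version:

```python
import torch

def average_state_dicts(paths):
    """Uniformly average the parameters of several checkpoints
    (assumes identical architectures and matching state-dict keys)."""
    avg = None
    for path in paths:
        sd = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in sd.items()}
        else:
            for k in avg:
                avg[k] += sd[k].float()
    for k in avg:
        avg[k] /= len(paths)
    return avg
```
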